Line search methods

The defining characteristic of a line search algorithm is that it chooses a search direction pk and then searches for the minimum along that vector starting from θk (Nocedal and Wright, 1999). The distance travelled along pk is the step length αk, and the parameter values for the next iteration are

 θk+1 = θk + αkpk. (14.7)

The line search algorithm determines the search direction pk, whereas the value of αk is found using an auxiliary step-length selection algorithm.
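The update of (14.7) together with a step-length selection algorithm can be sketched as follows. This is a minimal illustration, not a method from the text: the backtracking (Armijo) rule, the test function, and all names such as backtracking_line_search are illustrative choices.

```python
import numpy as np

def backtracking_line_search(f, grad_f, theta, p, alpha0=1.0, rho=0.5, c=1e-4):
    """Auxiliary step-length selection: shrink alpha until the step
    gives a sufficient decrease in f (Armijo condition)."""
    alpha = alpha0
    fx, g = f(theta), grad_f(theta)
    while f(theta + alpha * p) > fx + c * alpha * (g @ p):
        alpha *= rho
    return alpha

# One line-search iteration on the illustrative function
# f(theta) = theta_0^2 + 10 theta_1^2, with p_k a descent direction.
f = lambda t: t[0]**2 + 10 * t[1]**2
grad_f = lambda t: np.array([2 * t[0], 20 * t[1]])

theta = np.array([1.0, 1.0])
p = -grad_f(theta)                       # a descent direction p_k
alpha = backtracking_line_search(f, grad_f, theta, p)
theta_next = theta + alpha * p           # theta_{k+1} = theta_k + alpha_k p_k
```

The search direction pk is supplied by the line search method; the auxiliary routine only decides how far to travel along it.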

The steepest descent algorithm

One of the simplest line search methods is the steepest descent algorithm. The search direction is simply the negative gradient, pk = - ∇fk, and hence the direction of maximal descent is always followed. This method is inefficient: its linear rate of convergence means that many iterations of the algorithm are required to reach the minimum, and it is susceptible to being trapped on saddle points within the space.

The coordinate descent algorithm

The coordinate descent algorithms are a simple group of line search methods in which the search directions alternate between vectors parallel to the parameter axes. For back-and-forth coordinate descent the search directions cycle in one direction and then back again. For example, for a three-parameter model the search directions cycle θ1, θ2, θ3, θ2, θ1, θ2, ..., which means that each parameter of the model is optimised one by one. The method becomes less efficient when approaching the minimum as the step length αk continually decreases (ibid.).
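The back-and-forth cycle can be sketched on a three-parameter quadratic, where each one-dimensional minimisation along a coordinate axis is exact. The matrix A, vector b, and the quadratic f(theta) = 0.5 θᵀAθ - bᵀθ are illustrative choices, not from the text.

```python
import numpy as np

# Back-and-forth coordinate descent on an illustrative quadratic
# f(theta) = 0.5 theta^T A theta - b^T theta (A symmetric positive
# definite, so each coordinate minimisation has a closed form).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

theta = np.zeros(3)
order = [0, 1, 2, 1]                     # theta1, theta2, theta3, theta2, ...
for _ in range(100):
    for i in order:
        # exact minimiser of f along the axis of parameter i
        theta[i] = (b[i] - A[i] @ theta + A[i, i] * theta[i]) / A[i, i]

# at the minimum, theta satisfies A theta = b
```

Each inner update holds all other parameters fixed and solves for the optimum of the chosen parameter alone, exactly as the cycle in the text describes.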

The BFGS algorithm

The quasi-Newton methods begin with an initial guess of the Hessian and update it at each iteration using the function value and gradient. Therefore the benefits of using the quadratic model of (14.5) are obtained without calculating the computationally expensive Hessian. The Hessian approximation Bk is updated using various formulae, the most common being the BFGS formula (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970). The search direction is given by the equation pk = - Bk-1∇fk. The quasi-Newton algorithms can attain a superlinear rate of convergence, which is superior to the linear convergence of the steepest descent or coordinate descent methods.
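A minimal sketch of a quasi-Newton iteration with the BFGS update is given below. The test function, the Armijo backtracking rule, and the curvature safeguard are illustrative choices; the BFGS update of Bk itself is the standard rank-two formula.

```python
import numpy as np

# Illustrative quadratic with minimum at (1, -2).
def f(t):
    return (t[0] - 1)**2 + 10 * (t[1] + 2)**2

def grad(t):
    return np.array([2 * (t[0] - 1), 20 * (t[1] + 2)])

theta = np.zeros(2)
B = np.eye(2)                            # initial guess of the Hessian, B_0
for _ in range(50):
    g = grad(theta)
    if np.linalg.norm(g) < 1e-8:
        break
    p = -np.linalg.solve(B, g)           # p_k = -B_k^{-1} grad f_k
    alpha = 1.0                          # backtracking step-length selection
    while f(theta + alpha * p) > f(theta) + 1e-4 * alpha * (g @ p):
        alpha *= 0.5
    s = alpha * p                        # step s_k
    theta = theta + s
    y = grad(theta) - g                  # gradient change y_k
    if s @ y > 1e-12:                    # curvature condition for BFGS
        Bs = B @ s
        B = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)
```

Only function values and gradients are used; the approximation Bk is built up from the observed steps sk and gradient changes yk, avoiding any explicit Hessian calculation.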

The Newton algorithm

The most powerful line search method when close to the minimum uses the Newton search direction

 pk = - (∇2fk)-1∇fk. (14.8)

This direction is obtained by setting the derivative of (14.5) to zero, which holds at the minimum of the quadratic model. The vector pk points from the current position to the exact minimum of the quadratic model of the space. The rate of convergence is quadratic, being superior to both linear and superlinear convergence. The technique is computationally expensive due to the calculation of the Hessian. It is also susceptible to failure when optimisation commences from positions distant from the minimum, as the Hessian may not be positive definite and the quadratic model therefore not convex, conditions required for the search direction both to point downhill and to be reasonably oriented. In these cases the quadratic model is a poor description of the space. This algorithm is also known as the Newton-Raphson method.
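The Newton direction of (14.8) can be sketched as follows. The test function is an illustrative convex choice whose Hessian is positive definite everywhere, so the full step αk = 1 can safely be taken at each iteration; in practice the Hessian must be checked or modified when starting far from the minimum.

```python
import numpy as np

# Newton iteration on the illustrative function
# f(theta) = theta_0^4 + theta_0^2 + theta_1^2, minimum at the origin.
def grad(t):
    return np.array([4 * t[0]**3 + 2 * t[0], 2 * t[1]])

def hess(t):
    # Hessian is diagonal and positive definite for all theta here
    return np.array([[12 * t[0]**2 + 2.0, 0.0],
                     [0.0, 2.0]])

theta = np.array([2.0, 3.0])
for _ in range(30):
    # solve (grad^2 f_k) p_k = -grad f_k instead of inverting the Hessian
    p = -np.linalg.solve(hess(theta), grad(theta))
    theta = theta + p                    # theta_{k+1} = theta_k + p_k
```

Note that the linear system is solved rather than the Hessian inverted, which is the standard way to evaluate (14.8); the quadratic convergence is visible in how few iterations are needed.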