Line search methods

The defining characteristic of a line search algorithm is to choose a search direction pk and then to find the minimum along that vector starting from θk (Nocedal and Wright, 1999). The distance travelled along pk is the step length αk and the parameter values for the next iteration are

θk+1 = θk + αkpk. (14.7)

The line search algorithm determines the search direction pk whereas the value of αk is found using an auxiliary step-length selection algorithm.

The steepest descent algorithm

One of the simplest line search methods is the steepest descent algorithm. The search direction is simply the negative gradient, pk = - ∇fk, and hence the direction of maximal descent is always followed. This method is inefficient - the linear rate of convergence requires many iterations of the algorithm to reach the minimum and it is susceptible to being trapped on saddle points within the space.

The coordinate descent algorithm

The coordinate descent algorithms are a simplistic group of line search methods whereby the search directions alternate between vectors parallel to the parameter axes. For the back-and-forth coordinate descent the search directions cycle in one direction and then back again. For example for a three parameter model the search directions cycle θ1, θ2, θ3, θ2, θ1, θ2,, which means that each parameter of the model is optimised one by one. The method becomes less efficient when approaching the minimum as the step length αk continually decreases (ibid.).

The BFGS algorithm

The quasi-Newton methods begin with an initial guess of the Hessian and update it at each iteration using the function value and gradient. Therefore the benefits of using the quadratic model of (14.5) are obtained without calculating the computationally expensive Hessian. The Hessian approximation Bk is updated using various formulae, the most common being the BFGS formula (Fletcher, 1970; Broyden, 1970; Shanno, 1970; Goldfarb, 1970). The search direction is given by the equation pk = - Bk-1fk. The quasi-Newton algorithms can attain a superlinear rate of convergence, being superior to the steepest descent or coordinate descent methods.

The Newton algorithm

The most powerful line search method when close to the minimum is the Newton search direction

pk = - ∇2fk-1fk. (14.8)

This direction is obtained from the derivative of (14.5) which is assumed to be zero at the minimum of the quadratic model. The vector pk points from the current position to the exact minimum of the quadratic model of the space. The rate of convergence is quadratic, being superior to both linear and superlinear convergence. The technique is computationally expensive due to the calculation of the Hessian. It is also susceptible to failure when optimisation commences from distant positions in the space as the Hessian may not be positive definite and hence not convex, a condition required for the search direction both to point downhill and to be reasonably oriented. In these cases the quadratic model is a poor description of the space. This algorithm is also known as the Newton-Raphson method.

The Newton conjugate gradient algorithm

A practical Newton algorithm which is robust for distant starting points is the Newton conjugate gradient method (Newton-CG). This line search method, which is also called the truncated Newton algorithm, finds an approximate solution to Equation (14.8) by using a conjugate gradient (CG) sub-algorithm. Retaining the performance of the pure Newton algorithm, the CG sub-algorithm guarantees that the search direction is always downhill as the method terminates when negative curvature is encountered.

The auxiliary step-length selection algorithm

Once the search direction has been determined by the above algorithms the minimum along that direction needs to be determined. Not to be confused with the methodology for determining the search direction pk, the line search itself is performed by an auxiliary step-length selection algorithm to find the value αk. A number of step-length selection methods can be used to find a minimum along the line θk + αkpk. One is the backtracking line search of Nocedal and Wright (1999). This method is inexact - it takes a starting step length αk and decreases the value until a sufficient decrease in the function is found. Another is the line search method of Moré and Thuente (1994). Designed to be robust, the MT algorithm finds the exact minimum along the search direction and guarantees sufficient decrease.

The relax user manual (PDF), created 2016-10-28.