Line search methods

The defining characteristic of a line search algorithm is to choose a search direction p_k and then to find the minimum along that vector starting from θ_k (Nocedal and Wright, 1999). The distance travelled along p_k is the step length α_k and the parameter values for the next iteration are

θ_k+1 = θ_k + α_kp_k.

(14.7)

The line search algorithm determines the search direction p_k whereas the value of α_k is found using an auxiliary step-length selection algorithm.

The steepest descent algorithm

One of the simplest line search methods is the steepest descent algorithm. The search direction is simply the negative gradient, p_k = - ∇f_k, and hence the direction of maximal descent is always followed. This method is inefficient - the linear rate of convergence requires many iterations of the algorithm to reach the minimum and it is susceptible to being trapped on saddle points within the space.

The coordinate descent algorithm

The coordinate descent algorithms are a simplistic group of line search methods whereby the search directions alternate between vectors parallel to the parameter axes. For the back-and-forth coordinate descent the search directions cycle in one direction and then back again. For example for a three parameter model the search directions cycle θ₁, θ₂, θ₃, θ₂, θ₁, θ₂,, which means that each parameter of the model is optimised one by one. The method becomes less efficient when approaching the minimum as the step length α_k continually decreases (ibid.).

The BFGS algorithm

The quasi-Newton methods begin with an initial guess of the Hessian and update it at each iteration using the function value and gradient. Therefore the benefits of using the quadratic model of (14.5) are obtained without calculating the computationally expensive Hessian. The Hessian approximation B_k is updated using various formulae, the most common being the BFGS formula (Broyden, 1970; Shanno, 1970; Fletcher, 1970; Goldfarb, 1970). The search direction is given by the equation p_k = - B_k^-1∇f_k. The quasi-Newton algorithms can attain a superlinear rate of convergence, being superior to the steepest descent or coordinate descent methods.

The Newton algorithm

The most powerful line search method when close to the minimum is the Newton search direction

p_k = - ∇²f_k^-1∇f_k.

(14.8)

This direction is obtained from the derivative of (14.5) which is assumed to be zero at the minimum of the quadratic model. The vector p_k points from the current position to the exact minimum of the quadratic model of the space. The rate of convergence is quadratic, being superior to both linear and superlinear convergence. The technique is computationally expensive due to the calculation of the Hessian. It is also susceptible to failure when optimisation commences from distant positions in the space as the Hessian may not be positive definite and hence not convex, a condition required for the search direction both to point downhill and to be reasonably oriented. In these cases the quadratic model is a poor description of the space. This algorithm is also known as the Newton-Raphson method.

The Newton conjugate gradient algorithm

A practical Newton algorithm which is robust for distant starting points is the Newton conjugate gradient method (Newton-CG). This line search method, which is also called the truncated Newton algorithm, finds an approximate solution to Equation (14.8) by using a conjugate gradient (CG) sub-algorithm. Retaining the performance of the pure Newton algorithm, the CG sub-algorithm guarantees that the search direction is always downhill as the method terminates when negative curvature is encountered.

The auxiliary step-length selection algorithm

Once the search direction has been determined by the above algorithms the minimum along that direction needs to be determined. Not to be confused with the methodology for determining the search direction p_k, the line search itself is performed by an auxiliary step-length selection algorithm to find the value α_k. A number of step-length selection methods can be used to find a minimum along the line θ_k + α_kp_k. One is the backtracking line search of Nocedal and Wright (1999). This method is inexact - it takes a starting step length α_k and decreases the value until a sufficient decrease in the function is found. Another is the line search method of Moré and Thuente (1994). Designed to be robust, the MT algorithm finds the exact minimum along the search direction and guarantees sufficient decrease.

The relax user manual (PDF), created 2024-06-08.