Hessian modifications

The Newton search direction, used in both the line search and trust region methods, is dependent on the Hessian being positive definite for the quadratic model to be convex so that the search direction points sufficiently downhill. This is not always the case as saddle points and other non-quadratic features of the space can be problematic. Two classes of algorithms can be used to handle this situation. The first involves using the conjugate gradient method as a sub-algorithm for solving the Newton problem for the step k. The Newton-CG line search algorithm described above is one such example. The second class involves modifying the Hessian prior to, or at the same time as, finding the Newton step to guarantee that the replacement matrix B_k is positive definite. The convexity of B_k is ensured by its eigenvalues all being positive.

The first modification uses the Cholesky factorisation of the matrix B_k, initialised to the true Hessian, to test for convexity (Algorithm 6.3 of Nocedal and Wright (1999)). If factorisation fails the matrix is not positive definite and a constant τ_k times the identity matrix I is then added to B_k. The constant originates from the Robbins norm of the Hessian $\lVert$ ∇²f_k $\rVert_{F}^{}$ and is steadily increased until the factorisation is successful. The resultant Cholesky lower triangular matrix L can then be used to find the approximate Newton direction. If τ_k is too large the convergence of this technique can approach that of the steepest descent algorithm.

The second method is the Gill, Murray, and Wright (GMW) algorithm (Gill et al., 1981) which modifies the Hessian during the execution of the Cholesky factorisation ∇²f_k = LIL^T, where L is a lower triangular matrix and D is a diagonal matrix. Only a single factorisation is required. As rows and columns are interchanged during the algorithm the technique may be slow for large problems such as the optimisation of the model-free parameters of all spins together with the diffusion tensor parameters. The rate of convergence of the technique is quadratic.

The relax user manual (PDF), created 2024-06-08.