Method of Multipliers algorithm

One of the most powerful approaches is the Method of Multipliers (Nocedal and Wright, 1999), also known as the Augmented Lagrangian. Instead of a single optimisation, the algorithm is iterative, with each iteration consisting of an independent unconstrained minimisation on a sequentially modified space. When the parameter vector is inside the limits the function value is unchanged, but when outside a penalty, proportional to the distance outside the limit, is added to the function value. This penalty, which is based on the Lagrange multipliers, is smooth, hence the gradient and Hessian are continuous at and beyond the constraints. With each iteration of the Method of Multipliers the penalty is increased until it becomes impossible for the parameter vector to violate the limits. This approach allows the parameter vector θ to pass outside the limits, yet the successive iterations ensure that the final result will not violate the constraints.

For inequality constraints, each iteration of the Method of Multipliers attempts to solve the unconstrained sub-problem

$\displaystyle \min_{\theta} \mathfrak{L}_A(\theta, \lambda^k; \mu_k) \stackrel{\mathrm{def}}{=} f(\theta) + \sum_{i \in \mathfrak{I}} \Psi(c_i(\theta), \lambda_i^k; \mu_k),$ (14.11)

where the function Ψ is defined as

$\displaystyle \Psi(c_i(\theta), \lambda^k; \mu_k) = \begin{cases} -\lambda^k c_i(\theta) + \frac{1}{2\mu_k} c_i^2(\theta) & \textrm{if } c_i(\theta) \leqslant \mu_k \lambda^k, \\ -\frac{\mu_k}{2} (\lambda^k)^2 & \textrm{otherwise}. \end{cases}$ (14.12)
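The piecewise definition above can be written directly in code. The key property is that Ψ and its first derivative agree at the switch point c<sub>i</sub>(θ) = μ<sub>k</sub>λ<sup>k</sup>, which is why the penalty is smooth. The following is an illustrative sketch, not code from relax itself:

```python
def psi(c, lam, mu):
    """Penalty of equation (14.12) for a single inequality constraint."""
    if c <= mu * lam:
        return -lam * c + c**2 / (2.0 * mu)
    return -(mu / 2.0) * lam**2

def dpsi_dc(c, lam, mu):
    """Derivative of psi with respect to the constraint value c."""
    if c <= mu * lam:
        return -lam + c / mu
    return 0.0

# At the switch point c = mu*lam the quadratic branch evaluates to
# -lam*(mu*lam) + (mu*lam)**2/(2*mu) = -(mu/2)*lam**2, matching the
# constant branch, and its derivative -lam + (mu*lam)/mu = 0 matches
# the zero derivative of the constant branch, so psi is C^1.
lam, mu = 1.5, 0.1
c0 = mu * lam
```

A constraint that is satisfied with a zero multiplier contributes nothing, while a violated constraint (c < 0) always adds a positive penalty.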

In (14.11), θ is the parameter vector; $\mathfrak{L}_A$ is the Augmented Lagrangian function; k is the current iteration of the Method of Multipliers; $\lambda^k$ are the Lagrange multipliers, which are non-negative factors such that, at the minimum $\hat{\theta}$, $\nabla f(\hat{\theta}) = \sum_{i \in \mathfrak{I}} \lambda_i \nabla c_i(\hat{\theta})$; $\mu_k > 0$ is the penalty parameter, which decreases to zero as k→∞; $\mathfrak{I}$ is the set of inequality constraints; and $c_i(\theta)$ is an individual constraint value. The Lagrange multipliers are updated using the formula

$\displaystyle \lambda_i^{k+1} = \max\left(\lambda_i^k - \frac{c_i(\theta)}{\mu_k}, 0\right), \qquad \textrm{for all } i \in \mathfrak{I}.$ (14.13)
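Putting (14.11)–(14.13) together, the outer loop of the Method of Multipliers can be sketched as follows. This is a minimal illustration assuming scipy is available for the inner unconstrained minimisation; it is not relax's own implementation, and the toy objective at the end is purely hypothetical:

```python
import numpy as np
from scipy.optimize import minimize  # inner unconstrained solver (assumption)

def psi(c, lam, mu):
    # Smooth penalty of equation (14.12).
    if c <= mu * lam:
        return -lam * c + c**2 / (2.0 * mu)
    return -(mu / 2.0) * lam**2

def method_of_multipliers(f, c, theta0, mu0=1.0, tau=0.5, max_iter=12):
    """Sketch of the outer loop.  f(theta) is the objective and c(theta)
    returns the vector of constraint values, with the convention c_i >= 0."""
    theta = np.asarray(theta0, dtype=float)
    lam = np.zeros(len(c(theta)))  # Lagrange multipliers lambda^k
    mu = mu0                       # penalty parameter mu_k, driven towards 0

    for k in range(max_iter):
        # Solve the sub-problem (14.11) for fixed lambda^k and mu_k.
        def aug_lagrangian(th):
            return f(th) + sum(psi(ci, li, mu)
                               for ci, li in zip(c(th), lam))
        theta = minimize(aug_lagrangian, theta, method="BFGS").x

        # Multiplier update (14.13), then tighten the penalty.
        lam = np.maximum(lam - c(theta) / mu, 0.0)
        mu *= tau
    return theta, lam

# Toy problem: minimise (theta - 2)^2 subject to theta <= 1, written as
# c(theta) = 1 - theta >= 0.  The constrained minimum is theta = 1 with
# Lagrange multiplier 2.
theta, lam = method_of_multipliers(lambda th: (th[0] - 2.0)**2,
                                   lambda th: np.array([1.0 - th[0]]),
                                   [0.0])
```

Early iterations are allowed to end outside the limits; as μ<sub>k</sub> shrinks and λ<sup>k</sup> converges, the sub-problem minima are pushed onto the constraint boundary.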

The gradient of the Augmented Lagrangian is

$\displaystyle \nabla \mathfrak{L}_A(\theta, \lambda^k; \mu_k) = \nabla f(\theta) - \sum_{i \in \mathfrak{I},\; c_i(\theta) \leqslant \mu_k \lambda_i^k} \left( \lambda_i^k - \frac{c_i(\theta)}{\mu_k} \right) \nabla c_i(\theta),$ (14.14)

and the Hessian is

$\displaystyle \nabla^2 \mathfrak{L}_A(\theta, \lambda^k; \mu_k) = \nabla^2 f(\theta) + \sum_{i \in \mathfrak{I},\; c_i(\theta) \leqslant \mu_k \lambda_i^k} \left[ \frac{1}{\mu_k} \nabla c_i(\theta) \nabla c_i(\theta)^T - \left( \lambda_i^k - \frac{c_i(\theta)}{\mu_k} \right) \nabla^2 c_i(\theta) \right].$ (14.15)
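The gradient (14.14) and Hessian (14.15) are assembled from the derivatives of f and c, summing only over the constraints with c<sub>i</sub>(θ) ⩽ μ<sub>k</sub>λ<sub>i</sub><sup>k</sup>. The sketch below is an illustration rather than relax code; it checks the analytic gradient against a finite-difference approximation of (14.11) for a hypothetical quadratic objective with linear constraints:

```python
import numpy as np

def aug_lagrangian(theta, lam, mu, f, c):
    # Equation (14.11) with the penalty (14.12).
    val = f(theta)
    for ci, li in zip(c(theta), lam):
        if ci <= mu * li:
            val += -li * ci + ci**2 / (2.0 * mu)
        else:
            val += -(mu / 2.0) * li**2
    return val

def aug_lagrangian_grad(theta, lam, mu, grad_f, c, grad_c):
    # Equation (14.14): only constraints with c_i <= mu*lam_i contribute.
    g = grad_f(theta).astype(float)
    cv, dcv = c(theta), grad_c(theta)
    for i in range(len(cv)):
        if cv[i] <= mu * lam[i]:
            g -= (lam[i] - cv[i] / mu) * dcv[i]
    return g

def aug_lagrangian_hess(theta, lam, mu, hess_f, c, grad_c, hess_c):
    # Equation (14.15): for linear constraints the hess_c term vanishes.
    H = hess_f(theta).astype(float)
    cv, dcv, d2cv = c(theta), grad_c(theta), hess_c(theta)
    for i in range(len(cv)):
        if cv[i] <= mu * lam[i]:
            H += np.outer(dcv[i], dcv[i]) / mu - (lam[i] - cv[i] / mu) * d2cv[i]
    return H

# Hypothetical toy problem: f = theta_0^2 + 2*theta_1^2 with the linear
# constraints theta_0 - 1 >= 0 and theta_1 >= 0.
f = lambda th: th[0]**2 + 2.0 * th[1]**2
grad_f = lambda th: np.array([2.0 * th[0], 4.0 * th[1]])
hess_f = lambda th: np.diag([2.0, 4.0])
c = lambda th: np.array([th[0] - 1.0, th[1]])
grad_c = lambda th: np.eye(2)
hess_c = lambda th: np.zeros((2, 2, 2))

theta, lam, mu = np.array([0.4, -0.2]), np.array([0.5, 0.3]), 0.2
g = aug_lagrangian_grad(theta, lam, mu, grad_f, c, grad_c)
H = aug_lagrangian_hess(theta, lam, mu, hess_f, c, grad_c, hess_c)

# Central finite-difference check of (14.14) against (14.11).
h = 1e-6
g_fd = np.array([(aug_lagrangian(theta + h * e, lam, mu, f, c)
                  - aug_lagrangian(theta - h * e, lam, mu, f, c)) / (2 * h)
                 for e in np.eye(2)])
```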

The Augmented Lagrangian algorithm can accept any set of three arbitrary constraint functions c(θ), ∇c(θ), and ∇²c(θ). When given the current parameter values, c(θ) returns a vector of constraint values whereby each element corresponds to one of the constraints. The constraint is defined as ci $\geqslant$ 0. The function ∇c(θ) returns the matrix of constraint gradients and ∇²c(θ) is the constraint Hessian function which should return the 3D matrix of constraint Hessians.
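To make this interface concrete, here is a hypothetical nonlinear constraint c(θ) = 1 − θ₀² − θ₁² ⩾ 0 (a unit-disc constraint, not one used by relax) written as the three functions: the vector of values, the matrix of gradients, and the 3D array of Hessians:

```python
import numpy as np

def c(theta):
    # Vector of constraint values; one element per constraint (here just one).
    return np.array([1.0 - theta[0]**2 - theta[1]**2])

def dc(theta):
    # Matrix of constraint gradients: row i is the gradient of c_i.
    return np.array([[-2.0 * theta[0], -2.0 * theta[1]]])

def d2c(theta):
    # 3D array of constraint Hessians: d2c[i] is the Hessian matrix of c_i.
    return np.array([[[-2.0, 0.0], [0.0, -2.0]]])
```

With m constraints and n parameters, c(θ) has shape (m,), ∇c(θ) has shape (m, n), and ∇²c(θ) has shape (m, n, n).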

A more specific type of constraint accepted by the Method of Multipliers is the bound constraint. These constraints are defined by the relation

$\displaystyle l \leqslant \theta \leqslant u,$ (14.16)

where l and u are the vectors of lower and upper bounds respectively and θ is the parameter vector. For example, for model-free model m4, to place lower and upper bounds on the order parameter and lower bounds on the correlation time and chemical exchange parameters, the vectors are

$\displaystyle \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \leqslant \begin{pmatrix} S^2 \\ \tau_e \\ R_{ex} \end{pmatrix} \leqslant \begin{pmatrix} 1 \\ \infty \\ \infty \end{pmatrix}.$ (14.17)

The default setting in the program relax is to use linear constraints which are defined as

$\displaystyle A\theta \geqslant b,$ (14.18)

where A is an m×n matrix where the rows are the transposed vectors ai of length n; the elements of ai are the coefficients of the model parameters; θ is the vector of model parameters of dimension n; b is the vector of scalars of dimension m; m is the number of constraints; and n is the number of model parameters.

Rearranging (14.18), the linear constraint function c(θ) returns the vector Aθ - b. Because of the linearity of the constraints, the gradient and Hessian are greatly simplified: the gradient ∇c(θ) is simply the matrix A and the Hessian ∇²c(θ) is zero.
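For the m4 bounds of (14.17), dropping the rows whose bound is ±∞, the linear formulation Aθ ⩾ b and the corresponding constraint functions can be built as below. This is an illustrative sketch, not the relax source:

```python
import numpy as np

# theta = (S2, tau_e, Rex).  Each row of A encodes one inequality of (14.17).
A = np.array([[ 1.0, 0.0, 0.0],   # S2 >= 0
              [-1.0, 0.0, 0.0],   # -S2 >= -1, i.e. S2 <= 1
              [ 0.0, 1.0, 0.0],   # tau_e >= 0
              [ 0.0, 0.0, 1.0]])  # Rex >= 0
b = np.array([0.0, -1.0, 0.0, 0.0])

def c(theta):
    # Linear constraint values A.theta - b; all elements must be >= 0.
    return A @ theta - b

def dc(theta):
    # The gradient of a linear constraint is constant: the matrix A.
    return A

def d2c(theta):
    # The Hessian of each linear constraint is zero.
    return np.zeros((A.shape[0], A.shape[1], A.shape[1]))
```

For instance, θ = (0.8, 1e-9, 2.0) satisfies every bound (all elements of c(θ) are non-negative), whereas S² = 1.2 violates the second row.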

The relax user manual (PDF), created 2020-08-26.