Method of Multipliers algorithm

One of the most powerful approaches is the Method of Multipliers (Nocedal and Wright, 1999), also known as the Augmented Lagrangian. Instead of a single optimisation, the algorithm is iterative, with each iteration consisting of an independent unconstrained minimisation on a sequentially modified space. When the parameter vector is inside the limits the function value is unchanged, but when outside a penalty, proportional to the distance outside the limit, is added to the function value. This penalty, which is based on the Lagrange multipliers, is smooth, hence the gradient and Hessian are continuous at and beyond the constraints. With each iteration of the Method of Multipliers the penalty is increased until it becomes impossible for the parameter vector to violate the limits. This approach allows the parameter vector θ to pass outside the limits, yet the successive iterations ensure that the final result will not violate the constraints.

For inequality constraints, each iteration of the Method of Multipliers attempts to solve the unconstrained sub-problem

$\displaystyle \min_{\theta} \mathfrak{L}_A(\theta, \lambda^k; \mu_k) \stackrel{\mathrm{def}}{=} f(\theta) + \sum_{i \in \mathfrak{I}} \Psi(c_i(\theta), \lambda_i^k; \mu_k),$ (14.11)

where the function Ψ is defined as

$\displaystyle \Psi(c_i(\theta), \lambda^k; \mu_k) = \begin{cases} -\lambda^k c_i(\theta) + \frac{1}{2\mu_k} c_i^2(\theta) & \textrm{if } c_i(\theta) \leqslant \mu_k \lambda^k, \\ -\frac{\mu_k}{2} (\lambda^k)^2 & \textrm{otherwise}. \end{cases}$ (14.12)
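The piecewise definition above can be written directly in code. The key property is that Ψ and its first derivative agree at the switch point c<sub>i</sub>(θ) = μ<sub>k</sub>λ<sup>k</sup>, which is why the penalty is smooth. The following is an illustrative sketch, not code from relax itself:

```python
def psi(c, lam, mu):
    """Penalty of equation (14.12) for a single inequality constraint."""
    if c <= mu * lam:
        return -lam * c + c**2 / (2.0 * mu)
    return -(mu / 2.0) * lam**2

def dpsi_dc(c, lam, mu):
    """Derivative of psi with respect to the constraint value c."""
    if c <= mu * lam:
        return -lam + c / mu
    return 0.0

# At the switch point c = mu*lam the quadratic branch evaluates to
# -lam*(mu*lam) + (mu*lam)**2/(2*mu) = -(mu/2)*lam**2, matching the
# constant branch, and its derivative -lam + (mu*lam)/mu = 0 matches
# the zero derivative of the constant branch, so psi is C^1.
lam, mu = 1.5, 0.1
c0 = mu * lam
```

A constraint that is satisfied with a zero multiplier contributes nothing, while a violated constraint (c < 0) always adds a positive penalty.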

In (14.11), θ is the parameter vector; $\mathfrak{L}_A$ is the Augmented Lagrangian function; k is the current iteration of the Method of Multipliers; $\lambda^k$ are the Lagrange multipliers, which are non-negative factors such that, at the minimum $\hat{\theta}$, $\nabla f(\hat{\theta}) = \sum_{i \in \mathfrak{I}} \lambda_i \nabla c_i(\hat{\theta})$; $\mu_k > 0$ is the penalty parameter, which decreases to zero as k→∞; $\mathfrak{I}$ is the set of inequality constraints; and $c_i(\theta)$ is an individual constraint value. The Lagrange multipliers are updated using the formula

$\displaystyle \lambda_i^{k+1} = \max\left(\lambda_i^k - \frac{c_i(\theta)}{\mu_k}, 0\right), \qquad \textrm{for all } i \in \mathfrak{I}.$ (14.13)
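Putting (14.11)–(14.13) together, the outer loop of the Method of Multipliers can be sketched as follows. This is a minimal illustration assuming scipy is available for the inner unconstrained minimisation; it is not relax's own implementation, and the toy objective at the end is purely hypothetical:

```python
import numpy as np
from scipy.optimize import minimize  # inner unconstrained solver (assumption)

def psi(c, lam, mu):
    # Smooth penalty of equation (14.12).
    if c <= mu * lam:
        return -lam * c + c**2 / (2.0 * mu)
    return -(mu / 2.0) * lam**2

def method_of_multipliers(f, c, theta0, mu0=1.0, tau=0.5, max_iter=12):
    """Sketch of the outer loop.  f(theta) is the objective and c(theta)
    returns the vector of constraint values, with the convention c_i >= 0."""
    theta = np.asarray(theta0, dtype=float)
    lam = np.zeros(len(c(theta)))  # Lagrange multipliers lambda^k
    mu = mu0                       # penalty parameter mu_k, driven towards 0

    for k in range(max_iter):
        # Solve the sub-problem (14.11) for fixed lambda^k and mu_k.
        def aug_lagrangian(th):
            return f(th) + sum(psi(ci, li, mu)
                               for ci, li in zip(c(th), lam))
        theta = minimize(aug_lagrangian, theta, method="BFGS").x

        # Multiplier update (14.13), then tighten the penalty.
        lam = np.maximum(lam - c(theta) / mu, 0.0)
        mu *= tau
    return theta, lam

# Toy problem: minimise (theta - 2)^2 subject to theta <= 1, written as
# c(theta) = 1 - theta >= 0.  The constrained minimum is theta = 1 with
# Lagrange multiplier 2.
theta, lam = method_of_multipliers(lambda th: (th[0] - 2.0)**2,
                                   lambda th: np.array([1.0 - th[0]]),
                                   [0.0])
```

Early iterations are allowed to end outside the limits; as μ<sub>k</sub> shrinks and λ<sup>k</sup> converges, the sub-problem minima are pushed onto the constraint boundary.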

The gradient of the Augmented Lagrangian is

$\displaystyle \nabla \mathfrak{L}_A(\theta, \lambda^k; \mu_k) = \nabla f(\theta) - \sum_{i \in \mathfrak{I},\; c_i(\theta) \leqslant \mu_k \lambda_i^k} \left( \lambda_i^k - \frac{c_i(\theta)}{\mu_k} \right) \nabla c_i(\theta),$ (14.14)

and the Hessian is

$\displaystyle \nabla^2 \mathfrak{L}_A(\theta, \lambda^k; \mu_k) = \nabla^2 f(\theta) + \sum_{i \in \mathfrak{I},\; c_i(\theta) \leqslant \mu_k \lambda_i^k} \left[ \frac{1}{\mu_k} \nabla c_i(\theta) \nabla c_i(\theta)^T - \left( \lambda_i^k - \frac{c_i(\theta)}{\mu_k} \right) \nabla^2 c_i(\theta) \right].$ (14.15)
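The gradient (14.14) and Hessian (14.15) are assembled from the derivatives of f and c, summing only over the constraints with c<sub>i</sub>(θ) ⩽ μ<sub>k</sub>λ<sub>i</sub><sup>k</sup>. The sketch below is an illustration rather than relax code; it checks the analytic gradient against a finite-difference approximation of (14.11) for a hypothetical quadratic objective with linear constraints:

```python
import numpy as np

def aug_lagrangian(theta, lam, mu, f, c):
    # Equation (14.11) with the penalty (14.12).
    val = f(theta)
    for ci, li in zip(c(theta), lam):
        if ci <= mu * li:
            val += -li * ci + ci**2 / (2.0 * mu)
        else:
            val += -(mu / 2.0) * li**2
    return val

def aug_lagrangian_grad(theta, lam, mu, grad_f, c, grad_c):
    # Equation (14.14): only constraints with c_i <= mu*lam_i contribute.
    g = grad_f(theta).astype(float)
    cv, dcv = c(theta), grad_c(theta)
    for i in range(len(cv)):
        if cv[i] <= mu * lam[i]:
            g -= (lam[i] - cv[i] / mu) * dcv[i]
    return g

def aug_lagrangian_hess(theta, lam, mu, hess_f, c, grad_c, hess_c):
    # Equation (14.15): for linear constraints the hess_c term vanishes.
    H = hess_f(theta).astype(float)
    cv, dcv, d2cv = c(theta), grad_c(theta), hess_c(theta)
    for i in range(len(cv)):
        if cv[i] <= mu * lam[i]:
            H += np.outer(dcv[i], dcv[i]) / mu - (lam[i] - cv[i] / mu) * d2cv[i]
    return H

# Hypothetical toy problem: f = theta_0^2 + 2*theta_1^2 with the linear
# constraints theta_0 - 1 >= 0 and theta_1 >= 0.
f = lambda th: th[0]**2 + 2.0 * th[1]**2
grad_f = lambda th: np.array([2.0 * th[0], 4.0 * th[1]])
hess_f = lambda th: np.diag([2.0, 4.0])
c = lambda th: np.array([th[0] - 1.0, th[1]])
grad_c = lambda th: np.eye(2)
hess_c = lambda th: np.zeros((2, 2, 2))

theta, lam, mu = np.array([0.4, -0.2]), np.array([0.5, 0.3]), 0.2
g = aug_lagrangian_grad(theta, lam, mu, grad_f, c, grad_c)
H = aug_lagrangian_hess(theta, lam, mu, hess_f, c, grad_c, hess_c)

# Central finite-difference check of (14.14) against (14.11).
h = 1e-6
g_fd = np.array([(aug_lagrangian(theta + h * e, lam, mu, f, c)
                  - aug_lagrangian(theta - h * e, lam, mu, f, c)) / (2 * h)
                 for e in np.eye(2)])
```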

The Augmented Lagrangian algorithm can accept any set of three arbitrary constraint functions c(θ), ∇c(θ), and ∇²c(θ). When given the current parameter values, c(θ) returns a vector of constraint values whereby each element corresponds to one of the constraints. The constraint is defined as ci $\geqslant$ 0. The function ∇c(θ) returns the matrix of constraint gradients and ∇²c(θ) is the constraint Hessian function which should return the 3D matrix of constraint Hessians.
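To make this interface concrete, here is a hypothetical nonlinear constraint c(θ) = 1 − θ₀² − θ₁² ⩾ 0 (a unit-disc constraint, not one used by relax) written as the three functions: the vector of values, the matrix of gradients, and the 3D array of Hessians:

```python
import numpy as np

def c(theta):
    # Vector of constraint values; one element per constraint (here just one).
    return np.array([1.0 - theta[0]**2 - theta[1]**2])

def dc(theta):
    # Matrix of constraint gradients: row i is the gradient of c_i.
    return np.array([[-2.0 * theta[0], -2.0 * theta[1]]])

def d2c(theta):
    # 3D array of constraint Hessians: d2c[i] is the Hessian matrix of c_i.
    return np.array([[[-2.0, 0.0], [0.0, -2.0]]])
```

With m constraints and n parameters, c(θ) has shape (m,), ∇c(θ) has shape (m, n), and ∇²c(θ) has shape (m, n, n).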

A more specific type of constraint accepted by the Method of Multipliers is the bound constraint. These constraints are defined by the relation

$\displaystyle l \leqslant \theta \leqslant u,$ (14.16)

where l and u are the vectors of lower and upper bounds respectively and θ is the parameter vector. For example, for model-free model m4, to place lower and upper bounds on the order parameter and lower bounds on the correlation time and chemical exchange parameters, the vectors are

$\displaystyle \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \leqslant \begin{pmatrix} S^2 \\ \tau_e \\ R_{ex} \end{pmatrix} \leqslant \begin{pmatrix} 1 \\ \infty \\ \infty \end{pmatrix}.$ (14.17)

The default setting in the program relax is to use linear constraints which are defined as

$\displaystyle A\theta \geqslant b,$ (14.18)

where A is an m×n matrix where the rows are the transposed vectors ai of length n; the elements of ai are the coefficients of the model parameters; θ is the vector of model parameters of dimension n; b is the vector of scalars of dimension m; m is the number of constraints; and n is the number of model parameters.

Rearranging (14.18), the linear constraint function c(θ) returns the vector Aθ - b. Because of the linearity of the constraints, the gradient and Hessian are greatly simplified: the gradient ∇c(θ) is simply the matrix A and the Hessian ∇²c(θ) is zero.
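For the m4 bounds of (14.17), dropping the rows whose bound is ±∞, the linear formulation Aθ ⩾ b and the corresponding constraint functions can be built as below. This is an illustrative sketch, not the relax source:

```python
import numpy as np

# theta = (S2, tau_e, Rex).  Each row of A encodes one inequality of (14.17).
A = np.array([[ 1.0, 0.0, 0.0],   # S2 >= 0
              [-1.0, 0.0, 0.0],   # -S2 >= -1, i.e. S2 <= 1
              [ 0.0, 1.0, 0.0],   # tau_e >= 0
              [ 0.0, 0.0, 1.0]])  # Rex >= 0
b = np.array([0.0, -1.0, 0.0, 0.0])

def c(theta):
    # Linear constraint values A.theta - b; all elements must be >= 0.
    return A @ theta - b

def dc(theta):
    # The gradient of a linear constraint is constant: the matrix A.
    return A

def d2c(theta):
    # The Hessian of each linear constraint is zero.
    return np.zeros((A.shape[0], A.shape[1], A.shape[1]))
```

For instance, θ = (0.8, 1e-9, 2.0) satisfies every bound (all elements of c(θ) are non-negative), whereas S² = 1.2 violates the second row.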

The relax user manual (PDF), created 2020-08-26.