Re: [task #7822] Implement user function to estimate R2eff and associated errors for exponential curve fitting.



Posted by Edward d'Auvergne on August 25, 2014 - 10:06:
Hi Troels,

On the subject of optimisation, if you use the scipy.optimize.fmin()
function in the profiling, you will be able to compare the speed of
the Nelder-Mead simplex algorithm in minfx vs. scipy.  These should,
however, be similar.
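
For example, here is a minimal sketch of such a timing run on the
scipy side.  The data values are synthetic and illustrative only - the
real target function in relax is the chi-squared function in the C
module:

    # Time scipy's Nelder-Mead simplex on a synthetic two-parameter
    # exponential fit.  All data values here are illustrative.
    from timeit import timeit

    import numpy as np
    from scipy.optimize import fmin

    times = np.array([0.0, 0.02, 0.04, 0.06, 0.08, 0.10])
    intensities = 1000.0 * np.exp(-10.0 * times)   # Noise-free synthetic data.
    errors = np.full_like(times, 10.0)

    def chi2(params):
        """Chi-squared for the exponential I(t) = I0*exp(-R*t)."""
        rate, i0 = params
        back_calc = i0 * np.exp(-rate * times)
        return np.sum(((intensities - back_calc) / errors)**2)

    x0 = np.array([5.0, 500.0])   # Deliberately poor starting point.
    t = timeit(lambda: fmin(chi2, x0, disp=0), number=100)
    print("100 simplex fits: %.3f s" % t)

The same chi2() function can then be fed to the minfx simplex
algorithm for a direct comparison.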

Regards,

Edward



On 25 August 2014 09:32, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Troels,

We should probably discuss how you would like to implement this in
relax, prior to writing too much code.  That way we can develop the
code in the best direction from the start and save a -lot- of time.
The way I see it is that there are two separate parts to this task:

1)  Optimisation algorithm.

2)  Error estimation algorithm.

In this thread, I'll cover 1) and then write a second message about 2).

For the scipy.optimize.leastsq() function, the Levenberg-Marquardt
optimisation algorithm is being used.  See
https://en.wikipedia.org/wiki/Levenberg%E2%80%93Marquardt_algorithm,
though this Wikipedia description is terrible at the moment and a
better reference would be the book Numerical Recipes in
C/Fortran/Pascal/etc.  This algorithm is also available in minfx
http://home.gna.org/minfx/minfx.levenberg_marquardt-module.html.  For
the algorithm, gradients are required.  These are converted into a
special matrix known as the Jacobian.  You can see this as the Dfun
argument to the scipy.optimize.leastsq() function.  If you do not
supply Dfun, then the algorithm uses a much, much, much slower
technique - the numeric approximation of the gradient.  From the
docs:

    Dfun : callable
        A function or method to compute the Jacobian of func with
        derivatives across the rows. If this is None, the Jacobian
        will be estimated.
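
Here is a minimal sketch of supplying an analytic Jacobian via Dfun
for the two-parameter exponential I(t) = I0*exp(-R*t).  The data
values are illustrative only:

    import numpy as np
    from scipy.optimize import leastsq

    times = np.array([0.0, 0.02, 0.04, 0.06, 0.08, 0.10])
    intensities = 1000.0 * np.exp(-10.0 * times)
    errors = np.full_like(times, 10.0)

    def residuals(params):
        """Weighted residuals (y - f)/sigma, squared and summed by leastsq."""
        rate, i0 = params
        return (intensities - i0 * np.exp(-rate * times)) / errors

    def jacobian(params):
        """Analytic Jacobian of the residuals, shape (n_points, n_params).

        d(resid)/dR  =  t * I0 * exp(-R*t) / sigma
        d(resid)/dI0 = -exp(-R*t) / sigma
        """
        rate, i0 = params
        decay = np.exp(-rate * times)
        return np.column_stack((times * i0 * decay / errors,
                                -decay / errors))

    x0 = np.array([5.0, 500.0])
    popt, flag = leastsq(residuals, x0, Dfun=jacobian)

Dropping the Dfun argument gives the same popt, but each internal
Jacobian evaluation then costs one extra function call per parameter
for the forward-difference approximation.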

So, if you derive the gradient equations and add these to the relax C
module, you will see a much greater speed up, as estimating the
gradients takes the majority of the computation time for this
algorithm as you are currently using it.  These equations are
incredibly basic - just take the partial derivative of the exponential
equation with respect to each parameter.  The floating point value for
each parameter derivative is then packed into an array with the same
dimensions as the parameter vector.  This is the gradient.  As an
aside, the Hessian is similar but holds the second partial derivatives
in matrix form, and using this together with the even faster Newton
algorithm is really the absolute ultimate solution (this powerful
algorithm should probably only need 2-3 steps of optimisation to solve
this simple problem).
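
To make that concrete, here is a sketch of the chi-squared gradient
array and Hessian matrix for the two-parameter exponential (the names
are illustrative, not the relax C module):

    import numpy as np

    def dchi2(params, times, intensities, errors):
        """Gradient: dchi2/dp_j = -2 * sum((y - f)/sigma^2 * df/dp_j)."""
        rate, i0 = params
        decay = np.exp(-rate * times)
        resid_w = (intensities - i0 * decay) / errors**2
        df_drate = -times * i0 * decay
        df_di0 = decay
        return -2.0 * np.array([np.sum(resid_w * df_drate),
                                np.sum(resid_w * df_di0)])

    def d2chi2(params, times, intensities, errors):
        """Hessian: 2 * sum((df/dp_j * df/dp_k - (y-f) * d2f/dp_j dp_k) / sigma^2)."""
        rate, i0 = params
        decay = np.exp(-rate * times)
        resid = intensities - i0 * decay
        w = 1.0 / errors**2
        df_drate = -times * i0 * decay
        df_di0 = decay
        # Second partial derivatives of the model I(t) = I0*exp(-R*t).
        d2f_rr = times**2 * i0 * decay
        d2f_ri = -times * decay
        h_rr = 2.0 * np.sum(w * (df_drate**2 - resid * d2f_rr))
        h_ri = 2.0 * np.sum(w * (df_drate * df_di0 - resid * d2f_ri))
        h_ii = 2.0 * np.sum(w * df_di0**2)   # d2f/dI0^2 is zero.
        return np.array([[h_rr, h_ri], [h_ri, h_ii]])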

The key part that is missing from minfx is the numerical gradient
approximation.  However, deriving the gradient is usually such a basic
exercise - in this case it is ridiculously simple - and gives so much
more speed that I've never bothered with numerical approximations in
minfx.  I also did not derive or implement the gradient for the
exponential curve-fitting as the code was fast enough for my purposes.
To use gradient functions with minfx, there are the dchi2_func and
dfunc arguments, which then construct the Jacobian form for you that
the Dfun scipy argument expects.
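
The relationship is simple - the Jacobian that Dfun returns is just
the per-parameter gradient functions stacked as columns.  As an
illustration (these names are hypothetical, not the minfx internals):

    import numpy as np

    def jacobian_from_grads(grad_funcs, params, times):
        """Stack per-parameter derivative functions into the
        (n_points, n_params) Jacobian that scipy's Dfun expects."""
        return np.column_stack([g(params, times) for g in grad_funcs])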

Anyway, that was just the background to all of this.  How do you see
this being implemented in relax?  All of the optimisation is via minfx
(https://gna.org/projects/minfx/), so maybe you should sign up to that
Gna! project as well.  Remember that minfx was just a normal relax
package that I separated into its own project, just like bmrblib
(https://gna.org/projects/bmrblib/).

Regards,

Edward


On 24 August 2014 17:56, Troels E. Linnet
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
URL:
  <http://gna.org/task/?7822>

                 Summary: Implement user function to estimate R2eff and
associated errors for exponential curve fitting.
                 Project: relax
            Submitted by: tlinnet
            Submitted on: Sun 24 Aug 2014 03:56:36 PM UTC
         Should Start On: Sun 24 Aug 2014 12:00:00 AM UTC
   Should be Finished on: Sun 24 Aug 2014 12:00:00 AM UTC
                Category: relax's source code
                Priority: 5 - Normal
                  Status: In Progress
        Percent Complete: 0%
             Assigned to: tlinnet
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________

Details:

A verification script showed that using scipy.optimize.leastsq reaches
exactly the same parameters as minfx for exponential curve fitting.

The scripts are at:
test_suite/shared_data/curve_fitting/profiling/profiling_relax_fit.py
test_suite/shared_data/curve_fitting/profiling/verify_error.py

The profiling script shows that a 10x increase in speed can be reached
by removing the linear constraints when using minfx.

The profiling also shows that scipy.optimize.leastsq is 10x as fast as
minfx, even without linear constraints.

scipy.optimize.leastsq is a wrapper around MINPACK's lmdif and lmder
algorithms.

MINPACK is a FORTRAN90 library which solves systems of nonlinear
equations, or carries out the least squares minimization of the
residual of a set of linear or nonlinear equations.

The verification script also shows that a very heavy and time
consuming Monte Carlo simulation of 2000 steps reaches the same errors
as the errors reported by scipy.optimize.leastsq.

The return from scipy.optimize.leastsq gives the estimated co-variance.
Taking the square root of the co-variance corresponds to 2x the error
reported by minfx after 2000 Monte Carlo simulations.
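
As a sketch of how the errors would be extracted (synthetic data, and
assuming the residuals are weighted by the measurement errors as in
relax):

    import numpy as np
    from scipy.optimize import leastsq

    times = np.array([0.0, 0.02, 0.04, 0.06, 0.08, 0.10])
    intensities = 1000.0 * np.exp(-10.0 * times) \
                  + np.random.normal(scale=10.0, size=times.size)
    errors = np.full_like(times, 10.0)

    def residuals(params):
        """Weighted residuals (y - f)/sigma."""
        rate, i0 = params
        return (intensities - i0 * np.exp(-rate * times)) / errors

    popt, cov_x, infodict, mesg, ier = leastsq(
        residuals, [5.0, 500.0], full_output=True)
    param_errors = np.sqrt(np.diag(cov_x))   # [sigma_R, sigma_I0]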

This could be an extremely time saving step when performing model
fitting in R1rho, where the errors of the R2eff values are estimated
by Monte Carlo simulations.

The following setup illustrates the problem.
This was analysed on a MacBook Pro, 13-inch, Late 2011, with no
multi-core setup.

The script running is:
test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/2_pre_run_r2eff.py

This script analyses just the R2eff values for 15 residues.
It estimates the errors of R2eff based on 2000 Monte Carlo simulations.
For each residue, there are 14 exponential curves.

The script was stopped after 35 simulations, which took 20 minutes.
So 500 simulations would take about (20 min / 35) * 500 = 286 minutes,
or 4.8 hours.

The R2eff values and errors can instead be calculated by
scipy.optimize.leastsq in: 15 residues * 0.02 seconds = 0.3 seconds.




    _______________________________________________________

Reply to this item at:

  <http://gna.org/task/?7822>
