Re: r25229 - /trunk/test_suite/shared_data/curve_fitting/profiling/verify_error.py



Posted by Edward d'Auvergne on August 25, 2014 - 10:26:
On 25 August 2014 01:08,  <tlinnet@xxxxxxxxxxxxx> wrote:
> Author: tlinnet
> Date: Mon Aug 25 01:08:46 2014
> New Revision: 25229
>
> URL: http://svn.gna.org/viewcvs/relax?rev=25229&view=rev
> Log:
> Added a verification script which shows that scipy.optimize.leastsq
> reaches exactly the same parameters as minfx for exponential curve
> fitting.
>
> The profiling shows that scipy.optimize.leastsq is 10X as fast as
> using minfx (with no linear constraints).

You are making the wrong assumption here as you are comparing two
different algorithms!  If you coded gradients and Hessians into the
relax C module, then the minfx Newton optimisation will very likely
eat the scipy Levenberg-Marquardt algorithm for breakfast!


> scipy.optimize.leastsq is a wrapper around MINPACK's lmdif and lmder
> algorithms.

Maybe the Python code in minfx could be replaced by a MINPACK wrapper.
But the key element here is the numeric gradient estimate: MINPACK's
lmdif approximates the Jacobian by forward differences.  You need to
be careful when comparing.  The relax solution has no gradients,
calculated or numerically approximated, hence it uses the slower
simplex optimisation.  The scipy solution here has a numeric gradient,
which is much slower than a calculated gradient, and because it has a
gradient it can use the faster Levenberg-Marquardt (LM) algorithm.
But if you add real gradients and Hessians to the relax C modules,
then you could use the LM algorithm in minfx or, even better and
faster, the Newton algorithm.
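
To make this concrete, here is a minimal sketch (synthetic data and
illustrative values, not the relax code) of a two-parameter
exponential fit with scipy.optimize.leastsq, run once with the
finite-difference Jacobian and once with an analytical Jacobian
supplied via Dfun:

import numpy as np
from scipy.optimize import leastsq

# Synthetic exponential decay data: I(t) = I0 * exp(-R * t).
t = np.linspace(0.0, 1.0, 10)
y = 1000.0 * np.exp(-8.0 * t)

def residuals(p, t, y):
    """Residuals of the two-parameter exponential model."""
    R, I0 = p
    return I0 * np.exp(-R * t) - y

def jacobian(p, t, y):
    """Analytical Jacobian, shape (len(t), 2), as expected for col_deriv=0."""
    R, I0 = p
    e = np.exp(-R * t)
    return np.array([-t * I0 * e, e]).T

p0 = [2.0, 500.0]

# Numerical gradient: lmdif estimates the Jacobian by forward differences.
p_num, ier = leastsq(residuals, p0, args=(t, y))

# Analytical gradient: lmder uses the supplied Jacobian, so each
# iteration avoids the extra residual calls of the finite differencing.
p_ana, ier = leastsq(residuals, p0, args=(t, y), Dfun=jacobian)

print(p_num, p_ana)  # both converge to (R, I0) = (8.0, 1000.0)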


> MINPACK is a FORTRAN90 library which solves systems of nonlinear
> equations, or carries out the least squares minimization of the
> residual of a set of linear or nonlinear equations.

These will be faster than the Python code in minfx.  But the gradient,
Hessian and algorithm used are more important speed factors than
Python vs. FORTRAN.
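
As a rough check of the relative costs (a sketch only; absolute
timings depend on the machine), the two variants of the fit above can
be timed directly:

import timeit
import numpy as np
from scipy.optimize import leastsq

t = np.linspace(0.0, 1.0, 10)
y = 1000.0 * np.exp(-8.0 * t)

def residuals(p, t, y):
    R, I0 = p
    return I0 * np.exp(-R * t) - y

def jacobian(p, t, y):
    R, I0 = p
    e = np.exp(-R * t)
    return np.array([-t * I0 * e, e]).T

p0 = [2.0, 500.0]
n = 1000

# Per-fit cost with the forward-difference versus analytical Jacobian.
t_num = timeit.timeit(lambda: leastsq(residuals, p0, args=(t, y)), number=n)
t_ana = timeit.timeit(lambda: leastsq(residuals, p0, args=(t, y),
                                      Dfun=jacobian), number=n)
print(t_num / n, t_ana / n)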


> The verification script also shows that a very heavy and time
> consuming Monte Carlo simulation of 2000 steps reaches the same
> errors as the errors reported by scipy.optimize.leastsq.
>
> The return from scipy.optimize.leastsq gives the estimated
> covariance.  Taking the square root of the covariance corresponds to
> 2X the error reported by minfx.
>
> This could be an extremely time saving step when performing model
> fitting in R1rho, where the errors of the R2eff values are estimated
> by Monte Carlo simulations.

See http://thread.gmane.org/gmane.science.nmr.relax.devel/6807/focus=6809
for a detailed discussion of this technique and of a possible
implementation.
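
For reference, here is a minimal sketch of the covariance-based error
estimate under discussion (the data and variable names are
illustrative, not the relax API).  With full_output=True, leastsq
returns the fractional covariance matrix cov_x, which is scaled by
the residual variance before taking the square root of the diagonal:

import numpy as np
from scipy.optimize import leastsq

def residuals(p, t, y, sd):
    """Residuals of the exponential model, weighted by the errors sd."""
    R, I0 = p
    return (I0 * np.exp(-R * t) - y) / sd

# Illustrative noisy data with a known measurement error sd.
rng = np.random.RandomState(0)
t = np.linspace(0.0, 1.0, 14)
sd = 10.0
y = 1000.0 * np.exp(-8.0 * t) + rng.normal(scale=sd, size=t.size)

p, cov_x, infodict, mesg, ier = leastsq(
    residuals, [2.0, 500.0], args=(t, y, sd), full_output=True)

# Scale the fractional covariance by the residual variance (the
# reduced chi-squared) to obtain the parameter covariance matrix.
dof = t.size - len(p)
s_sq = (infodict['fvec']**2).sum() / dof
errors = np.sqrt(np.diag(cov_x * s_sq))
print(p, errors)  # parameter estimates and their standard errors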


> The following setup illustrates the problem.  This was analysed on a
> MacBook Pro, 13-inch, Late 2011, with no multi-core setup.
>
> Script running is:
> test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/2_pre_run_r2eff.py
>
> This script analyses just the R2eff values for 15 residues.  It
> estimates the errors of R2eff based on 2000 Monte Carlo simulations.
> For each residue, there are 14 exponential curves.
>
> The script was interrupted after 35 simulations, which took 20
> minutes.  So 500 simulations would take about 4.8 hours.  The R2eff
> values and errors can instead be calculated by scipy.optimize.leastsq
> in: 15 residues * 0.02 seconds = 0.3 seconds.

That is rather a large time saving for this problem!
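
For comparison, a minimal sketch (not the relax implementation) of
the Monte Carlo error estimate that the script performs: many
synthetic data sets are generated by randomising about the
back-calculated curve, each is refitted, and the parameter errors are
taken as the standard deviations over the simulations:

import numpy as np
from scipy.optimize import leastsq

def residuals(p, t, y):
    R, I0 = p
    return I0 * np.exp(-R * t) - y

t = np.linspace(0.0, 1.0, 14)
sd = 10.0
y_fit = 1000.0 * np.exp(-8.0 * t)  # back-calculated curve from a first fit

# Refit 2000 randomised copies of the data - this is the slow step.
rng = np.random.RandomState(0)
sims = []
for i in range(2000):
    y_sim = y_fit + rng.normal(scale=sd, size=t.size)
    p_sim, ier = leastsq(residuals, [8.0, 1000.0], args=(t, y_sim))
    sims.append(p_sim)

print(np.std(sims, axis=0))  # Monte Carlo errors for (R, I0)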

Regards,

Edward


