On 25 August 2014 01:08, <tlinnet@xxxxxxxxxxxxx> wrote:
Author: tlinnet Date: Mon Aug 25 01:08:46 2014 New Revision: 25229 URL: http://svn.gna.org/viewcvs/relax?rev=25229&view=rev Log: Added verification script, that shows that using scipy.optimize.leastsq reaches the exact same parameters as minfx for exponential curve fitting. The profiling shows that scipy.optimize.leastsq is 10X as fast as using minfx (with no linear constraints.)
You are making the wrong assumption here as you are comparing two different algorithms! If you coded gradients and Hessians into the relax C module, then the minfx Newton optimisation will very likely eat the scipy Levenberg-Marquardt algorithm for breakfast!
scipy.optimize.leastsq is a wrapper around MINPACK's lmdif and lmder algorithms.
Maybe the Python code in minfx could be replaced by a MINPACK wrapper. But the key point here is the gradient: lmdif approximates the Jacobian numerically via forward differences. You need to be careful when comparing. The relax solution has no gradients, calculated or numerically approximated, hence it uses the slower simplex optimisation. The scipy solution has a numeric gradient, which is much slower than a calculated gradient, but because it has a gradient at all it can use the faster Levenberg-Marquardt (LM) algorithm. If you add real gradients and Hessians to the relax C modules, then you could use the LM algorithm in minfx or, even better and faster, the Newton algorithm.
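To make the numeric-versus-analytic gradient point concrete, here is a minimal sketch (not the relax code; all variable names are invented) of fitting a two-parameter exponential with scipy.optimize.leastsq, once letting lmdif estimate the Jacobian by forward differences and once supplying the analytic Jacobian via the Dfun argument:

```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic two-parameter exponential decay, loosely mimicking an R2eff fit.
times = np.linspace(0.0, 1.0, 10)
r2eff_true, i0_true = 12.0, 1000.0
data = i0_true * np.exp(-r2eff_true * times)

def residuals(params, t, y):
    """Residuals y - model for the exponential decay model."""
    r2eff, i0 = params
    return y - i0 * np.exp(-r2eff * t)

def jacobian(params, t, y):
    """Analytic Jacobian of the residuals, shape (len(t), 2)."""
    r2eff, i0 = params
    e = np.exp(-r2eff * t)
    # Columns: d(residual)/d(r2eff) = i0*t*e, d(residual)/d(i0) = -e.
    return np.array([i0 * t * e, -e]).T

x0 = [5.0, 500.0]
# Numeric gradient: lmdif estimates the Jacobian by forward differences.
fit_numeric, ier1 = leastsq(residuals, x0, args=(times, data))
# Analytic gradient: lmder uses the Jacobian we supply, saving function calls.
fit_analytic, ier2 = leastsq(residuals, x0, args=(times, data), Dfun=jacobian)
```

Both calls converge to the same minimum; the analytic version simply gets there with fewer residual evaluations, which is the same reason calculated gradients in the relax C module would help minfx.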
MINPACK is a FORTRAN90 library which solves systems of nonlinear equations, or carries out the least squares minimization of the residual of a set of linear or nonlinear equations.
These will be faster than the Python code in minfx. But the gradient, Hessian and algorithm used are more important speed factors than Python vs. FORTRAN.
The verification script also shows that a very heavy and time-consuming Monte Carlo simulation of 2000 steps reaches the same errors as the errors reported by scipy.optimize.leastsq. scipy.optimize.leastsq returns the estimated covariance matrix. Taking the square root of its diagonal corresponds to 2X the error reported by minfx. This could be an extremely time-saving step when performing model fitting in R1rho, where the errors of the R2eff values are estimated by Monte Carlo simulations.
See http://thread.gmane.org/gmane.science.nmr.relax.devel/6807/focus=6809 for a detailed discussion of this technique and discussion of a possible implementation.
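A minimal sketch of the covariance-based error estimate being discussed (illustrative names and synthetic data only, not the relax implementation): leastsq with full_output returns cov_x, which must be scaled by the reduced chi-squared before taking the square root of the diagonal to obtain parameter errors.

```python
import numpy as np
from scipy.optimize import leastsq

# Hypothetical noisy exponential data (seeded for reproducibility).
rng = np.random.RandomState(0)
times = np.linspace(0.0, 1.0, 14)
sigma = 5.0
data = 1000.0 * np.exp(-12.0 * times) + rng.normal(scale=sigma, size=times.size)

def residuals(params, t, y):
    r2eff, i0 = params
    return y - i0 * np.exp(-r2eff * t)

popt, cov_x, infodict, mesg, ier = leastsq(
    residuals, [5.0, 500.0], args=(times, data), full_output=True)

# cov_x is the fractional covariance matrix; scale it by the reduced
# chi-squared (sum of squared residuals over degrees of freedom) to get
# the parameter covariance, then take the square root of the diagonal.
dof = times.size - len(popt)
s_sq = (residuals(popt, times, data) ** 2).sum() / dof
errors = np.sqrt(np.diag(cov_x * s_sq))
```

The errors array then plays the role of the Monte Carlo standard deviations, at a tiny fraction of the cost.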
The following setup illustrates the problem. This was analysed on a MacBook Pro, 13-inch, Late 2011, with no multi-core setup. The script run is: test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/2_pre_run_r2eff.py This script analyses just the R2eff values for 15 residues. It estimates the errors of R2eff based on 2000 Monte Carlo simulations. For each residue there are 14 exponential curves. The script was stopped after 35 simulations, which took 20 minutes, so 500 simulations would take about 4.8 hours. The R2eff values and errors can instead be calculated by scipy.optimize.leastsq in: 15 residues * 0.02 seconds = 0.3 seconds.
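For comparison, the expensive Monte Carlo procedure being replaced can be sketched as follows (a minimal illustration with invented names, using 500 simulations rather than the 2000 above for brevity): fit once, back-calculate the curve, then repeatedly refit randomised copies and take the standard deviation of the fitted parameters.

```python
import numpy as np
from scipy.optimize import leastsq

# Synthetic noisy exponential data (seeded for reproducibility).
rng = np.random.RandomState(1)
times = np.linspace(0.0, 1.0, 14)
sigma = 5.0
data = 1000.0 * np.exp(-12.0 * times) + rng.normal(scale=sigma, size=times.size)

def residuals(params, t, y):
    r2eff, i0 = params
    return y - i0 * np.exp(-r2eff * t)

# Initial fit, then back-calculate the best-fit curve.
popt, ier = leastsq(residuals, [5.0, 500.0], args=(times, data))
back_calc = data - residuals(popt, times, data)

# Monte Carlo: refit randomised data sets, take the spread of the fits.
sims = []
for _ in range(500):
    noisy = back_calc + rng.normal(scale=sigma, size=times.size)
    p, _ = leastsq(residuals, popt, args=(times, noisy))
    sims.append(p)
mc_errors = np.std(np.array(sims), axis=0)
```

Each of the 500 refits costs a full optimisation, which is exactly why the covariance-matrix shortcut is so attractive here.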
That is rather a large time saving for this problem! Regards, Edward