Re: [task #7822] Implement user function to estimate R2eff and associated errors for exponential curve fitting.


Posted by Edward d'Auvergne on August 25, 2014 - 11:20:
Hi Troels,

Unfortunately you have gone ahead and implemented a solution without
first discussing or planning it.  Hence the current solution has a
number of issues:

1)  Target function replication.  The solution should have reused the
C modules.  The original Python code for fitting exponential curves
was converted to C code for speed
(http://gna.org/forum/forum.php?forum_id=1043).  Note that two-point
exponentials that decay to zero are not the only way that data can be
collected, and that is the reason for Sebastien Morin's
inversion-recovery branch (which was never completed).  Anyway, the
code duplication is not acceptable.  If the C module is extended with
new features, such as having the true gradient and Hessian functions,
then the Python module will be out of sync, and vice versa.  If
a bug is found in one module and fixed, it may still be present in the
second.  This is a very non-ideal situation for relax to be in, and is
the exact reason why I did not allow the cst branch to be merged back
to trunk.

2)  Scipy is now a dependency for the dispersion analysis!  Why was
this not discussed?  Coding a function for calculating the covariance
matrix is basic.  Deriving and coding the real gradient function is
also basic.  I do not understand why Scipy is now a dependency.  I
have been actively trying to remove Scipy as a relax dependency and
only had a single call for numerical quadrature integration via the QUADPACK
wrappers left to remove for the frame order analysis.  Now Scipy is
back :(

3)  If the covariance function were coded, then the specific analysis
API could be extended with a new covariance method, and the
relax_disp.r2eff_estimate user function could simply have been called
error_estimate.covariance_matrix, or something like that.  This new
error_estimate.covariance_matrix user function could then replace the
monte_carlo user functions for all analyses, as a rough error
estimator (a minimal sketch of such a calculation is given after point
5 below).

4)  As for the optimisation speed of the new
relax_disp.r2eff_estimate user function, this is not because Scipy is
faster than minfx!!!  It comes down to the choice of algorithms, the
numerical gradient estimate, etc.
(http://thread.gmane.org/gmane.science.nmr.relax.scm/22979/focus=6812).

5)  Back to Scipy.  Scipy optimisation is buggy full stop.  The
developers ignored my feedback back in 2003.  I assumed that the
original developers had just permanently disappeared, and they really
never came back.  The Scipy optimisation code did not change for many,
many years.  While it looks like the optimisation works, in some cases
it fails hard, stopping at a position in the space where there is no
minimum!  I added the dx.map user function to relax to understand
these Scipy rubbish results.  And I created minfx to work around these
nasty hidden failures.  I guess such failures are due to them not
testing the functions as part of a test suite.  Maybe they have fixed
the bugs now, but I really can no longer trust Scipy optimisation.
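
As a rough sketch of the covariance calculation mentioned in point 3,
for the simple two-parameter exponential I(t) = I0 * exp(-R2eff * t)
(the function and variable names below are hypothetical and purely
illustrative, not relax code):

import numpy as np

def covariance_errors(times, errors, r2eff, i0):
    """Rough parameter errors from the covariance matrix at the minimum.

    The Jacobian of the weighted residuals is built from the analytic
    partial derivatives of the model, the covariance matrix is
    approximated as (J^T J)^-1, and the parameter errors are the square
    roots of its diagonal elements.
    """
    # Weighted partial derivatives of the model, dI/dR2eff and dI/dI0.
    exp_term = np.exp(-r2eff * times)
    d_r2eff = -i0 * times * exp_term / errors
    d_i0 = exp_term / errors

    # Jacobian of the weighted residuals, one row per time point.
    jacobian = np.column_stack((d_r2eff, d_i0))

    # Covariance matrix approximation at the chi-squared minimum.
    covar = np.linalg.inv(jacobian.T.dot(jacobian))

    # One-sigma errors for [R2eff, I0].
    return np.sqrt(np.diag(covar))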


There is another solution that avoids all of these problems, and which
would have been much better:

a)  Derive and code the gradients (and maybe the Hessian), and
b)  Add a covariance function to the relax library.

This way you could simply continue using minfx, though with the faster
optimisation algorithms, and have a new user function that is
independent of the analysis type and hence much more useful.  If you
did have the Hessian in the C module, then this, combined with Newton
optimisation and followed by a call to error_estimate.covariance_matrix,
should be even faster.  Please discuss and plan solutions on the
mailing lists before implementing them.  The quickest solution to
implement is not always the best.
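
For what it is worth, the analytic gradient from option a) is simple to
derive for the two-parameter exponential I(t) = I0 * exp(-R2eff * t).
A minimal Python sketch (hypothetical names, purely illustrative; the
real implementation would live in the C module):

import numpy as np

def chi2(params, times, intensities, errors):
    """Chi-squared value for the exponential decay model."""
    r2eff, i0 = params
    back_calc = i0 * np.exp(-r2eff * times)
    return np.sum(((intensities - back_calc) / errors)**2)

def dchi2(params, times, intensities, errors):
    """Analytic chi-squared gradient [dchi2/dR2eff, dchi2/dI0]."""
    r2eff, i0 = params
    exp_term = np.exp(-r2eff * times)
    back_calc = i0 * exp_term
    weights = (intensities - back_calc) / errors**2

    # Chain rule:  dchi2/dp = -2 * sum((I - I_calc)/sigma^2 * dI_calc/dp).
    g_r2eff = -2.0 * np.sum(weights * (-times * back_calc))
    g_i0 = -2.0 * np.sum(weights * exp_term)
    return np.array([g_r2eff, g_i0])

With exact gradients (and the Hessian), minfx's Newton or BFGS
minimisers would no longer need the costly numerical gradient estimates.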

Alternatively, a quick solution for a once-off problem could be coded
into a temporary branch, but that branch is then not merged back into
the main line.

Cheers,

Edward




On 24 August 2014 17:56, Troels E. Linnet
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
URL:
  <http://gna.org/task/?7822>

                 Summary: Implement user function to estimate R2eff and
associated errors for exponential curve fitting.
                 Project: relax
            Submitted by: tlinnet
            Submitted on: Sun 24 Aug 2014 03:56:36 PM UTC
         Should Start On: Sun 24 Aug 2014 12:00:00 AM UTC
   Should be Finished on: Sun 24 Aug 2014 12:00:00 AM UTC
                Category: relax's source code
                Priority: 5 - Normal
                  Status: In Progress
        Percent Complete: 0%
             Assigned to: tlinnet
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________

Details:

A verification script showed that using scipy.optimize.leastsq reaches
exactly the same parameters as minfx for exponential curve fitting.

The verification scripts are:
test_suite/shared_data/curve_fitting/profiling/profiling_relax_fit.py
test_suite/shared_data/curve_fitting/profiling/verify_error.py

The profiling script shows that a 10X increase in speed can be reached by
removing the linear constraints when using minfx.

The profiling also shows that scipy.optimize.leastsq is 10X as fast as using
minfx, even without linear constraints.

scipy.optimize.leastsq is a wrapper around MINPACK's lmdif and lmder
algorithms.

MINPACK is a FORTRAN90 library which solves systems of nonlinear equations,
or carries out the least squares minimization of the residual of a set of
linear or nonlinear equations.

The verification script also shows that a very heavy and time-consuming
Monte Carlo simulation of 2000 steps reaches the same errors as those
reported by scipy.optimize.leastsq.

The return from scipy.optimize.leastsq gives the estimated covariance.
Taking the square root of the covariance corresponds to 2X the error
reported by minfx after 2000 Monte Carlo simulations.
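
To make this concrete, here is a minimal self-contained sketch of the
leastsq approach and the Monte Carlo comparison described above (with
synthetic data and hypothetical names; this is not the actual
verification script):

import numpy as np
from scipy.optimize import leastsq

def residuals(params, times, intensities, errors):
    """Weighted residuals for I(t) = I0 * exp(-R2eff * t)."""
    r2eff, i0 = params
    return (intensities - i0 * np.exp(-r2eff * times)) / errors

# Synthetic data, for illustration only.
times = np.array([0.0, 0.02, 0.04, 0.06, 0.1, 0.15, 0.2])
errors = 1.0e4 * np.ones(len(times))
np.random.seed(0)
intensities = 1.0e6 * np.exp(-20.0 * times) + np.random.normal(0.0, errors)
x0 = [10.0, intensities[0]]

# Fit, requesting the full output to obtain cov_x, i.e. (J^T J)^-1.
params, cov_x, infodict, mesg, ier = leastsq(
    residuals, x0, args=(times, intensities, errors), full_output=True)

# As the residuals are already weighted by the measurement errors, the
# one-sigma parameter errors are the square roots of the diagonal of cov_x.
leastsq_errors = np.sqrt(np.diag(cov_x))

# Monte Carlo comparison:  randomise the data within its errors, re-fit,
# and take the standard deviation of the optimised parameters.
sims = []
for i in range(2000):
    randomised = intensities + np.random.normal(0.0, errors)
    sims.append(leastsq(residuals, params,
                        args=(times, randomised, errors))[0])
mc_errors = np.std(np.array(sims), axis=0)

print("Covariance errors:  %s" % leastsq_errors)
print("Monte Carlo errors: %s" % mc_errors)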

This could be an extremely time-saving step when performing model fitting
in R1rho, where the errors of the R2eff values are estimated by Monte Carlo
simulations.

The following setup illustrates the problem.
This was analysed on a MacBook Pro, 13-inch, Late 2011,
with no multi-core setup.

Script running is:
test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/2_pre_run_r2eff.py

This script analyses just the R2eff values for 15 residues.
It estimates the errors of R2eff based on 2000 Monte Carlo simulations.
For each residue, there are 14 exponential curves.

The script was stopped after 35 simulations, which took 20 minutes.
So 500 simulations would take about 4.8 hours.

The R2eff values and errors can instead be calculated by
scipy.optimize.leastsq in: 15 residues * 0.02 seconds = 0.3 seconds.




    _______________________________________________________

Reply to this item at:

  <http://gna.org/task/?7822>



