mailRe: [task #7822] Implement user function to estimate R2eff and associated errors for exponential curve fitting.


Others Months | Index by Date | Thread Index
>>   [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Header


Content

Posted by Edward d'Auvergne on August 25, 2014 - 10:03:
Hi Troels,

This second tread I'm starting at
http://thread.gmane.org/gmane.science.nmr.relax.devel/6807 is to cover
the error estimation algorithm.  This is quite a different problem and
may require a different solution.  For error estimation, there are
quite a number of algorithms available.  But you should note that
Monte Carlo simulations is the gold standard by which all other
techniques are judged.  For many optimisation problems in relax, the
optimisation space is so convoluted that Monte Carlo sims is the only
reasonable solution.  The relaxation dispersion model spaces are such
convoluted problems, so I would use MC sims for these as well.

You may wish to look at
https://en.wikipedia.org/wiki/Propagation_of_uncertainty for an
overview.  You may even find that in the table in
https://en.wikipedia.org/wiki/Propagation_of_uncertainty#Example_formulas,
that the exact error equation is present.  This this assumes equal
errors for all points, but the reference given could have the answer
you are after.  Exact error equations will be the fastest of all
solutions.

The technique you are using here, the covariance matrix estimate, is a
very rough error estimate.  This can suffer from a number of issues,
especially when the optimisation space is not quadratic in shape.
Most problems in relax will not even be close to quadratic.  However
these simple exponential curves may be an exception.  The exponential
curve problem is incredibly simple and well behaved.  So in this case,
the covariance matrix estimate will probably be a good estimate of the
true errors and give similar results to the MC simulations.

The solution for this problem might depend on the solution for the
optimisation problem
(http://thread.gmane.org/gmane.science.nmr.relax.devel/6807/focus=6808).
But maybe we could create a error_analysis.covariance user function.
This would then call the target function, or do what ever else is
required to generate the matrix.  Note that it's a simple enough
algorithm to add to a new module in the relax library.  This new user
function could then be used for all analysis types as a quick and
dirty alternative to Monte Carlo simulations.  The Numerical Recipes
books are a good reference for the algorithm - but note that you
cannot use their example code due to licensing issues.  There is some
GPL code at http://cran.r-project.org/web/packages/ which might be of
interest.  But coding it yourself - with unit tests - might be the
fastest and safest option.  If the covariance matrix function is put
into the relax library, that would make the user function independent
scipy and also not require a call to the optimisation user functions -
both of which would be useful.  What are your thoughts?

If you don't mind about the accuracy of the error estimates, you could
also use the covariance matrix technique for the dispersion models as
well for quick estimate for mass screening exercises.  But for an
analysis of real data, Monte Carlo simulations is a must.

Cheers,

Edward


P. S.  Just for reference, Jackknifing is another error estimation
technique.  But this is used when you don't have errors in your input
data.  Some people also use Bootstrapping, but this is a fatal mistake
as Bootstrapping is not an error estimation technique, even though the
number looks like the real error.



On 24 August 2014 17:56, Troels E. Linnet
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
URL:
  <http://gna.org/task/?7822>

                 Summary: Implement user function to estimate R2eff and
associated errors for exponential curve fitting.
                 Project: relax
            Submitted by: tlinnet
            Submitted on: Sun 24 Aug 2014 03:56:36 PM UTC
         Should Start On: Sun 24 Aug 2014 12:00:00 AM UTC
   Should be Finished on: Sun 24 Aug 2014 12:00:00 AM UTC
                Category: relax's source code
                Priority: 5 - Normal
                  Status: In Progress
        Percent Complete: 0%
             Assigned to: tlinnet
             Open/Closed: Open
         Discussion Lock: Any
                  Effort: 0.00

    _______________________________________________________

Details:

A verification script, showed that using scipy.optimize.leastsq reaches the
exact same parameters as minfx for exponential curve fitting.

The verification script is in:
test_suite/shared_data/curve_fitting/profiling/profiling_relax_fit.py
test_suite/shared_data/curve_fitting/profiling/verify_error.py

The profiling script shows that a 10 X increase in speed can be reached by
removing
the linear constraints when using minfx.

The profiling also shows that scipy.optimize.leastsq is 10X as fast as using
minfx, even without linear constraints.

scipy.optimize.leastsq is a wrapper around wrapper around MINPACK's lmdif 
and
lmder algorithms.

MINPACK is a FORTRAN90 library which solves systems of nonlinear equations, 
or
carries out the least squares minimization of the residual of a set of 
linear
or nonlinear equations.

 The verification script also shows, that a very heavy and time consuming
monte carlo simulation of 2000 steps, reaches the same errors as the errors
reported by scipy.optimize.leastsq.

The return from scipy.optimize.leastsq, gives the estimated co-variance.
Taking the square root of the co-variance corresponds with 2X error reported
by minfx after 2000 Monte-Carlo simulations.

This could be an extremely time saving step, when performing model fitting 
in
R1rho, where the errors of the R2eff values, are estimated by Monte-Carlo
simulations.

The following setup illustrates the problem.
This was analysed on a: MacBook Pro, 13-inch, Late 2011.
With no multi-core setup.

Script running is:
test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/2_pre_run_r2eff.py

This script analyses just the R2eff values for 15 residues.
It estimates the errors of R2eff based on 2000 Monte Carlo simulations.
For each residues, there is 14 exponential graphs.

The script was broken after 35 simulations.
This was measured to 20 minutes.
So 500 simulations would take about 4.8 Hours.

The R2eff values and errors can by scipy.optimize.leastsq can instead be
calculated in: 15 residues * 0.02 seconds = 0.3 seconds.




    _______________________________________________________

Reply to this item at:

  <http://gna.org/task/?7822>

_______________________________________________
  Message sent via/by Gna!
  http://gna.org/


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel



Related Messages


Powered by MHonArc, Updated Mon Aug 25 11:20:16 2014