Re: [task #7822] Implement user function to estimate R2eff and associated errors for exponential curve fitting.



Posted by Edward d'Auvergne on August 25, 2014 - 15:52:
Hi Troels,

Please see below:

On 25 August 2014 13:01, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
2014-08-25 11:19 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Hi Troels,

Unfortunately you have gone ahead and implemented a solution without
first discussing or planning it.  Hence the current solution has a
number of issues:

1)  Target function replication.  The solution should have reused the
C modules.  The original Python code for fitting exponential curves
was converted to C code for speed
(http://gna.org/forum/forum.php?forum_id=1043).  Note that two-point
exponentials that decay to zero are not the only way that data can be
collected, and that is the reason for Sebastien Morin's
inversion-recovery branch (which was never completed).  Anyway, the
code duplication is not acceptable.  If the C module is extended with
new features, such as having the true gradient and Hessian functions,
then the Python module will be out of sync.  And vice-versa.  If
a bug is found in one module and fixed, it may still be present in the
second.  This is a very non-ideal situation for relax to be in, and is
the exact reason why I did not allow the cst branch to be merged back
to trunk.

Hi Edward.

I prefer not to make this target function dependent on C-code compilation.

Compilation of code on Windows can be quite a hairy thing.

For example see:
http://wiki.nmr-relax.com/Installation_windows_Python_x86-32_Visual_Studio_Express_for_Windows_Desktop#Install_Visual_Studio_Express_2012_for_Windows_Desktop

Visual Studio Express is a several hundred megabyte installation, just
for compiling an exponential curve.  This is way, way overkill for this
situation.

The C code compilation has been a requirement in relax since 2006.
This was added not only for speed, but as a framework to copy for
other analysis types in the future.  Once a Python target function has
been fully optimised, the code can be converted to C for the final
speed up.  This is the future plan for a number of the relax analyses.
But first the Python code is used for prototyping and for finding the
fastest implementation/algorithm.

The C compilation will become an even greater requirement once I write
C wrapper code for QUADPACK to eliminate the last dependencies on
Scipy.  And the C compilation framework allows for external C and
FORTRAN libraries to be added to the 'extern' package in the future,
as there are plenty of open source libraries out there with compatible
licences which could be very useful within relax.


2)  Scipy is now a dependency for the dispersion analysis!  Why was
this not discussed?  Coding a function for calculating the covariance
matrix is basic.  Deriving and coding the real gradient function is
also basic.  I do not understand why Scipy is now a dependency.  I
have been actively trying to remove Scipy as a relax dependency and
only had a single call for numerical integration via QUADPACK
wrappers left to remove for the frame order analysis.  Now Scipy is
back :(
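
As a minimal sketch (not actual relax code), such an analytic gradient
could look something like the following, assuming the two-parameter
exponential I(t) = I0 * exp(-R2eff * t) and a chi-squared target
weighted by the per-point errors; all names here are illustrative:

    import numpy as np

    def exp_func(params, times):
        """Two-parameter exponential decaying to zero: I(t) = I0 * exp(-R2eff * t)."""
        r2eff, i0 = params
        return i0 * np.exp(-r2eff * times)

    def chi2_grad(params, times, values, errors):
        """Analytic gradient of chi2 = sum(((I_i - I(t_i)) / sigma_i)**2)
        with respect to (R2eff, I0)."""
        r2eff, i0 = params
        back_calc = exp_func(params, times)
        weights = 1.0 / errors**2
        diff = values - back_calc
        # Partial derivatives of the model.
        dI_dr2eff = -times * back_calc
        dI_di0 = np.exp(-r2eff * times)
        # Chain rule: d(chi2)/dp = -2 * sum(w * diff * dI/dp).
        return np.array([
            -2.0 * np.sum(weights * diff * dI_dr2eff),
            -2.0 * np.sum(weights * diff * dI_di0),
        ])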

Hi Edward.

Scipy is a dependency because the calculation is tried with
scipy.optimize.leastsq.

How could it be any different?

What you are aiming at is to add yet another feature for estimating the
errors: a third solution.

Whatever dependency the third solution would come with would depend on
the method implemented.
One could also imagine extending these procedures in R, Matlab or
whatever.

But they would also need to meet some dependencies.

Of course the best solution would always try to keep relax as
independent as possible.

But if the desire is to try with scipy.optimize.leastsq, then you are
bound to this dependency.
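
For readers following the thread, the kind of call being discussed
presumably looks something like the sketch below; the data and starting
values are made up purely so that the snippet runs, and this is not the
actual relax_disp.r2eff_estimate code:

    import numpy as np
    from scipy.optimize import leastsq

    def residuals(params, times, values, errors):
        """Weighted residuals for the model I(t) = I0 * exp(-R2eff * t)."""
        r2eff, i0 = params
        return (values - i0 * np.exp(-r2eff * times)) / errors

    # Synthetic data, only so that the sketch is runnable.
    times = np.array([0.0, 0.02, 0.04, 0.06, 0.1])
    values = 1.0e6 * np.exp(-20.0 * times) + np.random.normal(0.0, 1.0e4, times.size)
    errors = np.full(times.size, 1.0e4)

    params, cov_x, infodict, mesg, ier = leastsq(
        residuals, x0=[10.0, values[0]], args=(times, values, errors),
        full_output=True)

    # With the residuals already weighted by the errors, cov_x approximates
    # the parameter covariance matrix and its diagonal gives the variances.
    r2eff_err, i0_err = np.sqrt(np.diag(cov_x))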

That's why I asked if only the covariance matrix is required.  Then we
can replace the use of scipy.optimize.leastsq() with a single function
for calculating the covariance matrix.
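
A numpy-only sketch of such a single function, under the same
assumptions as above (two-parameter exponential, residuals weighted by
the errors); the names are illustrative and not a proposal for the
actual relax API:

    import numpy as np

    def covariance_matrix(params, times, errors):
        """Covariance matrix estimate from the analytic Jacobian of the
        weighted residuals for I(t) = I0 * exp(-R2eff * t)."""
        r2eff, i0 = params
        back_calc = i0 * np.exp(-r2eff * times)
        # Jacobian of the weighted residuals (values - back_calc) / errors.
        jacobian = np.empty((times.size, 2))
        jacobian[:, 0] = times * back_calc / errors        # d(residual)/d(R2eff)
        jacobian[:, 1] = -np.exp(-r2eff * times) / errors  # d(residual)/d(I0)
        # At the chi-squared minimum the covariance is approximated by
        # the inverse of J^T J.
        return np.linalg.inv(np.dot(jacobian.T, jacobian))

The square roots of the diagonal then give the parameter error
estimates, just as with cov_x from scipy.optimize.leastsq above.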


3)  If the covariance function was coded, then the specific analysis
API could be extended with a new covariance method and the
relax_disp.r2eff_estimate user function could have simply been called
error_estimate.covariance_matrix, or something like that.  Then this
new error_estimate.covariance_matrix user function could replace the
monte_carlo user functions for all analyses, as a rough error
estimator.

That would be the third possibility.

..., that would give the same result, save the same amount of time,
but would avoid the new Scipy dependency and be compatible with all
analysis types ;)


4)  As for the optimisation speed of the new
relax_disp.r2eff_estimate user function, this is not because scipy is
faster than minfx!!!  It is due to the choice of algorithms, the
numerical gradient estimate, etc.
(http://thread.gmane.org/gmane.science.nmr.relax.scm/22979/focus=6812).

This sounds good.

But I can only say that, as a user, I hit a "big wall of time
consumption" for the error estimation of R2eff via Monte Carlo.

As a user, I needed more options to try out.

The idea of adding the covariance matrix error estimate to relax is a
great idea.  Despite its lower quality, it is hugely faster than Monte
Carlo simulations.  It has been considered before, see
http://thread.gmane.org/gmane.science.nmr.relax.user/602/focus=629 and
the discussions in that thread.  But the time required for Monte Carlo
simulations was never an issue so the higher quality estimate remained
the only implementation.

What I'm trying to do, is to direct your solution to be general and
reusable.  I'm also thinking of other techniques at the same time,
Jackknife simulations for example, which could be added in the future
by developers with completely different interests.
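
As a rough illustration of the Jackknife idea (not relax code), a
leave-one-out error estimate for the fitted parameters could be
sketched as follows, assuming some fit_func supplied by the caller that
refits the exponential on a data subset:

    import numpy as np

    def jackknife_errors(fit_func, times, values, errors):
        """Leave-one-out Jackknife error estimate for the fitted parameters.

        fit_func(times, values, errors) is assumed to return the optimised
        parameter vector, e.g. [R2eff, I0], for the given data.
        """
        n = len(times)
        estimates = []
        for i in range(n):
            mask = np.arange(n) != i
            # Refit with the i-th point removed.
            estimates.append(fit_func(times[mask], values[mask], errors[mask]))
        estimates = np.array(estimates)
        mean = estimates.mean(axis=0)
        # Standard Jackknife variance formula.
        variance = (n - 1.0) / n * np.sum((estimates - mean)**2, axis=0)
        return np.sqrt(variance)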


5)  Back to Scipy.  Scipy optimisation is buggy full stop.  The
developers ignored my feedback back in 2003.  I assumed that the
original developers had just permanently disappeared, and they really
never came back.  The Scipy optimisation code did not change for many,
many years.  While it looks like optimisation works, in some cases it
does fail hard, stopping in a position in the space where there is no
minimum!  I added the dx.map user function to relax to understand
these Scipy rubbish results.  And I created minfx to work around these
nasty hidden failures.  I guess such failures are due to them not
testing the functions as part of a test suite.  Maybe they have fixed
the bugs now, but I really can no longer trust Scipy optimisation.


I am sorry to hear about this.

And I am totally convinced that minfx is better for minimising the
dispersion models.
You have proven that quite well in your papers.

I do, though, have a hard time believing that minimisation of an
exponential function should be subject to erroneous results.

Anyway, this is still left to "freedom of choice" for the user.

The error in the original Scipy optimisation code was causing quite
different results.  The 3 algorithms, now that I look back at my
emails from 2003, are:

- Nelder-Mead simplex,
- Levenberg-Marquardt,
- NCG.

These are still all present in Scipy, though I don't know if the code
is different from back in 2003.  The error in the Levenberg-Marquardt
algorithm was similar to the Modelfree4 problem, in that a lambda
matrix updating condition was incorrectly checked for.  When the
gradient was positive, i.e. uphill, the matrix should update and the
algorithm should continue to try to find a downhill step.  If the
conditions are not correctly checked for, the algorithm thinks that
the uphill step means that it is at the minimum.  But this is not the
case, it is just pointing in the wrong direction.  I don't remember
what the NCG
bug was, but that one was much more severe and the results were
strange.
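
To make the described condition concrete, here is a generic sketch of
the damping-factor logic in a single Levenberg-Marquardt iteration (not
the Scipy, Modelfree4 or minfx code; chi2, grad and hessian_approx are
assumed to be supplied by the caller).  The bug described above
corresponds to treating the uphill branch as convergence instead of
increasing lambda and retrying:

    import numpy as np

    def lm_step(params, chi2, grad, hessian_approx, lam=1e-3, max_tries=50):
        """One damped Gauss-Newton (Levenberg-Marquardt) step."""
        chi2_old = chi2(params)
        g = grad(params)
        jtj = hessian_approx(params)  # The J^T J approximation to the Hessian.
        for _ in range(max_tries):
            # Solve the damped normal equations for the trial step.
            step = np.linalg.solve(jtj + lam * np.diag(np.diag(jtj)), -g)
            trial = params + step
            if chi2(trial) < chi2_old:
                # Downhill step: accept it and relax the damping factor.
                return trial, lam / 10.0
            # Uphill step: this is NOT the minimum - increase the damping
            # (update the lambda matrix) and try again.
            lam *= 10.0
        return params, lam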

Failures of optimisation algorithms due to bugs can be quite random.
And you often don't see them, as you don't know what the true result
really is.  But such bugs will affect exponential functions, despite
their simplicity.

Regards,

Edward


