mailRe: Comparison of Monte Carlo simulations vs. covariance matrix.


Others Months | Index by Date | Thread Index
>>   [Date Prev] [Date Next] [Thread Prev] [Thread Next]

Header


Content

Posted by Troels Emtekær Linnet on September 01, 2014 - 16:22:
I totally agree.

Except, that I have an index on both times as well.
The same way you have an index on I.

Imagine you have three time points on an exponential curve.
Would it not then just be the sum?

So, the question is, if the error is overestimated.

But I guess, that this is the only reasonable thing you can do.
Since leaving out the error from the initial intensity would be rubbish.

And since you have simulated it with a large range of MC simulations,
I think this case can be closed.

It is though a good exercise, when expanding the problem for
dispersion models, where it
will get more tricky.

Best
Troels


Troels Emtekær Linnet


2014-09-01 15:55 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Anyway, there is a lot of rubbish used in the field, especially the
Ishima and Torshia 2005 equation.  In a private email someone
suggested that relax was calculating the errors incorrectly and that
the Ishima and Torshia equation was the correct one (and it resulted
in smaller errors).  That is why I simulated it in
test_suite/shared_data/dispersion/error_testing/simulation.py and
included the figure in the manual, just to prove that the Ishima &
Torchia suggested equation was wrong, despite the lower errors.

Note that the Ishima and Torchia lower error estimate, which is an
bug/error, is used by a number of other dispersion software programs
out there.  I've just forgotten which ones.

I think that this equation is a textbook case of the log of the ratio
of two normal distributions.  Lets see, we have

  R2eff * T_relax = -log(I1/I0).

This is from http://www.nmr-relax.com/manual/R2eff_model.html.  Using the 
rule

  log(X/Y) = log(X) - log(Y),

then

  R2eff * T_relax = log(I0) - log(I1),
  R2eff = T_relax^-1 (log(I0) - log(I1)).

Using the estimate from
https://en.wikipedia.org/wiki/Propagation_of_uncertainty that:

  if f = aln(A),
  then sigma_f^2 = (a * sigma_A / A)^2.

Then

  sigma_R2eff^2 = (T_relax^-1 * sigma_I0 / I0)^2 + (T_relax^-1 *
sigma_I1 / I1)^2.

Rearranging we have:

  sigma_R2eff = T_relax^-1 * sqrt((sigma_I0 / I0)^2 + (sigma_I1 / I1)^2).

This is the equation used by relax (equation 11.4 in the manual,
http://www.nmr-relax.com/manual/R2eff_model.html).  That was easier to
derive than I imagined :)

Regards,

Edward

On 1 September 2014 15:16, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> wrote:
Hi Edward.

With my scribbles, and notes from my statistics book, I derive almost
the same thing.

My problem is that I end up with also wanting the time for the reference 
point.

And that does not make sense.

It reminds me of the NS R1rho equation problem, with 1/T.

Best
Troels


Troels Emtekær Linnet


2014-09-01 15:08 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Ok, I've found one reference for this formula:

    - Loria, J. P. and Kempf, J. G. (2004)  Measurement of
Intermediate Exchange Phenomena.  Methods in Molecular Biology, 278,
185-231.  (http://dx.doi.org/10.1385/1-59259-809-9:185).

Specifically section 3.2.7.1 "Two-Point Data Sampling", and equation 22:

    R2(tau_CP^-1) = t^-1 ln[ave(I0) / ave(I(t))] +/- t^-1 Delta_Q,

and equation 23:

    Delta_Q = sqrt(Delta_I0^2 + Delta_I(t)^2).

This is apparently from the book:

    - Shoemaker, D. P., Garland, C. W., and Nibler, J. W. (1989)
Experiments in Physical Chemistry. 5th ed. McGraw-Hill, New York.

Though I don't have access to that to check.

Regards,

Edward



On 1 September 2014 10:55, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Troels,

The 2-point exponential error formula is currently equation 11.4 in
the relax manual (http://www.nmr-relax.com/manual/R2eff_model.html).
I unfortunately did not write down a reference for it.  But the
equation is in a number of dispersion papers - I just have to find it.
Feel free to search yourself.  I also simulated this error, see figure
11.1 in the relax manual
(http://www.nmr-relax.com/manual/R2eff_model.html#fig:_dispersion_error_comparison)

Regards,

Edward

On 30 August 2014 13:49, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> wrote:
Hi Edward.

I found this old post in gwing gwong doogle Google.
http://thread.gmane.org/gmane.science.nmr.relax.scm/17553

And then I see, that for a two point exponential, this i just
converting the values to a linear problem.

For several time points, a good initial guess for estimating r2eff,
and i0, by converting to a linear problem, and solving by linear least
squares.
(This is currently the experimental 'estimate_x0_exp' in the R2eff
estimate module).
Maybe I should try some weighting on this.

---------
# Convert to linear problem.
# Func
# I = i0 exp(-t R)
# Convert to linear
# ln(I) = R (-t) + ln(i0)
# Compare
# f(I) = a x + b

ln_I = log(I)
x = - 1. * times
n = len(times)

# Solve by linear least squares.
a = (sum(x*ln_I) - 1./n * sum(x) * sum(ln_I) ) / ( sum(x**2) - 1./n *
(sum(x))**2 )
b = 1./n * sum(ln_I) - b * 1./n * sum(x)

# Convert back from linear to exp function. Best estimate for parameter.
r2eff_est = a
i0_est = exp(b)
-----------

And then I see in And then I look in lib.dispersion.two_point.py

---------------
"""Calculate the R2eff/R1rho error for the fixed relaxation time data.

The formula is::

                            __________________________________
                  1        / / sigma_I1 \ 2     / sigma_I0 \ 2
    sigma_R2 = -------    /  | -------- |   +   | -------- |
               relax_T  \/   \ I1(nu1)  /       \    I0    /

where relax_T is the fixed delay time, I0 and sigma_I0 are the
reference peak intensity and error when relax_T is zero, and I1 and
sigma_I1 are the peak intensity and error in the spectrum of interest.
-------------

Right now, I don't know where that comes from.

My reference book:
An introduction to Error Analysis, Second Edition
John R. Taylor
http://www.uscibooks.com/taylornb.htm

"You will not be surprised to learn that when the uncertainties ...
are independent and random, the sum can be replaced by a sum in
quadrature."

So, just following you analogy, I could get sigma R2 this way.

I will look into it and make some tests scripts.




Troels Emtekær Linnet


2014-08-30 9:54 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Hi,

I don't have much time to reply now, but the key is it use simple
synthetic noise-free data.  Try converting the 5 intensities in
test_suite/shared_data/curve_fitting/numeric_topology/ into 5 Sparky
peak lists with a single spin.  Then test the Monte Carlo simulations
and covariance matrix user functions in relax.  These two relax
techniques should then match the numeric results from the super-basic
scripts in that directory!  This would then be converted into two
system tests.

This was my plan, to complete in my spare time.  If you want to go
quickly, then feel free to follow these steps yourself rather than
waiting for me to do it.  I actually suggested this synthetic data
testing earlier to you
(http://thread.gmane.org/gmane.science.nmr.relax.devel/6807/focus=6840).
Synthetic noise-free data is an essential tool for implemented and
debugging any new analysis type, algorithm, protocol, etc.  The key is
that you know the answer you are searching for!  And synthetic data is
simple.  Nothing should ever be implemented and debugged using real
data, as a good looking result might be the consequence of a nasty
bug.

Regards,

Edward






On 30 August 2014 02:49, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> 
wrote:
Hm.

The last idea I have, is the division by number of degree of freedom.

So either 5-2, or 4-2.

That should be verified by a script with many different time points.

But then the errors for intensity gets very different.

Hm.

On 30 Aug 2014 01:45, "Troels Emtekær Linnet" <tlinnet@xxxxxxxxxxxxx> 
wrote:

The sentence:

"then the covariance matrix above gives the statistical error on the
best-fit parameters resulting from the Gaussian errors 'sigma_i' on
the underlying data 'y_i'."

And here I note the wording:
"statistical error"
"Gaussian errors"

Best
Troels


2014-08-29 21:21 GMT+02:00 Troels Emtekær Linnet 
<tlinnet@xxxxxxxxxxxxx>:
Hi Edward.

I also think it is some math some where.

I have a feeling, that it is the creating of Monte Carlo data with 
2
sigma?
and then some log/exp calculation of R2eff.

If the errors are 2 x times over estimated, the chi2 values are 4 
as
small, and the
space should be the same?

best
Troels

2014-08-29 17:06 GMT+02:00 Edward d'Auvergne 
<edward@xxxxxxxxxxxxx>:
I've just added the 2D Grace plots for this to the repository 
(r25444,
http://article.gmane.org/gmane.science.nmr.relax.scm/23194).  
They are
also attached to the task for easier access
(https://gna.org/task/index.php?7822#comment107).  From these 
plots I
see that the I0 error appears to be reasonable, but that the R2eff
errors are overestimated by 1.9555.

The plots are very, very different.  It is clear that
chi2_jacobian=True just gives rubbish.  It is also clear that 
there is
a perfect correlation in R2eff when chi2_jacobian=False.  The plot
shows absolutely no scattering, therefore this indicates a crystal
clear mathematical error somewhere.  I wonder where that could 
be.  It
may not be a factor of 2, as the correlation is 1.9555.  So it 
might
be a bug that is more complicated.  In any case, overestimating 
the
errors by ~2 and performing a dispersion analysis is not possible.
This will significantly change the curvature of the optimisation 
space
and will also have a huge effect on statistical comparisons 
between
models.

Regards,

Edward



On 29 August 2014 16:56, Troels Emtekær Linnet 
<tlinnet@xxxxxxxxxxxxx>
wrote:
The default is now chi2_jacobian=False.

You will hopefully see, that the errors are double.

Best
Troels

2014-08-29 16:43 GMT+02:00 Edward d'Auvergne 
<edward@xxxxxxxxxxxxx>:
Terrible ;)  For R2eff, the correlation is 2.748 and the points 
are
spread out all over the place.  For I0, the correlation is 3.5 
and
the
points are also spread out everywhere.  Maybe I should try with 
the
change from:

relax_disp.r2eff_err_estimate(chi2_jacobian=True)

to:

relax_disp.r2eff_err_estimate(chi2_jacobian=False)

How should this be used?

Cheers,

Edward



On 29 August 2014 16:33, Troels Emtekær Linnet
<tlinnet@xxxxxxxxxxxxx> wrote:
Do you mean terrible or double?

Best
Troels

2014-08-29 16:15 GMT+02:00 Edward d'Auvergne 
<edward@xxxxxxxxxxxxx>:
Hi Troels,

I really cannot follow and judge how the techniques compare.  
I
must
be getting old.  So to remedy this, I have created the

test_suite/shared_data/dispersion/Kjaergaard_et_al_2013/exp_error_analysis/
directory (r25437,
http://article.gmane.org/gmane.science.nmr.relax.scm/23187).  
This
contains 3 scripts for comparing R2eff and I0 parameters for 
the 2
parameter exponential curve-fitting:

1)  A simple script to perform Monte Carlo simulation error
analysis.
This is run with 10,000 simulations to act as the gold 
standard.

2)  A simple script to perform covariance matrix error 
analysis.

3)  A simple script to generate 2D Grace plots to visualise 
the
differences.  Now I can see how good the covariance matrix
technique
is :)

Could you please check and see if I have used the
relax_disp.r2eff_err_estimate user function correctly?  The 
Grace
plots show that the error estimates are currently terrible.

Cheers,

Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel



Related Messages


Powered by MHonArc, Updated Mon Sep 01 16:40:17 2014