Re: Relax_fit.py problem -- October 16, 2008

Hi,

For the NOE data, what I've been doing is using the DSN command in VnmrJ (which
gives the S/N ratio) and then converting this to an rms value with
RMSD = (height of the highest peak)/(S/N).
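A small sketch of that conversion. The function name `rmsd_from_sn` is hypothetical, and it assumes DSN reports S/N as peak height divided by an RMS noise level. If the noise is defined another way (for example peak-to-peak), the result needs rescaling.

```python
def rmsd_from_sn(peak_height, signal_to_noise):
    """Estimate the baseline RMSD from a peak height and its S/N ratio.

    Hypothetical helper: assumes S/N = peak_height / RMSD, so that
    RMSD = peak_height / (S/N).
    """
    if signal_to_noise <= 0:
        raise ValueError("S/N must be positive")
    return peak_height / signal_to_noise


# Example: a highest peak of 1.2e6 with S/N = 400 gives an RMSD of 3000.0.
print(rmsd_from_sn(1.2e6, 400.0))
```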

If I understand what is planned for the relax program, I would want rms values
for each of the T1 and T2 spectra at all field strengths, but I'm not sure
whether it would be worthwhile to collect error information for the different
delays within a given spectrum (e.g. the rms at 0.01 s and at 0.05 s for a
single spectrum). In principle I would expect these to be comparable, and I
don't want to make this unnecessarily complicated.

In any case, do you have an estimate of how long it will take to implement the
base plane rms functionality for the curve-fitting portion of relax? If there
is something I can do to help, feel free to let me know. Ultimately I'd like to
compare results from a few different programs, so if it will be a few weeks I
might try an alternate method first and then come back and compare the results
at various stages with relax. That said, I don't know whether base plane rms
will work from the get-go with any other packages, and a lot of the
alternatives may not be user friendly.

Tyler






Quoting Edward d'Auvergne <edward.dauvergne@xxxxxxxxx>:

On Thu, Oct 16, 2008 at 1:09 AM, Chris MacRaild <macraild@xxxxxxxxxxx> wrote:


>> Well, the Jackknife technique
>> (http://en.wikipedia.org/wiki/Resampling_(statistics)#Jackknife) does
>> something like this.  It uses the errors present within the collected
>> data to estimate the parameter errors.  It's not great, but is useful
>> when errors cannot be measured.  You can also use the covariance
>> matrix from the optimisation space to estimate errors.  Both are rough
>> and approximate, and in convoluted spaces (such as the diffusion tensor
>> space and the double motion model-free models of Clore et al., 1990)
>> they are known to have problems.  Monte Carlo simulations perform much
>> better in complex spaces.
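To make the jackknife idea concrete for a relaxation curve, here is a minimal pure-Python sketch, not relax code. It fits a two-parameter exponential I(t) = i0 * exp(-r * t), refits with each point left out in turn, and converts the spread of the refitted values into standard errors. The names `fit_exp` and `jackknife_errors` are hypothetical. Unit weights are used because the jackknife needs no measured errors, and the golden-section search over r assumes chi-squared has a single minimum in [0, r_max]. The data are synthetic.

```python
import math


def fit_exp(times, intensities, errors, r_max=50.0, tol=1e-10):
    """Weighted least-squares fit of I(t) = i0 * exp(-r * t); returns (i0, r).

    For a fixed r the best i0 has a closed form, so only r is searched,
    by golden-section search on [0, r_max].
    """
    w = [1.0 / s ** 2 for s in errors]

    def best_i0(r):
        e = [math.exp(-r * t) for t in times]
        return (sum(wi * yi * ei for wi, yi, ei in zip(w, intensities, e))
                / sum(wi * ei * ei for wi, ei in zip(w, e)))

    def chi2(r):
        i0 = best_i0(r)
        return sum(wi * (yi - i0 * math.exp(-r * t)) ** 2
                   for wi, yi, t in zip(w, intensities, times))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, r_max
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = chi2(a), chi2(b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = chi2(a)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = chi2(b)
    r = 0.5 * (lo + hi)
    return best_i0(r), r


def jackknife_errors(times, intensities):
    """Delete-one jackknife standard errors of (i0, r), using unit weights."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 points")
    fits = []
    for k in range(n):
        # Refit with point k left out.
        t = list(times[:k]) + list(times[k + 1:])
        y = list(intensities[:k]) + list(intensities[k + 1:])
        fits.append(fit_exp(t, y, [1.0] * (n - 1)))
    errors = []
    for p in (0, 1):  # 0 -> i0, 1 -> r
        vals = [f[p] for f in fits]
        mean = sum(vals) / n
        # Jackknife variance: (n - 1) / n * sum of squared deviations.
        errors.append(math.sqrt((n - 1) / n * sum((v - mean) ** 2 for v in vals)))
    return tuple(errors)


# Synthetic data: i0 = 1000, r = 2 s^-1, plus small fixed perturbations.
times = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2]
heights = [1000.0 * math.exp(-2.0 * t) + d
           for t, d in zip(times, [3.0, -2.0, 4.0, -3.0, 1.0, -2.0, 2.0])]
print(jackknife_errors(times, heights))
```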


> I have used (and extensively tested) bootstrap resampling for this
> problem. In my hands it works very well provided the data quality is
> high (which of course it must be if the resulting values are to be of
> any use in model-free analysis). In other words, it gives errors
> indistinguishable from those derived by Monte Carlo based on duplicate
> spectra. Bootstrapping, like the jackknife, does not depend on an
> estimate of peak height uncertainty. Its success presumably reflects
> the smooth and simple optimisation space involved in an exponential fit
> to good data; I fully expect it to fail if applied to the complex
> spaces of model-free optimisation.
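A sketch of one common nonparametric bootstrap, case resampling, for the same exponential fit. The thread does not say which bootstrap variant Chris used, so treating it as case resampling is an assumption. `fit_exp` and `bootstrap_errors` are hypothetical names. Unit weights are used because this form of bootstrap needs no estimate of peak height uncertainty, and the golden-section search over r assumes a single chi-squared minimum in [0, r_max]. Resamples with fewer than three distinct delays are redrawn, since they barely constrain the fit.

```python
import math
import random
import statistics


def fit_exp(times, intensities, errors, r_max=50.0, tol=1e-10):
    """Weighted least-squares fit of I(t) = i0 * exp(-r * t); returns (i0, r).

    For a fixed r the best i0 has a closed form, so only r is searched,
    by golden-section search on [0, r_max].
    """
    w = [1.0 / s ** 2 for s in errors]

    def best_i0(r):
        e = [math.exp(-r * t) for t in times]
        return (sum(wi * yi * ei for wi, yi, ei in zip(w, intensities, e))
                / sum(wi * ei * ei for wi, ei in zip(w, e)))

    def chi2(r):
        i0 = best_i0(r)
        return sum(wi * (yi - i0 * math.exp(-r * t)) ** 2
                   for wi, yi, t in zip(w, intensities, times))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, r_max
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = chi2(a), chi2(b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = chi2(a)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = chi2(b)
    r = 0.5 * (lo + hi)
    return best_i0(r), r


def bootstrap_errors(times, intensities, n_boot=500, seed=0):
    """Case-resampling bootstrap standard errors of (i0, r), unit weights."""
    rng = random.Random(seed)
    n = len(times)
    i0s, rs = [], []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # n points, with replacement
        if len({times[j] for j in idx}) < 3:
            continue  # too few distinct delays to constrain the fit; redraw
        i0, r = fit_exp([times[j] for j in idx],
                        [intensities[j] for j in idx], [1.0] * n)
        i0s.append(i0)
        rs.append(r)
    return statistics.stdev(i0s), statistics.stdev(rs)


# Synthetic data: i0 = 1000, r = 2 s^-1, plus small fixed perturbations.
times = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2]
heights = [1000.0 * math.exp(-2.0 * t) + d
           for t, d in zip(times, [3.0, -2.0, 4.0, -3.0, 1.0, -2.0, 2.0])]
print(bootstrap_errors(times, heights, n_boot=200))
```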


If someone would like bootstrapping for a certain technique, it could
be added to relax without too much problem by duplicating the Monte
Carlo code and making slight modifications.  Implementing the Jackknife
or the covariance matrix for error propagation would be more complex,
and its value is questionable.  Anyway, if it's not absolutely
necessary, I will concentrate my efforts on getting Gary Thompson's
multi-processor code functional (to run relax on clusters, grids, or
multi-CPU systems - see
https://mail.gna.org/public/relax-devel/2006-04/msg00023.html), and on
the BMRB and CCPN integration (CCPN at
https://mail.gna.org/public/relax-devel/2007-11/msg00037.html
continued at https://mail.gna.org/public/relax-devel/2007-12/msg00000.html,
and BMRB at https://mail.gna.org/public/relax-devel/2008-07/msg00057.html).

One question I have about the bootstrapping you used, Chris: how did
you get the errors, i.e. the variances of the Gaussian distributions
used to generate the bootstrap samples?  The bootstrapping method I
know for error analysis is very similar to Monte Carlo simulations.
For Monte Carlo simulations you have:

1)  Fit the original data set to get the fitted parameter set (this
uses the original error set).
2)  Generate the back-calculated data set from the fitted parameter set.
3)  Randomise the back-calculated data set n times, assuming a Gaussian
distribution with widths given by the original error set.
4)  Fit the n Monte Carlo data sets as in 1).
5)  The values from 1) give the final parameter values, and the standard
deviations of the fits from 4) give their errors.
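The five Monte Carlo steps above can be sketched in Python as follows. This is an illustration, not relax's implementation. `fit_exp` (a weighted fit of I(t) = i0 * exp(-r * t) by golden-section search over r, assuming a single chi-squared minimum in [0, r_max]) and `monte_carlo_errors` are hypothetical names, and the data are synthetic.

```python
import math
import random
import statistics


def fit_exp(times, intensities, errors, r_max=50.0, tol=1e-10):
    """Weighted least-squares fit of I(t) = i0 * exp(-r * t); returns (i0, r).

    For a fixed r the best i0 has a closed form, so only r is searched,
    by golden-section search on [0, r_max].
    """
    w = [1.0 / s ** 2 for s in errors]

    def best_i0(r):
        e = [math.exp(-r * t) for t in times]
        return (sum(wi * yi * ei for wi, yi, ei in zip(w, intensities, e))
                / sum(wi * ei * ei for wi, ei in zip(w, e)))

    def chi2(r):
        i0 = best_i0(r)
        return sum(wi * (yi - i0 * math.exp(-r * t)) ** 2
                   for wi, yi, t in zip(w, intensities, times))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, r_max
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = chi2(a), chi2(b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = chi2(a)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = chi2(b)
    r = 0.5 * (lo + hi)
    return best_i0(r), r


def monte_carlo_errors(times, intensities, errors, n_sim=500, seed=0):
    rng = random.Random(seed)
    # 1) Fit the original data set, weighted by the original errors.
    i0, r = fit_exp(times, intensities, errors)
    # 2) Back-calculate the data set from the fitted parameters.
    back = [i0 * math.exp(-r * t) for t in times]
    sim_i0, sim_r = [], []
    for _ in range(n_sim):
        # 3) Randomise the back-calculated data: Gaussian noise, width = original error.
        data = [y + rng.gauss(0.0, s) for y, s in zip(back, errors)]
        # 4) Fit each simulated data set exactly as in 1).
        a, b = fit_exp(times, data, errors)
        sim_i0.append(a)
        sim_r.append(b)
    # 5) Parameter values from 1); errors are the standard deviations from 4).
    return (i0, r), (statistics.stdev(sim_i0), statistics.stdev(sim_r))


times = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2]
heights = [1000.0 * math.exp(-2.0 * t) + d
           for t, d in zip(times, [3.0, -2.0, 4.0, -3.0, 1.0, -2.0, 2.0])]
print(monte_carlo_errors(times, heights, [3.0] * len(times), n_sim=200))
```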

The bootstrapping technique for error analysis I am familiar with is:

1)  Fit the original data set to get the fitted parameter set (this
uses the original error set).
2)  N/A.
3)  Randomise the original data set n times, assuming a Gaussian
distribution with widths given by the original error set.
4)  Fit the n bootstrapped data sets as in 1).
5)  The values from 1) give the final parameter values, and the standard
deviations of the fits from 4) give their errors.
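A sketch of this bootstrapping variant. It assumes a weighted two-parameter exponential fit I(t) = i0 * exp(-r * t) found by golden-section search over r, with a single chi-squared minimum in [0, r_max]. `fit_exp` and `gaussian_bootstrap_errors` are hypothetical names, and the data are synthetic. Compared with the Monte Carlo procedure, only step 3 changes: the noise is added to the original intensities rather than to the back-calculated curve.

```python
import math
import random
import statistics


def fit_exp(times, intensities, errors, r_max=50.0, tol=1e-10):
    """Weighted least-squares fit of I(t) = i0 * exp(-r * t); returns (i0, r).

    For a fixed r the best i0 has a closed form, so only r is searched,
    by golden-section search on [0, r_max].
    """
    w = [1.0 / s ** 2 for s in errors]

    def best_i0(r):
        e = [math.exp(-r * t) for t in times]
        return (sum(wi * yi * ei for wi, yi, ei in zip(w, intensities, e))
                / sum(wi * ei * ei for wi, ei in zip(w, e)))

    def chi2(r):
        i0 = best_i0(r)
        return sum(wi * (yi - i0 * math.exp(-r * t)) ** 2
                   for wi, yi, t in zip(w, intensities, times))

    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, r_max
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = chi2(a), chi2(b)
    while hi - lo > tol:
        if fa < fb:  # minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = chi2(a)
        else:  # minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = chi2(b)
    r = 0.5 * (lo + hi)
    return best_i0(r), r


def gaussian_bootstrap_errors(times, intensities, errors, n_boot=500, seed=0):
    rng = random.Random(seed)
    # 1) Fit the original data set, weighted by the original errors.
    i0, r = fit_exp(times, intensities, errors)
    # 2) N/A: no back-calculation in this variant.
    boot_i0, boot_r = [], []
    for _ in range(n_boot):
        # 3) Randomise the ORIGINAL data (not the back-calculated curve).
        data = [y + rng.gauss(0.0, s) for y, s in zip(intensities, errors)]
        # 4) Fit each bootstrapped data set as in 1).
        a, b = fit_exp(times, data, errors)
        boot_i0.append(a)
        boot_r.append(b)
    # 5) Parameter values from 1); errors are the standard deviations from 4).
    return (i0, r), (statistics.stdev(boot_i0), statistics.stdev(boot_r))


times = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2]
heights = [1000.0 * math.exp(-2.0 * t) + d
           for t, d in zip(times, [3.0, -2.0, 4.0, -3.0, 1.0, -2.0, 2.0])]
print(gaussian_bootstrap_errors(times, heights, [3.0] * len(times), n_boot=200))
```

For a well-determined exponential fit, which behaves nearly linearly around the minimum, one would expect this and the Monte Carlo procedure to give similar errors, since the original data and the back-calculated curve differ only by the residuals.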

Is this how you implemented it?

> While on the topic, I can also confirm that the baseline RMSD is a good
> estimator of peak height uncertainty. In my hands no sqrt(2) correction
> is required. Interestingly, there seems to be no simple relationship
> between baseline RMSD and peak volume uncertainty. I never managed to
> understand why that is, but perhaps it is related to the behaviour of
> noise under apodisation?
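For reference, here is one way the two estimates being compared could be computed. The helpers are hypothetical: `baseline_rmsd` takes intensities from a signal-free region (here the RMSD is taken about the mean, which may differ from how a given program defines it), and `duplicate_sd` estimates the per-peak height SD from duplicate spectra. A factor of sqrt(2) appears naturally in the duplicate estimate, because the difference of two independent measurements with the same noise level has twice the variance of either. Whether this is the sqrt(2) correction referred to above is not certain from the thread.

```python
import math


def baseline_rmsd(noise_points):
    """RMSD of signal-free baseline points about their mean."""
    n = len(noise_points)
    mean = sum(noise_points) / n
    return math.sqrt(sum((x - mean) ** 2 for x in noise_points) / n)


def duplicate_sd(heights_a, heights_b):
    """Per-peak height SD estimated from duplicate spectra.

    Each difference a_i - b_i has variance 2 * sigma**2 (independent noise,
    same sigma in both spectra), hence the division by 2 below, i.e. the
    1/sqrt(2) factor on the RMS of the differences.
    """
    diffs = [a - b for a, b in zip(heights_a, heights_b)]
    return math.sqrt(sum(d * d for d in diffs) / (2 * len(diffs)))


print(baseline_rmsd([1.0, -1.0, 1.0, -1.0]))    # 1.0
print(duplicate_sd([10.0, 20.0], [8.0, 22.0]))  # sqrt(2), about 1.414
```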


This is quite useful to know.  If you are using peak volumes as the
intensity measure and you don't have duplicate spectra, you could be
in trouble.  I have extensively played with noise and peak position
uncertainty in experiments and simulations, and I would guess that for
volumes the problem is more than just the apodisation.  I think that
total spectral power; truncation; phasing; apodisation (which affects
error smoothing, truncation, peak intensity, spectral power, etc.);
zero filling (again related to spectral power); and window position
and size all have an effect here.  At least they do for the chemical
shift uncertainty in my simulations - so much for Bax's LW/SN formula
for 2 coupled peaks!  Solving this might require a full PhD project in
spectral processing.

Btw, as I mentioned earlier in this thread there is still the bug in
relax's relaxation curve fitting where the standard deviations are
averaged rather than the variances!
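The difference between the two ways of combining errors is easy to see numerically. A minimal sketch with hypothetical helper names, assuming each standard deviation comes from a sample of the same size: averaging the variances and then taking the square root (the pooled SD) is never smaller than the plain mean of the SDs, so averaging the SDs directly tends to understate the error.

```python
import math


def mean_of_sds(sds):
    """Plain average of standard deviations (the behaviour described as a bug)."""
    return sum(sds) / len(sds)


def pooled_sd(sds):
    """Average the variances, then take the square root."""
    return math.sqrt(sum(s * s for s in sds) / len(sds))


sds = [1.0, 3.0]
print(mean_of_sds(sds))  # 2.0
print(pooled_sd(sds))    # sqrt(5), about 2.236
```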

Cheers,

Edward
