
Posted by Edward d'Auvergne on November 22, 2010 - 15:55:
Dear Peter,

Sorry for the bounce messages you have been receiving; your email
address should now be on the accepted list.  I missed the bounce
warnings as they were appearing in my spam folder.  Your posts should
soon appear in the archives at
https://mail.gna.org/public/relax-devel/.  Cheers.  I have more below.


On 19 November 2010 10:40, Dr. Klaus-Peter Neidig
<peter.neidig@xxxxxxxxxxxxxxxxx> wrote:
Dear Edward,

after coming back from England, let me provide some comments.

Some of the literature uses the words error and uncertainty interchangeably.
The quantity sigma used in least squares fitting would be the square of
error or uncertainty.
In relax, we use sigma as the standard deviation (or error and
uncertainty), and sigma squared is the variance.  This comes from the
normal distribution, the minus log of which creates the chi-squared
statistic, hence the division by the variance.  Ah, I just read your
other message (https://mail.gna.org/public/relax-devel/2010-11/msg00006.html).


The PDC output refers to error, so error_R1 = error_T1/T1
Then we should be able to read in the standard deviations directly
from the PDC file.  This will be extremely easy to implement.


Where do the errors come from? The source of the errors is the spectrum.
Each data point has an error. Commonly this error is derived from the
signal-to-noise ratio, usually obtained from signal-free regions.
Do the users define the signal-free region?  For example, there is
usually more noise in the random coil part of the spectrum due to
unwanted junk in the sample, so the peaks in this region have larger
errors.  What about signals close to the water signal?  Is it possible
to have different errors for these peaks?


In most cases the standard deviation is calculated from such regions
and then multiplied by a factor. The PDC uses multiple such regions,
takes the one with the lowest sdev and multiplies it by 2.0. There is
no unique definition of this factor. When used as a peak picking
threshold we often use 3.5 * sdev. The noise is calculated for each
plane of the pseudo 3D separately.
I don't understand the origin of this scaling.  Since Art Palmer's
seminal 1991 publication (http://dx.doi.org/10.1021/ja00012a001), the
RMSD of the pure base-plane noise has been used for the standard
deviation.  The key text is on page 4375:

"The uncertainties in the measured peak heights, sigma_h, were set
equal to the root-mean-square baseline noise in the spectra. This
assumption was validated by recording two duplicate spectra with use
of the sequence of Figure 1b with T = 0.02 s. Assuming that the peak
heights of the 24 C^alpha resonances are identically distributed, the
standard deviation of the differences between the heights of
corresponding peaks in the paired spectra is equal to 2^(1/2)*sigma_h.
The value of sigma_h determined in this manner was compared to the
root-mean-square baseline noise."

Tyler Reddy pointed this one out
(https://mail.gna.org/public/relax-users/2008-10/msg00026.html),
though this has been used in relax since about 2003.  This was also
used in the important Farrow et al. (1994) Biochemistry, 33: 5984-6003
article (http://dx.doi.org/10.1021/bi00185a040):

"The root-mean-square (rms) value of the background noise regions was
used to estimate the standard deviation of the measured intensities.
The duplicate spectra that were collected were used to assess the
validity of this estimate. The distribution of the difference in
intensities of identical peaks in duplicate spectra should have a
standard deviation sqrt(2) times greater than the standard deviation
of the individual peaks. This analysis was performed for the SH2 data,
and the two measures were found to agree (average discrepancy of 7%).
On this basis, it was concluded that the rms value of the noise could
be used to estimate the standard deviations of the measured
intensities."

Again this was mentioned by Tyler
(https://mail.gna.org/public/relax-users/2008-10/msg00029.html).  Have
there been advancements to these ideas?  Is the factor due to error
sources other than spectral noise?
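
As an aside for anyone following along, the two checks described in the
quoted passages can be sketched in a few lines of Python.  This is an
illustrative sketch only, assuming NumPy arrays of base-plane noise
points and of peak heights from two duplicate spectra; all names are
hypothetical:

    import numpy as np

    def baseplane_rmsd(noise_region):
        # RMSD of a signal-free base-plane region, used as sigma_h.
        return np.sqrt(np.mean(np.square(noise_region)))

    def duplicate_sigma(heights_a, heights_b):
        # Palmer/Farrow consistency check:  the standard deviation of
        # the differences between corresponding peak heights in
        # duplicate spectra should be about sqrt(2) * sigma_h, so
        # dividing by sqrt(2) gives a second estimate of sigma_h.
        diffs = np.asarray(heights_a) - np.asarray(heights_b)
        return np.std(diffs, ddof=1) / np.sqrt(2.0)

If the two numbers agree (Farrow et al. report an average discrepancy
of 7%), the base-plane RMSD can be used directly as the peak height
error.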


But this would not account for systematic errors. Therefore it is
advised to also repeat experiments and check the variation of data
points from spectrum to spectrum. Unfortunately it is up to the NMR
user to do this or not, and in most cases we are happy to see 1-3
mixing times repeated at least once. That is not enough to calculate
proper errors, therefore we just take the spread (max difference) as
an estimate and apply it to all planes.
From my work a long time ago, I noticed that the spectral errors
decreased with an increasing relaxation period.  This can be taken
into account in relax if all spectra are duplicated/triplicated/etc.
But if not, then the errors for all spectra are averaged (using
variance averaging, not standard deviation averaging).  For a single
duplicated/triplicated spectrum, the error is taken as the average
variance of each peak.  So when some, but not all, spectra are
replicated, there will be one error for all spin systems at all
relaxation periods.  This sounds similar to what the PDC does, but is
it exactly the same?

For those interested, the exact code can be seen in the __error_repl()
function in 
http://svn.gna.org/viewcvs/relax/1.3/generic_fns/spectrum.py?revision=11618&view=markup&pathrev=11688.
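
For anyone who cannot reach the repository, a stripped-down sketch of
the variance-averaging idea described above is given here.  This is a
simplification for illustration, not the actual __error_repl() code,
and the function name and array layout are hypothetical:

    import numpy as np

    def replicate_error(intensities):
        # intensities: array of shape (n_spins, n_replicates) for one
        # relaxation delay that was recorded more than once.  The error
        # for each spin is the standard deviation across its replicates;
        # these are combined by averaging the variances (not the
        # standard deviations), and a single error is returned for use
        # at all relaxation periods.
        per_spin_sd = np.std(intensities, axis=1, ddof=1)
        return np.sqrt(np.mean(per_spin_sd**2))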


Next the user has the freedom to calculate the peak integrals. We
offer: just peak intensity, peak shape integral, peak area integral
and peak fit.
Does peak intensity mean peak height?  In relax, the term peak
intensity covers volume, height, or any other measure of the peak.
For relaxation data with proper temperature calibration and proper
temperature compensation, peak heights are now often considered the
best method.  Peter Wright published a while back that volumes are
more accurate, but with the temperature consistency improvements the
problems Peter encountered are all solved, and hence heights are
better as overlap problems and other spectral issues are avoided.  Is
there a way of automatically determining the method used in the PDC
file, because in the end the user will have to specify this for BMRB
data deposition?  This is not so important though, as the user can
manually specify it if needed.


They all have their disadvantages; in many cases the peak intensity is
not even the worst. The error estimation also depends on the chosen
method. In the case of the shape and area integrals, the integral
error goes down with the number of integrated data points since at
least random errors partially cancel out. In the case of the peak fit,
we take the error as coming out of Levenberg-Marquardt, internally
based on covariance analysis.
The covariance analysis is known to be the lowest quality of all the
error estimates.  For an exponential fit, I have not checked how
non-quadratic the minimum is, so I don't know the performance of the
method.  Do the errors from that compare well to the area integral
together with the base-plane RMSD?
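
To make the comparison concrete, here is a minimal sketch of a
covariance-based error estimate for a two-parameter exponential fit,
using SciPy's Levenberg-Marquardt wrapper.  This is not the PDC
implementation; the delays, heights and error value are made up for
illustration:

    import numpy as np
    from scipy.optimize import curve_fit

    def expdecay(t, i0, rx):
        # Two-parameter exponential decay, I(t) = I0 * exp(-Rx * t).
        return i0 * np.exp(-rx * t)

    t = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2])
    sigma = 2.0e4
    heights = expdecay(t, 1.0e6, 2.0) \
        + np.random.default_rng(0).normal(0.0, sigma, t.size)

    # The covariance matrix returned by the fit gives the parameter
    # errors via the square roots of its diagonal elements.
    popt, pcov = curve_fit(expdecay, t, heights, p0=[1.0e6, 1.0],
                           sigma=np.full(t.size, sigma),
                           absolute_sigma=True)
    i0_err, rx_err = np.sqrt(np.diag(pcov))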


Since the number of data points per peak in pseudo-3D spectra is
small, the 2D peak fit usually results in large errors. Also, the peak
fitting assumes line shapes that are Gaussian, Lorentzian or mixtures
of both, but the actual peak shape often differs from that.
As most people -should- use peak heights, this should not be too much
of an issue.


When it comes to relaxation parameter fitting, we supply the peak
integral errors to Levenberg-Marquardt and get fitted parameters and
their errors back, again based on covariance analysis. Alternatively
the user may run MC simulations, usually 1000. The input variation of
the integrals comes from a Gaussian random generator; the width of the
Gaussian distribution for each integral is taken to be identical to
the estimated error of that integral. The literature says that the
error then obtained from MC should be identical to the error obtained
from LM.
I can't remember exactly, but I think Art Palmer published something
different to this and that is why his curvefit program uses MC as well
as Jackknife simulation.  Anyway, if the minimum has a perfectly
quadratic curvature, then the covariance matrix analysis and MC will
give identical results - the methods converge.  But as the minimum
deviates from quadratic, the performance of the covariance matrix
method drops off quickly.  MC simulations are the gold standard for
error propagation, when the direct error equation is too complex to
derive.  But again I don't know how non-quadratic the minimum is in
the exponential optimisation space, so I don't know how well the
covariance matrix performs.  The difference in techniques might be
seen for very weak exchanging peaks with large errors.  Or special
cases such as when the spectra are not properly processed and there
are truncation artifacts.
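
For completeness, here is a sketch of the Monte Carlo approach as I
understand it from the description above: noise of width equal to the
peak intensity error is added to the back-calculated curve of the best
fit, and the fit is repeated.  The 1000 simulations and the details of
the PDC's random generator are assumptions:

    import numpy as np
    from scipy.optimize import curve_fit

    def expdecay(t, i0, rx):
        return i0 * np.exp(-rx * t)

    def mc_rate_error(t, heights, sigma, n_sim=1000, seed=0):
        # Fit once, then re-fit n_sim synthetic data sets built from
        # the back-calculated curve plus Gaussian noise of width sigma.
        # The spread of the fitted rates is the MC error estimate.
        rng = np.random.default_rng(seed)
        popt, _ = curve_fit(expdecay, t, heights, p0=[heights[0], 1.0])
        back_calc = expdecay(t, *popt)
        rates = []
        for _ in range(n_sim):
            synth = back_calc + rng.normal(0.0, sigma, t.size)
            p, _ = curve_fit(expdecay, t, synth, p0=popt)
            rates.append(p[1])
        return popt[1], np.std(rates, ddof=1)

When the chi-squared minimum is close to quadratic, this number and
the covariance-based error should converge; any difference between
them is a measure of how non-quadratic the minimum is.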


I checked this by eye; they are at least similar.
All errors, regardless of whether they come from LM or MC, are finally
multiplied by a factor obtained from a Student t distribution at a
given confidence level and degrees of freedom. The numbers we get at
the end agree with what we get from Matlab, which has become an
internal standard for us at Bruker.
What is this factor and why is it used?  The MC simulation errors
should be exact, assuming the input peak intensity errors are exact
estimates.  MC simulations propagate the errors through the non-linear
space, and hence these errors are perfectly exact.  Is there a
reference for this post-MC sim scaling?  Is it optional?  For relax,
it would be good to know both this scaling factor and that mentioned
above for the peak intensity error scaling, so that both can be
removed prior to analysis.  These will impact model-free analysis in a
very negative way.
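
If the factor really is a Student t quantile, stripping it off before
feeding the errors to relax would be straightforward, as in this
sketch.  The confidence level and degrees of freedom used here are
assumptions that would need to be confirmed against the PDC
documentation:

    from scipy.stats import t

    def student_t_factor(confidence=0.95, n_points=12, n_params=2):
        # Two-sided quantile that scales a standard error into a
        # confidence interval half-width.
        dof = n_points - n_params
        return t.ppf(0.5 + confidence / 2.0, dof)

    # A PDC-style scaled error would be divided by this factor to
    # recover the plain standard error (hypothetical numbers).
    sigma_scaled = 0.035
    sigma_plain = sigma_scaled / student_t_factor(0.95, 12, 2)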


When it comes to model-free modelling, the errors obtained so far are
taken into account. But there is heavy criticism from some of the
experts. They say for example that the errors tend to be too small,
especially the NOE error, which is only based on the ratio of 2
numbers, whereas the T1 and T2 errors are based on fitting of 10-20
input values. Therefore in the PDC we allow the user to override the
determined errors with default errors, e.g. 2% for NOE and 1% for T1
and T2. They also say that the modelling output should contain the
back-calculated T1, T2 and NOE and should get markers indicating
whether T1 and T2 are well reproduced regardless of what the NOE is
doing. I don't want to comment on this but I have implemented it in
the PDC.
I will not comment either - there is no point ;)  Their model-free
analyses will fail in interesting ways though :S  But that is up to
the user.  It would be good to not default to such behaviour.


During the curve fitting we refer to T1, T2 for no deeper reason. The
older Bruker software tools did it, and a module in TopSpin is for
example called the T1/T2 module. Today I'm only a developer (in former
times I could define projects by myself) and other people officially
tell me what to do. That is the only reason for using T1, T2. As soon
as it goes beyond the relaxation curve fitting, everything else
internally continues with R1, R2; most of the literature presents the
formulas with R1, R2 and I didn't want to rewrite all these and
introduce mistakes. Technically, it would be no problem to present
everything in terms of R1, R2, but now several people already use the
software and nobody has complained. Perhaps with a future version I
should just allow the user to switch between the one or the other
representation. I will discuss this with some people here.
Ah, historic reasons.  We will be able to handle both in relax.  But
if the PDC presents T1 and T2 plots, and these values are in the PDC
output file, the user is highly likely to then use this in
publications.  Some people prefer T1 and T2 plots, but these are an
exception in publications.  This will likely lead to a convention
change in the literature (for only Bruker users) which will make it
harder to compare systems.  It would be good to consider this effect.
Cheers.


I think everybody here understands that Bruker is not a research
institution and does not have the resources and knowledge to do the
modelling at the level you do.
I'm not so sure about that, there is a lot of expertise there.


In my talks I advertised that our PDC allows very convenient data
analysis, especially if the Bruker-released pulse programs (written by
Wolfgang Bermel) are used.
Ah, the pulse programs Paul Gooley and I advised Wolfgang on.  These
are the high quality pulse sequences with temperature compensation in
the form of both temperature compensation blocks at the start and
single-scan interleaving.  These should lead to a huge change in the
quality of the data produced by users!


With our old T1/T2 module in TopSpin we had a big problem. It was so
bad that many people just used nmrPipe, Sparky, ccpn or anything else,
even though this required a lot of manual work. I found it convenient
to also offer some of the diffusion tensor and modelling functionality,
to be a bit more complete.
The PDC should be a huge help, and will definitely improve the quality
and consistency of published data.


I'm absolutely happy, however, if I can say that relax is available,
which is much more advanced and can read our output.
The plan would be to automatically read everything so no manual
intervention is required by the user.  Even if the user wishes not to
use relax for model-free analysis, they can use the program to create
the input files for Art Palmer's Modelfree and Dasha.


What disappoints me a bit at the moment is the behaviour of the users
(independent of the software they use). Typically, they say, it is
good to have all the modelling, but what can we do with it? The
overall dynamical features of the molecule are quite obvious already
from looking at the NOE, T1/T2 or reduced spectral densities. Many
people just use relaxation data to check if there is aggregation.
Your customer contacts should be much better than mine; what is your
experience?
I don't really know.  It probably depends on the aim of the user.  A
number of people are now not performing model-free analyses.  Maybe it
has something to do with the SRLS people heavily bashing model-free in
conferences at the moment.  Model-free is less trendy than it was a
decade ago.  This is one of the reasons there is a collaboration with
the BMRB.  Structures, chemical shifts, etc. are deposited so that
downstream analyses and studies can be performed.  But relaxation data
and model-free results are hardly ever deposited.  The result of this
is that papers only present pretty pictures.  So this is hindering
further analysis and preventing all this interesting data from being
useful.  Hopefully soon it will be compulsory to deposit data before
publication is possible.  That would make dynamics analyses far more
useful to the field.

Some users believe they can see the dynamics from the NOE, R1, and R2
plots, but there is a trap that is often fallen into.  That is if there
is significant anisotropy in the rotational diffusion of the molecule.
If a rigid alpha-helix points along the long axis, these users will
incorrectly conclude that these residues experience chemical exchange
relaxation (first discovered by Tjandra et al., 1995).  The same
happens along the short axis, though they will conclude that there are
fast internal motions (Schurr et al., 1994).  That is the problem with
the relaxation rates and spectral density mapping - there is no
separation of the internal from external motion, so unless you are an
absolute expert (via model-free or SRLS), you will probably draw the
wrong conclusions.  The literature is full of this :S


Just to indicate the future resources for the PDC: Until ENC 2011 I
have permission to use 50% of my time to add more features, e.g. to
allow user-defined spectral density functions and the use of multiple
fields for modelling. But I have already got a more general project;
it will be called Dynamics Center and must cover all kinds of
dynamics, including diffusion, kinetics and some solid-state stuff
like REDOR experiments. Applications will include smaller molecules.
For integration with relax, not much work will be required, hopefully
none at all.

Cheers,

Edward



Best regards,
Peter



On 11/16/2010 7:10 PM, Edward d'Auvergne wrote:

Dear Peter,

Thank you for posting this info to the relax mailing lists.  It is
much appreciated.  I hadn't thought too much about this, but this is
as you say: an error propagation through a ratio.  The same occurs
within the steady-state NOE error calculation.  As y=1/B and errA=0,
we could simply take the PDC file data and convert the error as:

sigma_R1 = sigma_T1 / T1^2.

This would be a 100% exact error calculation.  Therefore within relax,
we will only need to read the final relaxation data from the PDC files
and nothing about the peak intensities.  Reading additional
information from the PDC files could be added later, if someone needs
that.  One thing that would be very useful would be to have higher
precision values and errors in the PDC files.  5 or more significant
figures versus the current 2 or 3 would be of great benefit for
downstream analyses.  For a plot this is not necessary but for high
precision and highly non-linear analysis such as model-free (and SRLS
and spectral density mapping), this introduces significant propagating
truncation errors.  It would be good to avoid this issue.
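
For reference, the conversion that relax would perform when reading
the PDC files could be as simple as the following sketch.  The numbers
are hypothetical and the function name is illustrative:

    import numpy as np

    def times_to_rates(tx, tx_err):
        # Rx = 1/Tx, and by propagation through the inverse
        # sigma_Rx = sigma_Tx / Tx**2.
        tx = np.asarray(tx, dtype=float)
        tx_err = np.asarray(tx_err, dtype=float)
        return 1.0 / tx, tx_err / tx**2

    r1, r1_err = times_to_rates([0.52, 0.48], [0.010, 0.012])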

An additional question is about the error calculation within the
Protein Dynamics Centre.  For model-free analysis, the errors are just
as important or maybe even more important than the data itself.  So it
is very important to know that the errors input into relax are of high
quality.  Ideally the R1 and R2 relaxation rate errors input into
relax would be from the gold standard of error propagation - Monte
Carlo simulations.  Is this what the PDC uses, or is the less accurate
jackknife technique used, or the even lower accuracy covariance
matrix estimate?  And how are replicated spectra used in the PDC?  For
example, if only a few time points are duplicated, if all time points
are duplicated, if all time points are triplicated (I've seen this
done before), or if no time points are duplicated.  How does the PDC
handle each situation and how are the errors calculated?  relax
handles these all differently, and this is fully documented at
http://www.nmr-relax.com/api/1.3/prompt.spectrum.Spectrum-class.html#error_analysis.
 Also, does the PDC use peak heights or peak volumes to measure signal
intensities?

Sorry for all the questions, but I have one more.  All of the
fundamental NMR theories work in rates (model-free, SRLS, relaxation
dispersion, spectral density mapping, Abragam's relaxation equations
and their derivation, etc.), and most of the NMR dynamics software
accepts rates and their errors and not times.  The BMRB database now
will also accept rates in their new version 3.1 NMR-STAR definition
within the Auto_relaxation saveframe.  Also most people in the
dynamics field publish R1 and R2 plots, while T1 and T2 plots are much
rarer (unless you go back to the 80's).  If all Bruker users start to
publish Tx plots while most of the rest publish Rx plots, comparisons
between different molecular systems will be complicated.  So is there
a specific reason the PDC outputs in relaxation times rather than in
rates?

Cheers,

Edward



On 16 November 2010 06:52, Neidig Klaus-Peter
<Klaus-Peter.Neidig@xxxxxxxxxxxxxxxxx> wrote:

Dear all, Dear Michael & Edward,

I'm currently on the way to England, thus only a short note:

The error of an inverse is a special case of the error of a ratio. A
search for "error propagation" on the internet yields hundreds of
hits. There are also some discussions about correlation between the
involved quantities.

If y=A/B with given errors of A and B, then the absolute error of y is
y * sqrt[(errA/A)^2 + (errB/B)^2]

If A=1, the error of y is y*errB/B, since the error of a constant is 0.

I compared the results, by eye and up to a number of digits, with the
errors I got from Marquardt when fitting a*exp(-Rt) instead of
a*exp(-t/T).

I hope, I did it right.
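
A quick numerical version of that check, for anyone who wants to
reproduce it, might look like the sketch below.  The data are
synthetic; the point is only that the covariance-based errors of the
two parameterisations are related by sigma_R = sigma_T/T^2:

    import numpy as np
    from scipy.optimize import curve_fit

    t = np.array([0.02, 0.06, 0.12, 0.25, 0.5, 1.0])
    err = np.full(t.size, 1.0e4)
    y = 1.0e6 * np.exp(-2.0 * t) \
        + np.random.default_rng(1).normal(0.0, 1.0e4, t.size)

    # Fit in terms of the rate R and in terms of the time T.
    pr, cr = curve_fit(lambda t, a, r: a * np.exp(-r * t), t, y,
                       p0=[1.0e6, 1.0], sigma=err, absolute_sigma=True)
    pt, ct = curve_fit(lambda t, a, T: a * np.exp(-t / T), t, y,
                       p0=[1.0e6, 1.0], sigma=err, absolute_sigma=True)

    sigma_r = np.sqrt(cr[1, 1])
    sigma_t = np.sqrt(ct[1, 1])
    # sigma_r and sigma_t / pt[1]**2 should agree to several digits.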

Best regards,
Peter


--
Bruker BioSpin
________________________________

Dr. Klaus-Peter Neidig
Head of Analysis Group
NMR Software Development

Bruker BioSpin GmbH
Silberstreifen 4
76287 Rheinstetten
Germany  Phone: +49 721 5161-6447
 Fax:     +49 721 5161-6480

  peter.neidig@xxxxxxxxxxxxxxxxx
  www.bruker-biospin.com
________________________________


