Posted by Dr. Klaus-Peter Neidig on November 29, 2010 - 13:35:
Hi Edward,

After reading the mails again and checking the code again, I think it would be good to do another cross-check,
perhaps best with an artificial example:

suppose we have mixing times   10, 10, 10 (3 replicates)
                               20
                               30, 30 (2 replicates)
                               40
                               50

and spectra with 3 peaks.

A:
I would take peak1, then check for the largest difference in intensity within the first 3 replicates and within the 2 replicates.
The largest difference found is then assigned as a systematic error to peak1 at all mixing times.

The procedure is repeated for peak2 and peak3.  Most likely peak1, peak2 and peak3 then have different systematic
errors.  That would be like a worst-case scenario per peak and is what is currently active in the PDC.

B:
The absolute worst-case scenario would be to check for the largest difference ever occurring and assign that to all
3 peaks at all mixing times.  This is what we used in earlier versions of the PDC.


C:
Your variance averaging would work as follows:
loop over the first 3 replicates and calculate the variance for each peak,
loop over the 2 replicates and calculate the variance for each peak,
finally take the average over the 6 variances and assign this as a systematic error to all peaks at all mixing times.
Is this correct?

I think A and C should be offered.  After implementing this, I see in my data that C produces much larger
errors than A, and the errors of the fitted parameters become much larger, sometimes larger than the values
themselves.  This does not seem acceptable.
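
For concreteness, the three schemes could be sketched in numpy roughly as follows (an illustrative sketch only, not the actual PDC code; the array shapes and names are made up):

    import numpy as np

    # Intensities: one row per peak, one column per measured spectrum.
    # Columns 0-2 are the 3 replicates at mixing time 10, column 3 is 20,
    # columns 4-5 are the 2 replicates at 30, then 40 and 50.
    intensities = np.random.rand(3, 8)
    replicate_groups = [[0, 1, 2], [4, 5]]

    # A: per-peak worst case - the largest replicate difference per peak.
    err_A = np.array([max(np.ptp(intensities[i, g]) for g in replicate_groups)
                      for i in range(intensities.shape[0])])

    # B: absolute worst case - the largest difference over all peaks and
    # groups, assigned to every peak at every mixing time.
    err_B = max(np.ptp(intensities[:, g], axis=1).max()
                for g in replicate_groups)

    # C: variance averaging - one variance per peak per replicate group
    # (3 peaks x 2 groups = 6 variances), averaged, then used as a single
    # error for all peaks at all mixing times.
    variances = [np.var(intensities[i, g], ddof=1)
                 for g in replicate_groups
                 for i in range(intensities.shape[0])]
    err_C = np.sqrt(np.mean(variances))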

In the meantime I also spoke to Wolfgang Bermel.  My statement that peaks with longer T2 seem to vary more
in replicates is not obvious to him.  When looking at the data, it turned out that I had looked at peaks that
not only have longer T2 but also have larger resonance offsets.  Wolfgang suspects that there might be an
offset effect.

The statement that spectra taken at longer mixing times show a lower base-plane noise does not seem to be true
in general.  The trend I saw in one of the spectra is not fully convincing, so we checked a couple of other
spectra, all taken with the latest release pulse programs.  With those data we could not really confirm the trend.
We have decided to keep an eye on this from now on.

Best regards,
Peter




On 11/24/2010 3:11 PM, Edward d'Auvergne wrote:
On 24 November 2010 10:31, Dr. Klaus-Peter Neidig
<peter.neidig@xxxxxxxxxxxxxxxxx> wrote:
Dear Edward,


I talked to some colleagues at Bruker to discuss your various items.
In general there is agreement that being as consistent with
relax as possible is a good idea.  However, other software at Bruker
must not be affected (some of the code is used in multiple
areas), and changes that cause too much effort or touch behaviour that is
already documented and in use should be avoided.  Enhancing the export
functionality to provide more information to relax is regarded as a good way to go.
Hi,

Yes, it would be a bad idea to make big changes.  Interfacing with
relax will be quite straightforward and probably no changes will be
required from Bruker's side.


In detail:

Do the users define the signal-free region?  For example there is
usually more noise in the random coil part of the spectrum due to
unwanted junk in the sample, so the peaks in this region have larger
errors.  What about signals close to the water signal?  Is it possible
to have different errors for these peaks?
No, the PDC currently does not allow assigning different errors to
different peaks.  We take several regions in the spectrum automatically,
e.g. close to the 4 corners, and finally use the one with the lowest noise.
We have to rely on some assumptions, such as proper baseline correction, a
reasonably clean sample, and careful post-processing, if any.
Ah, ok.  We will have to make sure users manually exclude spins with
signals near the water region.  Again this is an easy thing to solve.


I don't understand the origin of this scaling.  Since Art Palmer's
seminal 1991 publication (http://dx.doi.org/10.1021/ja00012a001), the
RMSD of the pure base-plane noise has been used for the standard
deviation.  The key text is on page 4375:
Thanks for the various references.  The factor we used so far is empirical,
but no problem, I will remove it.  Other software pieces at Bruker
that use such factors are not affected.  I have already run some tests;
the fitted parameters do not really change, but of course the errors of the
fitted parameters do.  This needs to be documented for the user.
Thank you.  In a model-free analysis the accuracy of the errors is
just as important as, or maybe even more important than, the data itself,
so this will make a big difference.
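
As background, the Palmer base-plane RMSD estimate is simply the standard deviation of a signal-free region used directly as the peak height error; a minimal numpy sketch (the region coordinates are made up):

    import numpy as np

    # spectrum: 2D array of real intensities after processing.
    spectrum = np.random.randn(1024, 2048)

    # A signal-free region, e.g. near one of the 4 corners.
    noise = spectrum[0:64, 0:128]

    # The RMSD of the pure base-plane noise is used as the standard
    # deviation of every peak height (Palmer et al., 1991).
    rmsd = np.sqrt(np.mean((noise - noise.mean())**2))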


From my work a long time ago, I noticed that the spectral errors
decreased with an increasing relaxation period.  This can be taken
into account in relax if all spectra are duplicated/triplicated/etc.
But if not, then the errors for all spectra are averaged (using
variance averaging, not standard deviation averaging).  For a single
duplicated/triplicated spectrum, the error is taken as the average
variance of each peak.  So when some, but not all, spectra are
replicated, there will be one error for all spin systems at all
relaxation periods.  This sounds similar to what the PDC does, but is
it exactly the same?
I think the Bruker internal opinion was different, in the sense that one should
either have enough replicates (which never happens) and do sdev averaging, or
just provide an estimate of the systematic error.  The change in intensity obviously
depends on the peaks.  We have seen peaks with longer T2 varying much
more than those with shorter T2.  We concluded to assume a worst-case error
scenario and check for the largest difference of all peak intensities at
replicated mixing times.  This error was then applied (added) to all peaks at
all mixing times.  I should also mention that in the PDC the user has the option
to replace the peak intensities/integrals of replicated mixing times by their mean,
so that each mixing time occurs only once in the fit.  This yields slightly different
results.  The request for implementing this came from our application people,
who obviously talked to their customers.
Having enough replicates would never be possible for properly
determining the errors, so they must be crudely estimated.  The maximum
error would mean that the errors would probably never be
underestimated.  But for downstream analysis, underestimating some
errors (and overestimating others) would be better as no bias would be
introduced into the errors.  Using the maximum error applied to all
peaks will actually result in molecules appearing more rigid than they
are!  The result will be the hiding of many motional processes.  So
the second option would be better.  Even better would be to use an
average error rather than the maximum by averaging variances (sdev
values cannot be averaged as they cannot be summed, but variances
sum).  Is the peak intensity error located in the PDC files?  I cannot
see it in our current testing file
https://gna.org/file/testT2.txt?file_id=10641.


Since changing the error estimation of replicates could influence the results
significantly, we do not want to blindly do it. The proposal would be to offer
several options (variance averaging, sdev averaging, worst case estimate)
The sdev averaging here should be avoided, as statistically it is not
valid.  For example, from Wikipedia:

"The standard deviation of the sum of two random variables can be
related to their individual standard deviations and the covariance
between them:

    stdev(X + Y) = sqrt(var(X) + var(Y) + 2*cov(X,Y)).

where var = stdev^2 and cov stand for variance and covariance, respectively."

With zero covariance, the correctly averaged error is then
stdev(X+Y)/sqrt(2), not the simple mean of the stdev values.  Another
example is the discussion in the thread
http://mathforum.org/kb/message.jspa?messageID=1602446&tstart=0.
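
A quick numeric illustration of the difference (arbitrary numbers):

    import numpy as np

    sd = np.array([1.0, 3.0])         # two peak errors as standard deviations

    naive = sd.mean()                 # sdev averaging: 2.0
    proper = np.sqrt((sd**2).mean())  # variance averaging: sqrt(5) ~ 2.236

    # The two disagree whenever the errors differ; only the variance
    # average is consistent with how independent errors combine.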

Having a number of options (variance averaging, worst case estimate,
etc.) would be great.  Could the user's choice be recorded in
the PDC file?  I would like relax to give the user a detailed warning
about hiding dynamics if they choose the worst case estimate.  The
worst case estimate would nevertheless be quite useful so I would not
suggest removing it.


but advise to use variance averaging in the documentation.
Your remark about decreasing spectral errors with increasing mixing times
is correct; I see this in all the data I have.  The peak intensities/integrals,
however, decrease much more, so the relative error of each increases.
This is a bit off topic, but would you know if this has something to
do with the total power of the spectrum?  Maybe Wolfgang Bermel's new
pulse sequences with both the temperature compensation block and
single-scan interleaving will eliminate this strange effect.


Does peak intensity mean peak height?  In relax, the term peak
intensity covers volume, height, and any other measure of
the peak.  For relaxation data with proper temperature calibration and
proper temperature compensation, the height is now often considered the best
method.  Peter Wright published a while back that volumes are more
accurate, but with the temperature consistency improvements the
problems Peter encountered are all solved and hence heights are better
as overlap problems and other spectral issues are avoided.  Is there a
way of automatically determining the method used in the PDC file,
because in the end the user will have to specify this for BMRB data
deposition?  This is not so important though, the user can manually
specify this if needed.
Yes, intensity means height.  The integration method used is part of the PDF report;
I see, however, that it is not included in the export (text or xls).  I will change
that, of course.  The problem with peak heights arises if peaks overlap, say one peak is
hidden in the shoulder of another.  Such peaks usually have to be picked by hand,
and then peak deconvolution would be preferable.  We will keep offering our options
but advise in the handbook that peak intensities (heights) are preferred.
We could have relax read the integration method so that this is
automatically placed in the BMRB deposition file for the user.
Actually, maybe it would be an idea to have relax store the entire PDC
file and have this deposited within one of the BMRB method saveframes?

relax allows the user to do whatever they wish (though the GUI interface
does not), but if there is significant overlap, that data should not
be used.  This is all up to the user, however.


The covariance analysis is known to be the lowest quality of all the error
estimates.  For an exponential fit, I have not checked how
non-quadratic the minimum is, so I don't know the performance of the
method.  How do the errors from that compare to the area integral together
with base-plane RMSD?

        
I can't remember exactly, but I think Art Palmer published something
different from this, and that is why his curvefit program uses MC as well
as Jackknife simulation.  Anyway, if the minimum has a perfectly
quadratic curvature, then the covariance matrix analysis and MC will
give identical results - the methods converge.  But as the minimum
deviates from quadratic, the performance of the covariance matrix
method drops off quickly.  MC simulations are the gold standard for
error propagation, when the direct error equation is too complex to
derive.  But again I don't know how non-quadratic the minimum is in
the exponential optimisation space, so I don't know how well the
covariance matrix performs.  The difference in techniques might be
seen for very weak exchanging peaks with large errors.  Or special
cases such as when the spectra are not properly processed and there
are truncation artifacts.
With my data, the errors obtained from LM (taking errorY into account) and MC
are pretty close.  For a typical T1 of 0.473, the errors are 0.0026 and 0.0025.
The base-plane RMS decreases with mixing time; the relative errors for the
given examples are 0.000386, ...0.00651, ...0.003124.
I haven't tested this personally, and just followed Art Palmer's and
others' lead when I implemented this fitting and used the gold
standard Monte Carlo simulations.  It could be that the covariance
matrix technique performs just as well as MC simulations for all
exponential curve-fitting - the equations are not particularly
complex.  Maybe the user choice of covariance matrix estimate or Monte
Carlo simulations could be recorded in the PDC file so that this
information is also deposited in the BMRB?  I would have to talk to
Eldon Ulrich at the BMRB about adding some new tags for that.
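
To make the comparison concrete, here is a sketch of the two error estimates for a two-parameter exponential fit, using scipy on synthetic data (this is not the PDC's or relax's actual implementation):

    import numpy as np
    from scipy.optimize import curve_fit

    def decay(t, I0, R):
        return I0 * np.exp(-R * t)

    t = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8])
    sigma = 0.02                                   # peak intensity error
    I = decay(t, 1.0, 2.0) + np.random.normal(0, sigma, t.size)

    # Covariance matrix estimate: errors from the diagonal of pcov.
    popt, pcov = curve_fit(decay, t, I, p0=[1.0, 2.0],
                           sigma=np.full(t.size, sigma), absolute_sigma=True)
    err_cov = np.sqrt(np.diag(pcov))

    # Monte Carlo: refit many randomised data sets, take the parameter sdev.
    sims = []
    for _ in range(500):
        I_sim = decay(t, *popt) + np.random.normal(0, sigma, t.size)
        p_sim, _ = curve_fit(decay, t, I_sim, p0=popt)
        sims.append(p_sim)
    err_mc = np.std(sims, axis=0)

    # For a near-quadratic minimum the two estimates should converge.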


What is this factor and why is it used?  The MC simulation errors
should be exact, assuming the input peak intensity errors are exact
estimates.  MC simulations propagate the errors through the non-linear
space, and hence these errors are perfectly exact.  Is there a
reference for this post-MC sim scaling?  Is it optional?  For relax,
it would be good to know both this scaling factor and that mentioned
above for the peak intensity error scaling, so that both can be
removed prior to analysis.  These will impact model-free analysis in a
very negative way.
In many applications people do not just want a single error but an error range
they can rely on with a certain confidence.  They would also like to
account for the fact that different numbers of points may be used for different fits
(sometimes not all peaks are picked in all planes, or the user interactively removes
points from the fitted curve).  This is why I implemented the same approach as
Matlab typically uses.  I would add an extra column to the output and list
the factor (typically ~2) for each fit.  This way you can simply recalculate
the original errors.
This would be perfect!  Cheers.
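
Undoing that scaling on the relax side would then be a one-liner; as a sketch (the values and column are hypothetical):

    # Errors in the PDC export would be scaled by a per-fit confidence
    # factor (typically ~2).  Dividing by the exported factor recovers
    # the original standard errors before any model-free analysis.
    scaled_error = 0.0052
    confidence_factor = 2.01      # from the proposed extra column
    original_error = scaled_error / confidence_factor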


Certainly, model-free analysis will be influenced; whether in a negative way
or not, I don't know.  What I have seen so far (with limited experience, however) is
that if the errors are too tight, the modelling calculates forever and often
finds solutions with such large errors that they are practically unusable.  I must
admit, however, that the PDC currently does not allow the use of multiple fields.
In model-free analysis, underestimating errors has the strange effect
of introducing artificial motions!  Overestimating causes real motions
to disappear.  Error estimates are incredibly important.


Ah, historic reasons.  We will be able to handle both in relax.  But
if the PDC presents T1 and T2 plots, and these values are in the PDC
output file, the user is highly likely to then use this in
publications.  Some people prefer T1 and T2 plots, but these are the
exception.  This will likely lead to a convention change in the
literature (for Bruker users only), which will make it
harder to compare systems.  It would be good to consider this effect.
The Bruker people seem to regard this as a minor issue for various reasons.
One is that plots for publications will probably be made with more advanced
tools like Excel and it is no effort to switch between T and R. Another
argument is that people who really want to compare details need to compare
numbers anyway. I will check if there is a simple way to allow the user
to switch between the presentations but currently we would keep the T
displays and outputs.
Ok, that makes sense.  relax uses xmgrace to display the relaxation
rates (never the rotational relaxation times though).  With BMRB data
deposition very close to being complete within relax, maybe more data
will be placed in the database and not just shown as a low-resolution
T1 or T2 plot.  In many currently published papers, comparing low-resolution
graphs is unfortunately the only way.  This is the purpose
of the relax-BMRB collaboration.
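
Switching between the two presentations really is just a reciprocal, including the error; as a sketch using the T1 example quoted earlier:

    # Convert a relaxation time T (s) and its error into a rate R (1/s).
    # First-order error propagation: sigma_R = sigma_T / T**2.
    T, sigma_T = 0.473, 0.0026
    R = 1.0 / T                   # ~2.114 s^-1
    sigma_R = sigma_T / T**2      # ~0.0116 s^-1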


This all gives me some work to do and of course some testing. I hope to
do everything this week and then send new export files to you for inspection.
If accepted I will proceed to version PDC 1.1.3.
Cheers.  Michael Bieri has created a relax branch for implementing the
reading of the file.  Having newer files attached to
https://gna.org/task/?7180 will help in our implementation.

Thank you,

Edward


--
Bruker BioSpin

Dr. Klaus-Peter Neidig
Head of Analysis Group
NMR Software Development

Bruker BioSpin GmbH
Silberstreifen 4
76287 Rheinstetten

Germany
 Phone: +49 721 5161-6447
 Fax:     +49 721 5161-6480



