Dear Edward,

I talked to some colleagues at Bruker to discuss your various items. In general there is agreement that being as consistent with relax as possible is a good idea. However, other software at Bruker should not be affected (some of the code is used in multiple areas), and changes that cause too much effort or touch features that are already documented and in use should be avoided. Enhancing the export functionality to provide more information to relax is regarded as a good way to go. In detail:

> Do the users define the signal-free region? For example there is
> usually more noise in the random coil part of the spectrum due to
> unwanted junk in the sample, so the peaks in this region have larger
> errors. What about signals close to the water signal? Is it possible
> to have different errors for these peaks?

No, the PDC currently does not allow assigning different errors to different peaks. We take several regions in the spectrum automatically, e.g. close to the 4 corners, and finally use the one with the lowest noise. We have to rely on some assumptions such as proper baseline correction, a reasonably clean sample, and careful post-processing, if any.

> I don't understand the origin of this scaling. Since Art Palmer's
> seminal 1991 publication (http://dx.doi.org/10.1021/ja00012a001), the
> RMSD of the pure base-plane noise has been used for the standard
> deviation. The key text is on page 4375:

Thanks for the various references. The factor we have used so far is empirical, but no problem, I will remove it. Other pieces of software at Bruker that use such factors are not affected. I have already run some tests; the fitted parameters do not really change, but of course the errors of the fitted parameters do. This needs to be documented for the user.

> From my work a long time ago, I noticed that the spectral errors
> decreased with an increasing relaxation period. This can be taken
> into account in relax if all spectra are duplicated/triplicated/etc.
> But if not, then the errors for all spectra are averaged (using
> variance averaging, not standard deviation averaging). For a single
> duplicated/triplicated spectrum, the error is taken as the average
> variance of each peak. So when some, but not all, spectra are
> replicated, there will be one error for all spin systems at all
> relaxation periods. This sounds similar to what the PDC does, but is
> it exactly the same?

I think the Bruker internal opinion was different, in the sense that one should either have enough replicates (which never happens) and do sdev averaging, or just provide an estimate of the systematic error. The change in intensity obviously depends on the peaks. We have seen peaks with longer T2 varying much more than those with shorter T2. We concluded to assume a worst-case error scenario and to check for the largest difference of all peak intensities at replicated mixing times. This error was then applied (added) to all peaks at all mixing times. I should also mention that in the PDC the user has the option to replace the peak intensities/integrals of replicated mixing times by their mean, so that each mixing time occurs only once in the fit. This yields slightly different results. The request for implementing this came from our application people, who obviously talked to their customers. Since changing the error estimation of replicates could influence the results significantly, we do not want to do it blindly.
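To make the difference between the two replicate strategies concrete, here is a minimal Python sketch. The intensities, the dictionary layout and the variable names are invented for illustration; this is neither the PDC nor the relax implementation.

    import numpy as np

    # Hypothetical peak intensities for one peak: mixing time -> replicate
    # measurements (one mixing time measured in duplicate, the others only once).
    intensities = {
        0.01: [1.002e6, 0.988e6],
        0.05: [0.810e6],
        0.20: [0.430e6],
    }

    # Variance averaging (the relax-style approach): average the variances of the
    # replicated points and use the resulting standard deviation everywhere.
    variances = [np.var(v, ddof=1) for v in intensities.values() if len(v) > 1]
    sigma_variance = np.sqrt(np.mean(variances))

    # Worst-case estimate (as described for the PDC): take the largest spread
    # between replicates and apply it as the error to all peaks at all mixing times.
    spreads = [max(v) - min(v) for v in intensities.values() if len(v) > 1]
    sigma_worst_case = max(spreads)

    print(sigma_variance, sigma_worst_case)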
The proposal would be to offer several options (variance averaging, sdev averaging, worst-case estimate) but to advise the use of variance averaging in the documentation. Your remark about decreasing spectral errors with increasing mixing times is correct, I see this in all the data I have. The peak intensities/integrals decrease much more, however, so the relative error of each increases.

> Does peak intensity mean peak height? In relax, the term peak
> intensity is inclusive of volume, height, or any other measure of
> the peak. For relaxation data with proper temperature calibration and
> proper temperature compensation, this is now often considered the best
> method. Peter Wright published a while back that volumes are more
> accurate, but with the temperature consistency improvements the
> problems Peter encountered are all solved and hence heights are better
> as overlap problems and other spectral issues are avoided. Is there a
> way of automatically determining the method used in the PDC file,
> because in the end the user will have to specify this for BMRB data
> deposition? This is not so important though, the user can manually
> specify this if needed.

Yes, intensity === height. The integration method used is part of the PDF report; I just noticed, however, that it is not included in the export (text or xls). I will change that, of course. The problem with peak heights is when peaks overlap, say one peak is hidden in the shoulder of another. Such peaks usually have to be picked by hand, and then peak deconvolution would be preferable. We will keep on offering our options but advise in the handbook that peak intensities are preferred.

> The covariance analysis is known to be the lowest quality of all the
> error estimates. For an exponential fit, I have not checked how
> non-quadratic the minimum is, so I don't know the performance of the
> method. Do the errors from that compare to the area integral together
> with base-plane RMSD?

> I can't remember exactly, but I think Art Palmer published something
> different to this and that is why his curvefit program uses MC as well
> as Jackknife simulation. Anyway, if the minimum has a perfectly
> quadratic curvature, then the covariance matrix analysis and MC will
> give identical results - the methods converge. But as the minimum
> deviates from quadratic, the performance of the covariance matrix
> method drops off quickly. MC simulations are the gold standard for
> error propagation, when the direct error equation is too complex to
> derive. But again I don't know how non-quadratic the minimum is in
> the exponential optimisation space, so I don't know how well the
> covariance matrix performs. The difference in techniques might be
> seen for very weak exchanging peaks with large errors. Or special
> cases such as when the spectra are not properly processed and there
> are truncation artifacts.

With my data the errors obtained from LM (taking errorY into account) and MC are pretty close. If a typical T1 is 0.473, the errors are 0.0026 and 0.0025. The base-plane RMS decreases with mixing time; the relative errors for the given examples are 0.000386, ..., 0.00651, ..., 0.003124.

> What is this factor and why is it used? The MC simulation errors
> should be exact, assuming the input peak intensity errors are exact
> estimates. MC simulations propagate the errors through the non-linear
> space, and hence these errors are perfectly exact. Is there a
> reference for this post-MC sim scaling? Is it optional?
> For relax, it would be good to know both this scaling factor and that
> mentioned above for the peak intensity error scaling, so that both can
> be removed prior to analysis. These will impact model-free analysis in
> a very negative way.

In many applications people do not just want a single error but an error range they can rely on with a certain confidence. They would also like to account for the fact that different numbers of peaks may be used for different fits (sometimes not all peaks are picked in all planes, or the user removes points from the fitted curve interactively). This is why I implemented the same as Matlab typically does. I would add an extra column to the output and list the factor (typically ~2) for each fit. This way you can simply recalculate the original errors. Certainly, model-free analysis will be influenced; whether in a negative way or not I don't know. What I have seen so far (with limited experience, however) is that if errors are too tight, the modelling calculates forever and often finds solutions with such large errors that they are practically unusable. I must admit, however, that the PDC currently does not allow the use of multiple fields.

> Ah, historic reasons. We will be able to handle both in relax. But
> if the PDC presents T1 and T2 plots, and these values are in the PDC
> output file, the user is highly likely to then use this in
> publications. Some people prefer T1 and T2 plots, but these are an
> exception in publications. This will likely lead to a convention
> change in the literature (for only Bruker users) which will make it
> harder to compare systems. It would be good to consider this effect.

The Bruker people seem to regard this as a minor issue for various reasons. One is that plots for publications will probably be made with more advanced tools like Excel, and it is no effort to switch between T and R. Another argument is that people who really want to compare details need to compare numbers anyway. I will check if there is a simple way to allow the user to switch between the presentations, but currently we would keep the T displays and outputs.

This all gives me some work to do and of course some testing. I hope to do everything this week and then send new export files to you for inspection. If accepted, I will proceed to version PDC 1.1.3.

Best regards,
Peter

On 11/22/2010 3:55 PM, Edward d'Auvergne wrote:

Dear Peter,

Sorry for the bounce messages you have been receiving, your email address should now be on the accepted list. I missed the bounce warnings as these were appearing in my spam folder. Your posts should soon appear in the archives at https://mail.gna.org/public/relax-devel/. Cheers. I have more below.

On 19 November 2010 10:40, Dr. Klaus-Peter Neidig <peter.neidig@xxxxxxxxxxxxxxxxx> wrote:

Dear Edward,

after coming back from England, let me provide some comments. Some of the literature uses the words error and uncertainty interchangeably. The quantity sigma used in least squares fitting would be the square of the error or uncertainty.

In relax, we use sigma as the standard deviation (or error and uncertainty), and sigma squared is the variance. This comes from the normal distribution, the minus log of which creates the chi-squared statistic, hence the division by the variance. Ah, I just read your other message (https://mail.gna.org/public/relax-devel/2010-11/msg00006.html).

The PDC output refers to the error, so error_R1 = error_T1/T1.

Then we should be able to read in the standard deviations directly from the PDC file.
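A minimal sketch of that conversion, including the removal of a confidence-interval factor of the kind discussed above, could look as follows. The function name, the degrees of freedom and the numbers are invented for illustration; this is not the relax or PDC code.

    from scipy.stats import t

    def t1_to_r1(t1, sigma_t1, confidence_factor=1.0):
        # The relative errors of T1 and R1 are equal, so sigma_R1 = sigma_T1 / T1**2.
        # If the exported sigma_T1 still contains a Student t confidence-interval
        # factor (the ~2 discussed above), divide it out first.
        sigma_t1 = sigma_t1 / confidence_factor
        return 1.0 / t1, sigma_t1 / t1**2

    # Example with invented numbers: T1 = 0.473, sigma_T1 = 0.0026, and a
    # two-sided 95% Student t factor for an assumed 10 degrees of freedom.
    factor = t.ppf(0.975, df=10)
    print(t1_to_r1(0.473, 0.0026, confidence_factor=factor))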
This will be extremely easy to implement.

Where do the errors come from? The source of the errors is the spectrum. Each data point has an error. Commonly this error is derived from the signal-to-noise, usually obtained from signal-free regions.

Do the users define the signal-free region? For example there is usually more noise in the random coil part of the spectrum due to unwanted junk in the sample, so the peaks in this region have larger errors. What about signals close to the water signal? Is it possible to have different errors for these peaks?

In most cases the standard deviation is calculated from such regions and then multiplied by a factor. The PDC uses multiple such regions, takes the one with the lowest sdev and multiplies it by 2.0. There is no unique definition of this factor. When used as a peak picking threshold we often use 3.5 * sdev. The noise is calculated for each plane of the pseudo-3D separately.

I don't understand the origin of this scaling. Since Art Palmer's seminal 1991 publication (http://dx.doi.org/10.1021/ja00012a001), the RMSD of the pure base-plane noise has been used for the standard deviation. The key text is on page 4375:

"The uncertainties in the measured peak heights, sigma_h, were set equal to the root-mean-square baseline noise in the spectra. This assumption was validated by recording two duplicate spectra with use of the sequence of Figure 1b with T = 0.02 s. Assuming that the peak heights of the 24 C^alpha resonances are identically distributed, the standard deviation of the differences between the heights of corresponding peaks in the paired spectra is equal to 2^(1/2)*sigma_h. The value of 2^(1/2)*sigma_h determined in this manner was compared to the root-mean-square baseline noise."

Tyler Reddy pointed this one out (https://mail.gna.org/public/relax-users/2008-10/msg00026.html), though this has been used in relax since about 2003. This was also used in the important Farrow et al. (1994) Biochemistry, 33: 5984-6003 article (http://dx.doi.org/10.1021/bi00185a040):

"The root-mean-square (rms) value of the background noise regions was used to estimate the standard deviation of the measured intensities. The duplicate spectra that were collected were used to assess the validity of this estimate. The distribution of the difference in intensities of identical peaks in duplicate spectra should have a standard deviation sqrt(2) times greater than the standard deviation of the individual peaks. This analysis was performed for the SH2 data, and the two measures were found to agree (average discrepancy of 7%). On this basis, it was concluded that the rms value of the noise could be used to estimate the standard deviations of the measured intensities."

Again this was mentioned by Tyler (https://mail.gna.org/public/relax-users/2008-10/msg00029.html). Have there been advancements to these ideas? Is the factor due to error sources other than spectral noise?

But this would not account for systematic errors. Therefore it is advised to also repeat experiments and check the variation of the data points from spectrum to spectrum. Unfortunately it is up to the NMR user to do this or not, and in most cases we are happy to see 1-3 mixing times repeated at least once. That is not enough to calculate proper errors, therefore we just take the spread (max difference) as an estimate and apply it to all planes.

From my work a long time ago, I noticed that the spectral errors decreased with an increasing relaxation period.
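As an aside, the region-based noise estimate described further up (lowest-sdev region, peak picking at 3.5 * sdev) can be sketched as follows. The spectrum, the region sizes and the numbers are invented, and this is not the PDC algorithm.

    import numpy as np

    def baseplane_sigma(plane, regions):
        # 'plane' is one 2D plane of the pseudo-3D spectrum and 'regions' is a list
        # of (row_slice, col_slice) pairs covering signal-free areas, e.g. near the
        # four corners. The standard deviation of the quietest region is used as
        # the peak intensity error; a peak picking threshold could then be set at
        # 3.5 times this value.
        sdevs = [plane[rows, cols].std() for rows, cols in regions]
        return min(sdevs)

    rng = np.random.default_rng(0)
    plane = rng.normal(scale=1000.0, size=(512, 256))  # synthetic noise-only plane
    corners = [(slice(0, 50), slice(0, 50)), (slice(0, 50), slice(-50, None)),
               (slice(-50, None), slice(0, 50)), (slice(-50, None), slice(-50, None))]
    print(baseplane_sigma(plane, corners))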
This decrease can be taken into account in relax if all spectra are duplicated/triplicated/etc. But if not, then the errors for all spectra are averaged (using variance averaging, not standard deviation averaging). For a single duplicated/triplicated spectrum, the error is taken as the average variance of each peak. So when some, but not all, spectra are replicated, there will be one error for all spin systems at all relaxation periods. This sounds similar to what the PDC does, but is it exactly the same? For those interested, the exact code can be seen in the __error_repl() function in http://svn.gna.org/viewcvs/relax/1.3/generic_fns/spectrum.py?revision=11618&view=markup&pathrev=11688.

Next the user has the freedom to calculate the peak integrals. We offer: just the peak intensity, the peak shape integral, the peak area integral, and a peak fit.

Does peak intensity mean peak height? In relax, the term peak intensity is inclusive of volume, height, or any other measure of the peak. For relaxation data with proper temperature calibration and proper temperature compensation, this is now often considered the best method. Peter Wright published a while back that volumes are more accurate, but with the temperature consistency improvements the problems Peter encountered are all solved and hence heights are better as overlap problems and other spectral issues are avoided. Is there a way of automatically determining the method used in the PDC file, because in the end the user will have to specify this for BMRB data deposition? This is not so important though, the user can manually specify this if needed.

They all have their disadvantages; in many cases the peak intensity is not even the worst. The error estimation also depends on the chosen method. In the case of the shape and area integrals, the integral error goes down with the number of integrated data points, since random errors at least partially cancel out. In the case of the peak fit, we take the error as it comes out of Levenberg-Marquardt, internally based on covariance analysis.

The covariance analysis is known to be the lowest quality of all the error estimates. For an exponential fit, I have not checked how non-quadratic the minimum is, so I don't know the performance of the method. Do the errors from that compare to the area integral together with base-plane RMSD?

Since the number of data points per peak in pseudo-3D spectra is small, the 2D peak fit usually results in large errors. Also, the peak fitting assumes Gaussian or Lorentzian line shapes, or mixtures of both, but the actual peak shape often differs from these.

As most people -should- use peak heights, this should not be too much of an issue.

When it comes to relaxation parameter fitting, we supply the peak integral errors to Levenberg-Marquardt and get the fitted parameters and their errors back, again based on covariance analysis. Alternatively the user may run MC simulations, usually 1000. The input variation of the integrals comes from a Gaussian random generator; the width of the Gaussian distribution for each integral is taken to be identical to the estimated error of that integral. The literature says that the error then obtained from MC should be identical to the error obtained from LM.

I can't remember exactly, but I think Art Palmer published something different to this and that is why his curvefit program uses MC as well as Jackknife simulation. Anyway, if the minimum has a perfectly quadratic curvature, then the covariance matrix analysis and MC will give identical results - the methods converge.
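A quick way to see the two estimates side by side for a single exponential fit is sketched below, using synthetic data and scipy. The mixing times, intensities and errors are invented, and this is neither the PDC nor the relax code.

    import numpy as np
    from scipy.optimize import curve_fit

    def decay(t, i0, r):
        return i0 * np.exp(-r * t)

    rng = np.random.default_rng(1)
    times = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2])
    intensities = 1.0e6 * np.exp(-2.1 * times) + rng.normal(scale=5e3, size=times.size)
    sigmas = np.full(times.size, 5e3)

    # Covariance-based errors from a single weighted Levenberg-Marquardt fit.
    popt, pcov = curve_fit(decay, times, intensities, p0=[1e6, 2.0],
                           sigma=sigmas, absolute_sigma=True)
    cov_errors = np.sqrt(np.diag(pcov))

    # Monte Carlo: draw each intensity from a Gaussian whose width equals its
    # estimated error, refit, and take the standard deviation over all fits.
    fits = []
    for _ in range(1000):
        noisy = intensities + rng.normal(scale=sigmas)
        p, _ = curve_fit(decay, times, noisy, p0=popt)
        fits.append(p)
    mc_errors = np.std(fits, axis=0)

    print(cov_errors, mc_errors)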
But as the minimum deviates from quadratic, the performance of the covariance matrix method drops off quickly. MC simulations are the gold standard for error propagation when the direct error equation is too complex to derive. But again I don't know how non-quadratic the minimum is in the exponential optimisation space, so I don't know how well the covariance matrix performs. The difference in techniques might be seen for very weak exchanging peaks with large errors, or in special cases such as when the spectra are not properly processed and there are truncation artifacts.

I checked this by eye, they are at least similar. All errors, regardless of whether they come from LM or MC, are finally multiplied by a factor obtained from a Student t distribution at a given confidence level and degrees of freedom. The numbers we get at the end agree with what we get from Matlab, which has become an internal standard for us at Bruker.

What is this factor and why is it used? The MC simulation errors should be exact, assuming the input peak intensity errors are exact estimates. MC simulations propagate the errors through the non-linear space, and hence these errors are perfectly exact. Is there a reference for this post-MC sim scaling? Is it optional? For relax, it would be good to know both this scaling factor and that mentioned above for the peak intensity error scaling, so that both can be removed prior to analysis. These will impact model-free analysis in a very negative way.

When it comes to model-free modelling, the errors obtained so far are taken into account. But there is heavy criticism from some of the experts. They say for example that the errors tend to be too small, especially the NOE error, which is only based on the ratio of 2 numbers, whereas the T1 and T2 errors are based on fitting 10-20 input values. Therefore in the PDC we allow the user to override the determined errors with default errors, e.g. 2% for the NOE and 1% for T1 and T2. They also say that the modelling output should contain the back-calculated T1, T2 and NOE values, with markers indicating whether T1 and T2 are well reproduced regardless of what the NOE is doing. I don't want to comment on this but I have implemented it in the PDC.

I will not comment either - there is no point ;) Their model-free analyses will fail in interesting ways though :S But that is up to the user. It would be good to not default to such behaviour.

During the curve fitting we refer to T1, T2 for no deeper reason. The older Bruker software tools did it, and a module in TopSpin is for example called the T1/T2 module. Today I'm only a developer (in former times I could define projects by myself) and other people officially tell me what to do. That is the only reason for using T1, T2. As soon as it goes beyond the relaxation curve fitting, everything else internally continues with R1, R2; most of the literature presents the formulas with R1, R2 and I didn't want to rewrite all these and introduce mistakes. Technically, it would be no problem to present everything in terms of R1, R2, but several people already use the software and nobody has complained. Perhaps with a future version I should just allow the user to switch between one representation and the other. I will discuss this with some people here.

Ah, historic reasons. We will be able to handle both in relax. But if the PDC presents T1 and T2 plots, and these values are in the PDC output file, the user is highly likely to then use this in publications. Some people prefer T1 and T2 plots, but these are an exception in publications.
This will likely lead to a convention change in the literature (for only Bruker users) which will make it harder to compare systems. It would be good to consider this effect. Cheers.

I think everybody here understands that Bruker is not a research institution and does not have the resources and knowledge to do the modelling at the level you do.

I'm not so sure about that, there is a lot of expertise there.

In my talks I advertised that our PDC allows a very convenient data analysis, especially if the Bruker release pulse programs (written by Wolfgang Bermel) are used.

Ah, the pulse programs Paul Gooley and I advised Wolfgang on. These are the high quality pulse sequences with temperature compensation, both in the form of temperature compensation blocks at the start and as single-scan interleaving. These should lead to a huge change in the quality of data produced by users!

With our old T1/T2 module in TopSpin we had a big problem. It was so bad that many people just used nmrPipe, Sparky, ccpn or anything else, even though that meant a lot of manual work. I found it convenient to also offer some of the diffusion tensor and modelling functionality, to be a bit more complete.

The PDC should be a huge help, and will definitely improve the quality and consistency of published data.

I'm absolutely happy, however, if I can say that there is relax available, which is much more advanced and can read our output.

The plan would be to automatically read everything so no manual intervention is required by the user. Even if the user wishes not to use relax for model-free analysis, they can use the program to create the input files for Art Palmer's Modelfree and Dasha.

What disappoints me a bit at the moment is the behaviour of the users (independent of the software they use). Typically, they say, it is good to have all the modelling, but what can we do with it? The overall dynamical features of the molecule are quite obvious already from looking at the NOE, T1/T2 or reduced spectral densities. Many people just use relaxation data to check if there is aggregation. Your customer contacts should be much better than mine; what is your experience?

I don't really know. It probably depends on the aim of the user. A number of people are now not performing model-free analyses. Maybe it has something to do with the SRLS people heavily bashing model-free at conferences at the moment. Model-free is less trendy than it was a decade ago. This is one of the reasons there is a collaboration with the BMRB. Structures, chemical shifts, etc. are deposited so that downstream analyses and studies can be performed. But relaxation data and model-free results are hardly ever deposited. The result of this is that papers only present pretty pictures, so this is hindering further analysis and preventing all this interesting data from being useful. Hopefully soon it will be compulsory to deposit data before publication is possible. That would make dynamics analyses far more useful to the field. Some users believe they can see the dynamics from the NOE, R1, and R2 plots, but there is a trap that is often fallen into: significant anisotropy in the rotational diffusion of the molecule. If a rigid alpha-helix points along the long axis, these users will incorrectly conclude that these residues experience chemical exchange relaxation (first discovered by Tjandra et al., 1995). The same happens along the short axis, though there they will conclude that there are fast internal motions (Schurr et al., 1994).
That is the problem with the relaxation rates and spectral density mapping - there is no separation of the internal from the external motion, so unless you are an absolute expert (via model-free or SRLS), you will probably draw the wrong conclusions. The literature is full of this :S

Just to indicate the future resources for the PDC: until ENC 2011 I have permission to use 50% of my time to add more features, e.g. to allow user-defined spectral density functions and the use of multiple fields for modelling. But I have already been given a more general project: it will be called Dynamics Center and must cover all kinds of dynamics, including diffusion, kinetics and some solid-state stuff like REDOR experiments. Applications will include smaller molecules.

For integration with relax, not much work will be required, hopefully none at all.

Cheers,
Edward

Best regards,
Peter

On 11/16/2010 7:10 PM, Edward d'Auvergne wrote:

Dear Peter,

Thank you for posting this info to the relax mailing lists. It is much appreciated. I hadn't thought too much about this, but it is as you say: an error propagation through a ratio. The same occurs within the steady-state NOE error calculation. As y=1/B and errA=0, we could simply take the PDC file data and convert the error as sigma_R1 = sigma_T1 / T1^2. This would be a 100% exact error calculation. Therefore within relax, we will only need to read the final relaxation data from the PDC files and nothing about the peak intensities. Reading additional information from the PDC files could be added later, if someone needs that.

One thing that would be very useful would be to have higher precision values and errors in the PDC files. 5 or more significant figures versus the current 2 or 3 would be of great benefit for downstream analyses. For a plot this is not necessary, but for a high-precision and highly non-linear analysis such as model-free (and SRLS and spectral density mapping), this introduces significant propagating truncation errors. It would be good to avoid this issue.

An additional question is about the error calculation within the Protein Dynamics Centre. For model-free analysis, the errors are just as important as, or maybe even more important than, the data itself. So it is very important to know that the errors input into relax are of high quality. Ideally the R1 and R2 relaxation rate errors input into relax would come from the gold standard of error propagation - Monte Carlo simulations. Is this what the PDC uses, or is the less accurate jackknife technique used, or the even lower accuracy covariance matrix estimate?

And how are replicated spectra used in the PDC? For example, if only a few time points are duplicated, if all time points are duplicated, if all time points are triplicated (I've seen this done before), or if no time points are duplicated. How does the PDC handle each situation and how are the errors calculated? relax handles these all differently, and this is fully documented at http://www.nmr-relax.com/api/1.3/prompt.spectrum.Spectrum-class.html#error_analysis.

Also, does the PDC use peak heights or peak volumes to measure signal intensities? Sorry for all the questions, but I have one more. All of the fundamental NMR theories work in rates (model-free, SRLS, relaxation dispersion, spectral density mapping, Abragam's relaxation equations and their derivation, etc.), and most of the NMR dynamics software accepts rates and their errors, not times.
The BMRB database will now also accept rates in their new version 3.1 NMR-STAR definition within the Auto_relaxation saveframe. Also, most people in the dynamics field publish R1 and R2 plots, while T1 and T2 plots are much rarer (unless you go back to the 80's). If all Bruker users start to publish Tx plots while most of the rest publish Rx plots, comparisons between different molecular systems will be complicated. So is there a specific reason the PDC outputs relaxation times rather than rates?

Cheers,
Edward

On 16 November 2010 06:52, Neidig Klaus-Peter <Klaus-Peter.Neidig@xxxxxxxxxxxxxxxxx> wrote:

Dear all, Dear Michael & Edward,

I'm currently on the way to England, thus only a short note: the error of an inverse is a special case of the error of a ratio. A search for "error propagation" on the internet yields hundreds of hits. There are also some discussions about correlation between the involved quantities. If y = A/B with given errors of A and B, then the absolute error of y is

y * sqrt[(errA/A)^2 + (errB/B)^2]

If A = 1, the error of y is y*errB/B, since the error of a constant is 0. I compared the results with the errors I got from Marquardt when fitting a*exp(-R*t) instead of a*exp(-t/T), by eye, up to a number of digits. I hope I did it right.

Best regards,
Peter
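For completeness, the same ratio formula applied to the steady-state NOE case mentioned earlier in the thread can be sketched as follows; all numbers are invented.

    import numpy as np

    def ratio_error(a, err_a, b, err_b):
        # Absolute error of y = A/B for uncorrelated A and B:
        # err_y = |y| * sqrt((err_A/A)**2 + (err_B/B)**2)
        y = a / b
        return y, abs(y) * np.sqrt((err_a / a) ** 2 + (err_b / b) ** 2)

    # NOE as the ratio of saturated and reference peak intensities, each with its
    # own base-plane RMSD error.
    print(ratio_error(7.5e5, 5.0e3, 9.8e5, 5.2e3))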