Dear Edward,

In short: the first 2 fit options (by fit, by weighted fit) both use the covariance technique. The difference is that the first option applies an additional internal scaling by: final chi / number of degrees of freedom. This means that when modelling (NOE, T1, T2) with (M1-M5, TM1-TM5) it cannot be used if these triples of numbers are only available at one field and the model contains 3 parameters. But in that area the PDC user does not have a selection of fit methods anyway; internally we use a combination of simplex and Marquardt. But this is another area and should be done by relax anyway.

I pick up your statements about the confidence stuff. Indeed you are right that everything would be much simpler if the PDC just output Rx instead of Tx and the corresponding error. Whether to keep the confidence stuff at all is then a question; in the meantime I think I follow your opinion and would rather give it up. But this I have to discuss internally here. It might be that I'm not allowed to just give it up; then I would have to extend the output with more columns, or with a further section which you could read instead of the current one.

I also have to say that my development work on the PDC must end by April (ENC), and this includes updating the manual, the release procedure and some internal developments. So there is not too much time left. The PDC release number will then go up to 1.2. After April I can still do maintenance work but no major changes, at least until someone decides something else.

Best regards,
Peter

Dr. Klaus-Peter Neidig
Software Development / Head of Analysis Group
Bruker BioSpin GmbH
Silberstreifen
76287 Rheinstetten, Germany
Phone: +49 721 5161-6447
Fax: +49 721 5161-6480
Peter.Neidig@xxxxxxxxxxxxxxxxx
www.bruker.com
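[Editor's note: the "additional internal scaling" Peter mentions is not spelled out in the email. A common form of such a scaling is to rescale covariance-based errors by the square root of the reduced chi-squared; the sketch below assumes that form with made-up numbers, and should not be read as the actual PDC formula.]

```python
import math

# Hypothetical numbers: a fit's final chi-squared value, the data/parameter
# counts, and a raw covariance-matrix parameter error.
chi2 = 12.5
n_points, n_params = 10, 2
dof = n_points - n_params
raw_err = 0.0030

# One common form of the scaling: rescale the covariance errors by the
# square root of the reduced chi-squared.  With dof = 0 (e.g. 3 data
# points at one field and a 3-parameter model) this is undefined, which
# matches the restriction Peter describes.
scaled_err = raw_err * math.sqrt(chi2 / dof)
print(scaled_err)
```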
Gerhard Roth

________________________________________
From: edward.dauvergne@xxxxxxxxx [edward.dauvergne@xxxxxxxxx] On Behalf Of Edward d'Auvergne [edward@xxxxxxxxxxxxx]
Sent: Wednesday, February 23, 2011 3:46 PM
To: Neidig, Klaus-Peter
Cc: Neidig, Klaus-Peter; relax-devel@xxxxxxx; Michael Bieri
Subject: Re: Addition of PDC support to relax 1.3.10 and an inconsistency in the error calculation.

Dear Peter,

Please see below:

On 23 February 2011 12:33, Dr. Klaus-Peter Neidig <peter.neidig@xxxxxxxxxxxxxxxxx> wrote:
Dear Edward,

I will provide the other output file examples as soon as the format is accepted by you. We still have a few points open.

The relation between the dialog window and the output:

    dialog                              explanation                                        appears in output as
    error estimation by fit             all errors are unknown but the same for all y      "from fit using arbitrary y uncertainties"
    error estimation by weighted fit    errors as listed in the output are used            "from fit using calculated y uncertainties"
    error estimation by Monte Carlo     N fits with synthetically modified y values        "from Monte Carlo simulations"
Ah, ok, that explanation helps. Maybe the explanation could be expanded a little, as both "error estimation by weighted fit" and "error estimation by Monte Carlo" use the same errors from the "used integral error" section. The first appears to be the covariance matrix estimate. Is this correct? And what is the technique used in "error estimation by fit"? Is this where the target function is actually the SSE (sum of squared errors) rather than the chi2, so that the errors come from the residuals of the fit? I think the explanation should identify the exact source of the errors, the target function, and the precise technique used (i.e. covariance matrix, Monte Carlo, etc.).
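[Editor's note: the SSE/chi2 distinction Edward asks about can be made concrete. A minimal sketch of the two candidate target functions, with hypothetical helper names:]

```python
import numpy as np

def sse(y, y_calc):
    """Sum of squared errors: no y uncertainties enter the fit at all."""
    return np.sum((y - y_calc) ** 2)

def chi2(y, y_calc, sigma):
    """Chi-squared: residuals weighted by the measured y uncertainties."""
    return np.sum(((y - y_calc) / sigma) ** 2)
```

With sigma identically 1, the two functions coincide; the difference only matters when per-point uncertainties are supplied.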
errorScale: Matlab and other software packages provide the fit results within confidence bounds, which depend on a confidence level and the number of degrees of freedom. The formula is c = f +/- t*fErr, where f is the fitted parameter value, fErr is the fit parameter error as coming out of Marquardt or MC, and t is calculated.
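[Editor's note: Peter's formula can be sketched with illustrative numbers. The t value below is the tabulated two-sided 95% Student's t critical value for 10 degrees of freedom; the PDC's exact choice of distribution, confidence level and degrees of freedom is not specified in this thread.]

```python
# Peter's formula c = f +/- t*fErr, with illustrative numbers.
f = 0.4560        # fitted parameter value (e.g. T1 in seconds)
f_err = 0.0031    # fit parameter error from Marquardt or MC (assumed)
t_crit = 2.228    # two-sided 95% Student's t value for 10 dof (tabulated)

# The reported confidence bounds on the fitted value.
lower = f - t_crit * f_err
upper = f + t_crit * f_err
print(lower, upper)
```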
This is too complex :S Most users will likely be confused by this - they will not know that the standard deviation (sd) of the relaxation rates is equal to t*fErr. They will report fErr rather than the standard deviation, which is what they should report. The error bars in R1 and R2 figures must be standard deviations (http://en.wikipedia.org/wiki/Error_bar), or alternatively the standard error. The reported relaxation rates also need to be the mean (from the fit) and the sd. All subsequent analyses require the sd and not fErr.

There is a much simpler alternative: simple error propagation (http://en.wikipedia.org/wiki/Propagation_of_uncertainty). This concept is much easier to follow, and no confidence levels or degrees of freedom are involved. The aim is to transfer the errors (sd) in the peak heights or volumes to the fitted R1 or R2 errors (sd). So, using Art Palmer's notation from his 1991 paper, we are converting the peak height sd sigma_h to sigma_R1 or sigma_R2 via Monte Carlo simulations. We are going from the sd of the base data to the sd of the parameters. All current NMR software that I know of follows this simple error propagation principle.

The standard error propagation technique, if a direct formula cannot be derived, is Monte Carlo simulations. Other, lower quality techniques are sometimes used, such as Jackknife simulations and sometimes the very rough covariance matrix estimate. The complexity used by the PDC is not necessary, and most users will not understand the errors they are presented with. If they are not using relax to read the PDC file and convert the times to rates and the fErr to a sd, this will be very bad for model-free analysis, reduced spectral density mapping, SRLS, etc. (ok, the word catastrophic would be more apt). Even excluding this confusion, most users will have a problem converting from the sd of the times Tx to the sd of the rates Rx, as required for all subsequent analyses!
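[Editor's note: the Monte Carlo error propagation Edward describes (sigma_h to sigma_R1) can be sketched in a few lines. This is a minimal toy version assuming a two-parameter exponential decay fitted by log-linearised least squares; the real PDC and relax implementations differ.]

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic relaxation experiment: peak heights h(t) = h0 * exp(-R1 * t).
times = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2])
h0_true, r1_true = 1.0e6, 2.2
heights = h0_true * np.exp(-r1_true * times)
sigma_h = 1.0e4  # peak height standard deviation (the base data sd)

def fit_r1(h):
    # Log-linearised least-squares fit: ln h = ln h0 - R1 * t.
    slope, _ = np.polyfit(times, np.log(h), 1)
    return -slope

# Monte Carlo: resample synthetic heights from the measured sd, refit,
# and take the sd of the fitted rates as sigma_R1.
r1_sims = [fit_r1(heights + rng.normal(0.0, sigma_h, times.size))
           for _ in range(500)]
sigma_r1 = np.std(r1_sims, ddof=1)
print(fit_r1(heights), sigma_r1)  # the fitted rate, and its propagated sd
```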
relax, Modelfree4, Dasha, and Tensor2 (the most commonly used programs) all require rates and the sd of the rates - they do not accept times. Would it be possible to extend the PDC format in the results section to include not only all of the current columns, but also the Tx standard deviation (t*fErr), the Rx value, and the Rx sd? If the user is presented with both the fErr value and the standard deviation, they will instantly recognise which is the correct one to use. Also presenting the rates and their errors:

    R1 = 1/T1
    sigma_R1 = sigma_T1 / T1^2

Then the user will not make mistakes in the absolutely required conversion process. So for example:

SECTION: results
Peak name    F1 [ppm]    F2 [ppm]    T1 [s]    fErr         errorScale    T1 sd [s]    R1 [rad.s-1]    R1 sd [rad.s-1]
Gln [2]      122.508     8.898       0.4560    0.0068944    2.2281        0.0030942    2.192982        0.0148805
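[Editor's note: Edward's two conversion formulas, applied to the numbers in the example row above:]

```python
t1 = 0.4560           # T1 [s]
sigma_t1 = 0.0030942  # T1 sd [s]

# R1 = 1/T1, and first-order error propagation sigma_R1 = sigma_T1 / T1^2.
r1 = 1.0 / t1
sigma_r1 = sigma_t1 / t1 ** 2
print(round(r1, 6), round(sigma_r1, 7))  # 2.192982 0.0148805
```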
The PDF output of the PDC shows f and (t*fErr), but you only want f and fErr. Therefore I have added an extra column for you, called it errorScale, and put t into it. The column always contains the same number since, at the moment, all peaks are fitted to the same model function and there is one user-defined confidence level.
Most users won't know what to do with the fErr, or even how it is useful, so I would recommend presenting both fErr and the basic sd.
This whole thing has nothing to do with NC_proc. NC_proc is for the scaling of all data points of a spectrum. This is because Bruker originally did the FFT in integer arithmetic. To use as many bits as possible in the integers, the data are scaled up by 2**NC_proc. The parameter NC_proc is stored, and any software can do the reverse calculation to make different spectra comparable.
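[Editor's note: Peter's description suggests a power-of-two scaling of this shape. This is a sketch only; the sign convention for NC_proc (whether stored values must be multiplied or divided by 2**NC_proc) varies with the convention used, so treat it as illustrative.]

```python
def descale(stored_value, nc_proc):
    # Undo the 2**NC_proc integer scaling described above, recovering
    # intensities on a common scale so different spectra are comparable.
    return stored_value * 2 ** nc_proc

# A stored integer data point and an illustrative NC_proc value.
print(descale(12345, -3))  # -> 1543.125
```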
Ok, I was not sure how t and fErr were derived. I'm quite familiar with NC_proc and the ancient convention of the integer storage of the ser file. The scaling concept is much clearer with the formula c = f +/- t*fErr; I can see exactly what it is now. fErr is the half measure of 95% of the spread of the distribution, assuming a 95% confidence limit, and t is the parameter converting this back to the standard deviation.
Monte Carlo: We had a first internal meeting and there were some shaking heads here. I have also searched the internet for other software packages and articles, and what we offer as Monte Carlo is indeed what many others also offer as Monte Carlo. We agreed to go into some more detail within the next few days; one colleague will do some Matlab simulations and we will have another meeting then. So for the moment I would not change anything in the PDC related to this topic.
One thing to be careful of is the direct calculation. The R1 and R2 use optimisation, but the NOE uses a direct calculation. Hence the back-calculated value for the NOE is the original value, so if one uses Monte Carlo simulations for the NOE, this collapses to the Bootstrap technique.

Also, please don't be tricked by the similarity of numbers. Comparing the numbers from Bootstrapping versus those of Monte Carlo simulations is of no use, as one doesn't know the real errors. This is not how one should judge Bootstrapping versus Monte Carlo simulations.

There is one case where Bootstrapping can be used: when the original errors in the peak heights or volumes are not known. However it is still very bad, and techniques such as Jackknife simulations have been developed to improve this situation and remove the Bootstrap bias. This is a complex topic of information theory, and numerical analysis books will usually just state that Monte Carlo simulation is the gold standard and that Bootstrapping is not the way to go. A relatively simple explanation is given in a short PDF file at http://www.terry.uga.edu/~satkinso/8120/8120_montebs.pdf. Here they mention how the Jackknife technique is for removing the bias of the Bootstrapping technique. They also mention that Bootstrapping is for when there are no known original errors and that one uses the residuals of the fit, though this is one of the other Bootstrapping techniques (that PDF talks about 3 of the many types).
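[Editor's note: the distinction Edward draws can be made concrete with a toy comparison on a simple mean estimate, assuming Gaussian noise. This is not the PDC's or relax's implementation; it only illustrates that Monte Carlo uses the known base-data errors while Bootstrapping ignores them entirely.]

```python
import numpy as np

rng = np.random.default_rng(1)

# One measured data set with a *known* per-point error sigma.
sigma = 0.5
data = rng.normal(10.0, sigma, 8)
n_sims = 2000

# Monte Carlo: resample synthetic data sets from the known sigma around
# the back-calculated values (here just the measured points), re-estimate.
mc = [np.mean(data + rng.normal(0.0, sigma, data.size))
      for _ in range(n_sims)]

# Bootstrap: resample the measured points themselves with replacement --
# no knowledge of sigma is used at all.
bs = [np.mean(rng.choice(data, data.size, replace=True))
      for _ in range(n_sims)]

print(np.std(mc, ddof=1), np.std(bs, ddof=1))
```

The two spreads may look similar for this toy case, which is exactly why Edward warns against judging the techniques by similarity of numbers.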
People here accept that Marquardt's covariance method might cause problems depending on the function and the actual data, but for the relaxation functions it worked well and gave very similar results to what you obtained. So it can't be that bad. It might be a more serious problem for other functions, of course.
I would consider it ok. Note that the covariance matrix error estimate and Levenberg-Marquardt are completely distinct concepts, derived 100% independently and coming from different areas of numerical analysis. The covariance estimate is not ideal, but the exponential functions are not very complex and there is not much convolution between the fitted parameters. This can be seen in the matrix: the off-diagonal elements of the covariance matrix should be much lower than the diagonal, i.e. there should not be much covariance. If this is the case, then the covariance estimate should be ok.
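[Editor's note: Edward's off-diagonal check could be sketched as follows, assuming a two-parameter exponential I0*exp(-R*t) and the standard Gauss-Newton covariance approximation cov ≈ (J^T W J)^-1 with uniform weights; illustrative only.]

```python
import numpy as np

times = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.2])
i0, r = 1.0, 2.2   # fitted parameters of I(t) = I0 * exp(-R * t)
sigma = 0.01       # peak intensity sd (assumed, same for all points)

# Jacobian of the model with respect to (I0, R) at the fitted values.
decay = np.exp(-r * times)
jac = np.column_stack([decay, -i0 * times * decay])

# Gauss-Newton covariance approximation: cov ≈ (J^T W J)^-1, W = 1/sigma^2.
cov = np.linalg.inv(jac.T @ jac / sigma ** 2)

# Normalise to a correlation matrix; a small off-diagonal entry relative
# to the unit diagonal means little parameter covariance, i.e. the
# covariance-based errors are more likely to be trustworthy.
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
print(corr[0, 1])
```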
So, tell me if you want the other output files in their current state, or whether there are small things that should be changed first.
It might be best to attach the files to https://gna.org/task/?7180 after things are sorted out a little.

Cheers,

Edward