Dear Peter,

Please see below.

On 21 February 2011 09:51, Dr. Klaus-Peter Neidig <peter.neidig@xxxxxxxxxxxxxxxxx> wrote:
Dear Edward,

I'm generating the current set of output files today and will send them. The error of a ratio of numbers y = a/b with given aErr, bErr is yErr = sqrt [ (aErr/a)^2 + (bErr/b)^2], which for a=1 and aErr=0 is yErr = bErr/b. Does this correspond to your notation?
This is not quite right; see for example the NOE error formula on page 5988 of the Farrow et al., Biochemistry (1994), 33, 5984-6003 reference. The formula should be: yErr/y = sqrt [ (aErr/a)^2 + (bErr/b)^2]. For a=1 and aErr=0, this reduces to yErr/y = bErr/b. In the case of R1: sigma_R1 / R1 = sigma_T1 / T1, i.e. sigma_R1 = sigma_T1 / T1^2 = sigma_T1 * R1^2. There is a chance that this formula from the Kay paper is incorrect. I've been trying to find another reference, one that I found a long time ago, but I can't find it any more. Do you have a reference for the yErr = sqrt [ (aErr/a)^2 + (bErr/b)^2] formula? This does not explain the differences though; they are even more pronounced using yErr = sqrt [ (aErr/a)^2 + (bErr/b)^2].
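Just to make the error propagation concrete in code (the T1 value and error below are invented, purely for illustration):

from math import sqrt

# Quotient rule for y = a/b: (yErr/y)^2 = (aErr/a)^2 + (bErr/b)^2.
# For R1 = 1/T1 (a = 1, aErr = 0) this reduces to sigma_R1/R1 = sigma_T1/T1.
T1, sigma_T1 = 0.5, 0.01                  # hypothetical values, in seconds
R1 = 1.0 / T1
sigma_R1 = R1 * sqrt((0.0 / 1.0)**2 + (sigma_T1 / T1)**2)
print(sigma_R1)                           # 0.04
print(sigma_T1 / T1**2)                   # identical: sigma_T1 / T1^2
print(sigma_T1 * R1**2)                   # identical: sigma_T1 * R1^2

All three print statements give the same number, which is just the sigma_R1 = sigma_T1 / T1^2 = sigma_T1 * R1^2 identity above.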
The fit error determination in the PDC depends on the method the user has chosen. Under the tag "Fit parameter Error estimation method:" the options are:
- from fit using arbitray y uncertainties
- from fit using calculated y uncertainties
- from Monte Carlo simulations
Which of these does the "SECTION: used integral errors" belong to? This section appears to contain the errors associated with each peak intensity from "SECTION: used integrals". I simply took these to be individual peak errors and then performed standard Monte Carlo simulations using them. Each peak appears to have a different error, but does "Fit parameter Error estimation method: from fit using arbitray y uncertainties" mean one error for all peaks?
The first two are offered by our standard Marquardt implementation.
The Levenberg-Marquardt optimisation and Monte Carlo simulations are two unrelated techniques: one is for optimisation, the other for error propagation. A very low quality error estimate can also be obtained from the covariance matrix. So one can use Levenberg-Marquardt (or any other optimisation algorithm) and then determine the errors using the covariance matrix defined at the solution. I don't understand the differences in the error estimation methods in the PDC :S
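To illustrate what I mean by the covariance matrix approach, here is a minimal sketch for a single exponential, with scipy's curve_fit standing in for any Levenberg-Marquardt implementation (all names and numbers here are mine and purely illustrative, not the PDC code):

import numpy as np
from scipy.optimize import curve_fit

def expo(t, i0, r1):
    """Two-parameter exponential decay I(t) = I0 * exp(-R1*t)."""
    return i0 * np.exp(-r1 * t)

# Hypothetical relaxation delays, peak intensities and per-point errors.
times = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.5])
intensities = expo(times, 1.0e6, 1.8) + np.random.normal(0.0, 2.0e4, times.size)
errors = np.full(times.size, 2.0e4)

# Levenberg-Marquardt optimisation (curve_fit's default for unbounded problems),
# then the low quality covariance matrix error estimate at the solution.
popt, pcov = curve_fit(expo, times, intensities, p0=[1.0e6, 2.0], sigma=errors,
                       absolute_sigma=True)
i0_err, r1_err = np.sqrt(np.diag(pcov))
print("R1 = %.3f +/- %.3f" % (popt[1], r1_err))

The square roots of the diagonal of the covariance matrix are the parameter error estimates - fast, but much less robust than Monte Carlo simulations.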
The first option assumes the same error for all y values. This then cancels out of the formula, therefore we call it arbitrary.
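(I take this to mean that, with the same error sigma for every point, the fit minimises chi^2 = sum_i [y_i - f(x_i)]^2 / sigma^2 = (1/sigma^2) * sum_i [y_i - f(x_i)]^2, so the parameter values at the minimum do not depend on the value of sigma at all - only the error estimates derived from the fit do.)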
But this could be used with either the Monte Carlo simulation or the covariance matrix error estimation method. In relax, this is used in combination with MC simulations.
The second option uses the y errors as determined before. As we discussed earlier, the base plane RMS is one source for these errors; the other is from optional repetition experiments. If repetition experiments are available, the systematic error can be calculated as either:
- worst case per peak scenario
- average variance calculation
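Just to check that I follow, here is how I would write those two options down, in Python, for a pair of duplicated spectra. Note that this is only my reading of the two names, not the PDC code, and the intensities are invented:

import numpy as np

# Hypothetical duplicated spectra: one intensity per peak in each repeat.
repeat_1 = np.array([1.00e6, 5.2e5, 2.1e5, 8.3e4])
repeat_2 = np.array([1.03e6, 5.0e5, 2.3e5, 7.9e4])

# "Worst case per peak scenario" (my reading): a separate error for each peak,
# taken from the difference between its repeated measurements.
per_peak_err = np.abs(repeat_1 - repeat_2)

# "Average variance calculation" (my reading): pool the variances of all
# duplicated peaks and use one averaged error for every peak.
variances = 0.5 * (repeat_1 - repeat_2)**2    # variance of a duplicate pair (n-1 denominator)
pooled_err = np.sqrt(np.mean(variances))

print(per_peak_err)    # different error for each peak
print(pooled_err)      # single error for all peaks

Is that roughly what the PDC does for these two options?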
As I showed using theoretical examples in https://mail.gna.org/public/relax-devel/2010-11/msg00027.html (the data, Python and Java scripts are attached to https://gna.org/task/?7180), the two worst case per peak methods on average significantly overestimate the errors, and have huge variances: the estimates correspond to the real errors multiplied by a factor that is approximately normally distributed as 1.594 +/- 0.852. So 95% of the estimates from duplicated spectra (assuming a normal distribution) will be scaled by a factor between -0.11 and 3.298 (the +/- 2 sigma bounds)! This type of error estimate would be fatal for subsequent model-free analysis. Therefore in relax we will need to catch "Systematic error estimation of data: worst case per peak scenario" and re-perform the relaxation curve-fitting and Monte Carlo simulations. Or maybe throw an error in relax and tell the user that they must go back to the PDC and use "average variance calculation" together with "Monte Carlo simulations".

In the example file you sent (http://svn.gna.org/viewcvs/relax/1.3/test_suite/shared_data/pdc_files/testT1.txt?revision=12551&view=markup), it says "Fit parameter Error estimation method: from fit using arbitray y uncertainties". Above you said that this corresponds to one error for all peaks. But you described in https://mail.gna.org/public/relax-devel/2010-11/msg00021.html that the current "worst case per peak scenario" method in the PDC uses a different estimate for each peak. The file also says "Random error estimation of data: RMS per spectrum (or trace/plane)". So I am quite confused as to how to interpret the data. Where do I read the peak intensity errors from?
The test output files I have sent so far calculated the fit parameter errors via "from fit using arbitray y uncertainties". This option was chosen accidentally. The error there is different from the other two methods. Perhaps it is a better idea to generate a new set of output files using the option "from fit using calculated y uncertainties". That usually compares well with MC simulations.
What is the "from fit using calculated y uncertainties" method? Is this the covariance matrix technique for error estimation? For simple single exponentials, the covariance matrix technique and Monte Carlo simulations should not be too different, I hope. How does this combine with "worst case per peak scenario" and "average variance calculation"?
In the PDC I do 1000 MC simulations in which I vary the y values based on a Gaussian random distribution, where the mean is the original y and the width is the error of y.
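To be sure we mean the same procedure, here is a sketch of that description taken literally, with scipy's curve_fit standing in for your Marquardt code (all numbers are invented):

import numpy as np
from scipy.optimize import curve_fit

def expo(t, i0, r1):
    # Two-parameter exponential decay I(t) = I0 * exp(-R1*t).
    return i0 * np.exp(-r1 * t)

# Hypothetical measured intensities, per-point errors and the original best fit.
times = np.array([0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.5])
intensities = np.array([9.8e5, 9.2e5, 8.2e5, 6.9e5, 4.8e5, 2.4e5, 6.5e4])
errors = np.full(times.size, 2.0e4)
popt, _ = curve_fit(expo, times, intensities, p0=[1.0e6, 2.0])

# 1000 simulations: add Gaussian noise (mean = the y value, width = its error),
# refit each synthetic data set, and take the spread of the fitted parameters.
r1_sims = []
for _ in range(1000):
    noisy = np.random.normal(intensities, errors)    # the "original y", taken literally
    p, _ = curve_fit(expo, times, noisy, p0=popt)
    r1_sims.append(p[1])
print("sigma_R1 =", np.std(r1_sims, ddof=1))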
You mean you take the back-calculated y value and randomise this 1000 times with a Gaussian distribution centred at the back-calculated y, with the width set to the peak intensity error? If you take the original y, this is not Monte Carlo simulation but bootstrapping - a technique which should never be used for error estimation. Well, you can use bootstrapping for a direct calculation of values, but never for optimised values. I am quite confused. The way I see the analysis, using the notation of the statistics and optimisation fields, there are the following categories:

Peak intensity notation:
- Height (the physical height of the Lorentzian peak at its mean, or maximum point)
- Volume (sum of all points, i.e. the discrete integration of the Lorentzian)
- Height (as above, but via fitting the peak shape to a function)
- Volume (as above, but via fitting the peak shape to a function)

Peak errors:
- 1 average error for all peaks (from some spectra duplicated, triplicated, etc.)
- 1 max error for all peaks (largest ever occurring difference)
- 1 average error for all peaks in one spectrum (only if all spectra are duplicated)
- 1 max error for each peak across all spectra (the current "worst case per peak scenario"?)
- Each peak has a different error (from the baseplane RMSD)

Optimisation methods:
- Levenberg-Marquardt
- Simplex
- Newton
- etc.

Error estimation methods:
- Covariance matrix (bad)
- Jackknife (also bad)
- Bootstrapping (never to be used with optimisation)
- Monte Carlo simulations (considered the gold standard)
- etc.

I am having trouble classifying the PDC file http://svn.gna.org/viewcvs/relax/1.3/test_suite/shared_data/pdc_files/testT1.txt?revision=12551&view=markup into these categories. This is very, very confusing :S

Regards,
Edward