Hi Edward.

I have used this regression book:
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf

I like the language (fun and humorous), and it goes into great detail. For example, there is quite a list of weighting methods. I find the comments on page 28 a little disturbing (see also pages 86-87).

The Monte-Carlo simulation is described on page 104.

1) Create an ideal data set. -> Check. This is done with relax's monte_carlo.create_data(method='back_calc')

2) Add random scatter. relax currently adds random scatter per individual data point, where the scatter is drawn from the measured error of that data point. But the book suggests adding errors described by the variance of the residuals instead. This is described on page 33:

"If you chose to weight the values and minimize the relative distance squared (or some other weighting function), goodness-of-fit is quantified with the weighted sum-of-squares."

Sy.x = sqrt(SS/dof) = sqrt(chi2 / dof)

The question is of course whether SS should be the sum of squares for the weighted points, or for the non-weighted R2eff points.

Best
Troels

2015-01-19 10:31 GMT+01:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Hi Troels,

Do you have a reference for the technique? You mentioned a 'fitting guide' in one of your commit messages, but without a reference to it. I would like to study the technique to understand the implementation. Does it have another name? I would guess so, as it breaks the statistical principles that the Monte Carlo simulation technique uses. That is why the monte_carlo.create_data user function says on the first line that the technique is called bootstrapping, rather than Monte Carlo simulations, if the 'method' argument is changed.

I would also like to check if it is implemented properly. For example, accessing the chi2 value, converting it to the reduced chi2, and using this as the SSE may not be correct, as the former is normalised by the errors and the latter is not. You may need to recalculate the SSE, as there is no way to convert from the full/reduced chi2 value to an SSE value. I would also like to see if this implementation is a new methodology that sits beside Monte Carlo simulations and Bootstrapping simulations, or if it can be used for both of these.

Cheers,
Edward

On 16 January 2015 at 19:13, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi,

Sorry, I meant sum_i(residual_i / error_i)/N > 1.0. As for calculating the bias value, it looks like Wikipedia has a reasonable description (https://en.wikipedia.org/wiki/Bias_of_an_estimator).

Regards,
Edward

On 16 January 2015 at 19:07, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi,

If you plot the R2eff errors from the Monte Carlo simulations of that model, are they Gaussian? Well, that's assuming you have full dispersion curves. Theoretically, from the white noise in the NMR spectrum, they should be. Anyway, even if it is not claimed that the Monte Carlo simulations have failed, if you have small errors from the MC simulations and large errors from another error analysis technique, and then pick the large errors, that implies that the Monte Carlo simulations have failed.
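As a hedged sketch of the chi2-vs-SSE point made above (all numbers are invented for illustration, and NumPy is assumed): with heterogeneous per-point errors there is no constant factor relating the weighted sum of squares (chi2) to the unweighted SSE, so the SSE has to be recomputed from the raw residuals.

```python
import numpy as np

# Hypothetical residuals and heterogeneous per-point errors (made up).
residuals = np.array([0.5, -0.3, 0.8, -0.2])
errors = np.array([0.1, 0.5, 0.2, 0.4])

sse = np.sum(residuals**2)              # unweighted sum of squares (SSE)
chi2 = np.sum((residuals / errors)**2)  # weighted sum of squares (chi2)

# The two Sy.x forms from the regression book: sqrt(SS/dof).
n_params = 2                            # hypothetical number of fitted parameters
dof = residuals.size - n_params
sy_x = np.sqrt(sse / dof)               # from the unweighted SSE
sy_x_weighted = np.sqrt(chi2 / dof)     # from the weighted chi2

# Because the errors are unequal, chi2 is not a rescaled SSE: no single
# factor maps one to the other, so the SSE must be recalculated directly.
```

Note that only when all errors are identical does chi2 reduce to SSE divided by a constant.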
As for using residuals, this is a fall-back technique for when experimental errors have not been, or cannot be, measured. This uses another convergence assumption - if you have infinite data and the model has a bias of exactly 0.0, then the residuals converge to the experimental errors. For clustering, you might violate this bias == 0.0 condition (that is almost guaranteed).

You should also plot your residuals. If the average residual value is not 0.0, then you have model bias. Or if you see a trend in the residual plot. For using the residuals, how do these values compare to the error values? If the residuals are bigger than the experimental errors, then this also indicates that abs(bias) > 0.0. A scatter plot of R2eff residuals vs. errors might be quite useful. This should be a linear plot with a gradient of 1; anything else indicates bias. There might even be a way of calculating the bias value from the residuals and errors, though I've forgotten how this is done.

Anyway, if you have a large bias due to the residuals being bigger than the R2eff errors, then using the residuals for error analysis is not correct. It will introduce larger errors, that is guaranteed. So you will have the result that kex has larger errors. But it is useful to understand the theoretical reason why: a large component of that kex error is the modelling bias. So if sum_i(residual_i / error_i)/N > 1.0, then you likely have under-fitting. This could be caused by the clustering or by the 2-site model being insufficient. In any case, using the residuals for an error analysis to work around kex errors being too small only indicates a problem with the data modelling.

What about testing the covariance matrix technique? I guess that, due to the small amount of data, single-item-out cross validation is not an option.

Regards,
Edward

On 16 January 2015 at 18:25, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:

Hi Edward.

I do not claim that "Monte Carlo simulations" is not the gold standard.
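The residual diagnostics described above (the mean residual as a bias indicator, and the residual-to-error ratio as an under-fitting indicator) could be sketched roughly as follows; all numbers are invented for illustration, and NumPy is assumed.

```python
import numpy as np

# Hypothetical R2eff residuals and their measured errors (made up).
residuals = np.array([0.4, 0.6, 0.5, 0.3, 0.7])
errors = np.array([0.2, 0.3, 0.25, 0.2, 0.3])

# A mean residual far from 0.0 indicates model bias.
mean_residual = np.mean(residuals)

# sum_i(|residual_i| / error_i)/N > 1.0 indicates under-fitting:
# the residuals are systematically bigger than the measured errors.
ratio = np.mean(np.abs(residuals) / errors)
```

With these made-up numbers both diagnostics fire, which in the scenario above would point to bias from the clustering or an insufficient model rather than failed Monte Carlo simulations.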
I am merely trying to investigate the method by which one draws the errors.

In the current case for dispersion, one trusts the R2eff errors to be the distribution. These are individual per spin.

Another distribution could be derived from how well the clustered fit performed. And this is what I am looking into.

Best
Troels

2015-01-16 18:09 GMT+01:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi,

Do the R2eff errors look reasonable? Another issue is that in a clustered analysis, certain parameters can be over-constrained by being shared between multiple data sets. This is the bias introduced by an under-fitted problem. This can artificially decrease the errors. Anyway, you should plot the Monte Carlo simulations, a bit like I did in figure 4 of my paper:

d'Auvergne, E. J. and Gooley, P. R. (2006). Model-free model elimination: A new step in the model-free dynamic analysis of NMR relaxation data. J. Biomol. NMR, 35(2), 117-135. (http://dx.doi.org/10.1007/s10858-006-9007-z)

That might indicate if something is wrong - i.e. if the optimisation of certain simulations has failed. However, this problem only causes errors to be bigger than they should be (unless all simulations have failed). I don't know how Monte Carlo simulations could fail otherwise. Monte Carlo simulations are the gold standard for error analysis. All other error analysis techniques are judged on how closely they approach this gold standard. Saying that the Monte Carlo simulation technique failed is about equivalent to claiming the Earth is flat! I challenge you to test the statement on a statistics professor at your Uni ;) Anyway, if Monte Carlo failed, using residuals will not save you, as the failure point will be present in both techniques. What could have failed is the model or the input data. Under-fitting due to too much R2eff data variability in the spins of the cluster would be my guess. Do you see similarly small errors in the non-clustered analysis of the same data?
Regards,
Edward

On 16 January 2015 at 17:48, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:

Hi Edward.

At the moment, I am fairly confident that I should investigate the distribution from which the errors are drawn.

The method in relax draws from a Gaussian distribution of the R2eff errors, but I should try to draw errors from the overall residuals instead. They are two different methods.

My PI has previously analysed the data with the aforementioned method, and got errors in the hundreds. Errors are 5-10% of the fitted global parameters. Having 0.5-1 percent error is way too small, and I see this for 4 of my datasets. So, something is fishy.

Best
Troels

2015-01-16 17:30 GMT+01:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi Troels,

You should be very careful with your interpretation here. The curvature of the chi-squared space does not correlate with the parameter errors! Well, in most cases it doesn't. You will see this if you map the space for different Monte Carlo simulations.

Some extreme edge cases might help in understanding the problem. Let's say you have a kex value of 100 with a real error of 1000. In this case, you could still have a small, perfectly quadratic minimum. But this minimum will jump all over the place with the simulations. Another extreme example might be a kex of 100 with a real error of 0.00000001. In this case, the chi-squared space could look similar to the screenshot you attached to the task (http://gna.org/task/?7882). However, Monte Carlo simulations may hardly perturb the chi-squared space. I have observed scenarios similar to these hypothetical cases with the Lipari and Szabo model-free protein dynamics analysis.

There is one case where the chi-squared space and the error space match, and that is at the limit of the minimum when the chi-squared space becomes quadratic. This happens when you zoom right into the minimum. The covariance matrix approach makes this assumption. Monte Carlo simulations do not.
In fact, Monte Carlo simulations are the gold standard. There is no technique which is better than Monte Carlo simulations, if you use enough simulations. You can only match it by deriving exact symbolic error equations. Therefore you really should investigate how your optimisation space is perturbed by Monte Carlo simulations to understand the correlation - or non-correlation - of the chi-squared curvature and the parameter errors. Try mapping the minimum for the simulations and see if the distribution of minima matches the chi-squared curvature (http://gna.org/task/download.php?file_id=23527).

Regards,
Edward

On 16 January 2015 at 17:14, Troels E. Linnet <NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:

URL: <http://gna.org/task/?7882>

Summary: Implement Monte-Carlo simulation, where errors are generated with the width of the standard deviation of the residuals
Project: relax
Submitted by: tlinnet
Submitted on: Fri 16 Jan 2015 04:14:30 PM UTC
Should Start On: Fri 16 Jan 2015 12:00:00 AM UTC
Should be Finished on: Fri 16 Jan 2015 12:00:00 AM UTC
Category: relax's source code
Priority: 5 - Normal
Status: In Progress
Percent Complete: 0%
Assigned to: tlinnet
Open/Closed: Open
Discussion Lock: Any
Effort: 0.00

_______________________________________________________

Details:

This is implemented due to strange results. A relaxation dispersion analysis on data with 61 spins, with a Monte Carlo simulation of 500 steps, showed unexpectedly low errors.

-------
results.read(file=fname_results, dir=dir_results)

# Number of MC simulations.
mc_nr = 500

monte_carlo.setup(number=mc_nr)
monte_carlo.create_data()
monte_carlo.initial_values()
minimise.execute(min_algor='simplex', func_tol=1e-25, max_iter=int(1e7), constraints=True)
monte_carlo.error_analysis()
-------

The kex was 2111, with an error of 16.6.
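For reference, the standard Monte Carlo error analysis loop that the relax script above drives can be sketched generically. The linear model, data, and numbers below are invented stand-ins (not the dispersion models relax actually fits), and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: a straight line with known per-point measurement errors.
x = np.linspace(0, 10, 20)
errors = np.full_like(x, 0.5)                 # measured per-point errors
y = 2.0 * x + 1.0 + rng.normal(0, errors)     # "experimental" data

# Step 1: fit the real data to get the best-fit parameters.
best_fit = np.polyfit(x, y, 1)                # [slope, intercept]

# Steps 2-4: back-calculate an ideal data set, add Gaussian scatter drawn
# from the measured errors, refit, and take the SD of the simulated
# parameters as the parameter errors.
y_back = np.polyval(best_fit, x)
sims = []
for _ in range(500):
    y_sim = y_back + rng.normal(0, errors)
    sims.append(np.polyfit(x, y_sim, 1))
param_errors = np.std(sims, axis=0)           # [slope error, intercept error]
```

The point of contention in this thread is only step 2: whether the Gaussian scatter should be drawn from the measured per-point errors (as here) or from the standard deviation of the fit residuals.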
When performing a dx.map, some weird results were found:

i_sort  dw_sort  pA_sort  kex_sort    chi2_sort
471     4.50000  0.99375  2125.00000  4664.31083
470     4.50000  0.99375  1750.00000  4665.23872

So even a tiny change in chi2 corresponds to a large deviation in kex. It seems that changing the R2eff values according to their errors is not "enough".

According to the regression book from GraphPad
http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
(pages 33 and 104), the standard deviation of the residuals is:

Sxy = sqrt(SS/(N-p))

where SS is the sum of squares and N - p is the number of degrees of freedom. In relax, SS is spin.chi2, and is weighted.

The random scatter added to each R2eff point should be drawn from a Gaussian distribution with a mean of zero and an SD equal to Sxy. Additionally, find the 2.5 and 97.5 percentiles for each parameter. The range between these values is the confidence interval.

_______________________________________________________

File Attachments:

-------------------------------------------------------
Date: Fri 16 Jan 2015 04:14:30 PM UTC  Name: Screenshot-1.png  Size: 161kB  By: tlinnet
<http://gna.org/task/download.php?file_id=23527>

_______________________________________________________

Reply to this item at:

<http://gna.org/task/?7882>

_______________________________________________
Message sent via/by Gna!
http://gna.org/

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at
https://mail.gna.org/listinfo/relax-devel
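The scheme proposed in the task description (scatter drawn from a Gaussian with SD = Sxy = sqrt(SS/(N-p)), plus a 2.5/97.5 percentile confidence interval) could be sketched as follows. The linear model and all numbers are invented for illustration, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up data: a straight line with unit Gaussian noise.
x = np.linspace(0, 10, 30)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, x.size)

# Fit, then compute Sxy = sqrt(SS/(N - p)) from the residuals.
p = 2                                     # number of fitted parameters
best_fit = np.polyfit(x, y, 1)            # [slope, intercept]
residuals = y - np.polyval(best_fit, x)
sxy = np.sqrt(np.sum(residuals**2) / (x.size - p))

# Monte Carlo with scatter drawn from N(0, Sxy) rather than per-point errors.
sims = []
for _ in range(500):
    y_sim = np.polyval(best_fit, x) + rng.normal(0, sxy, x.size)
    sims.append(np.polyfit(x, y_sim, 1))
sims = np.array(sims)

# Confidence interval from the 2.5 and 97.5 percentiles of each parameter.
ci_lower, ci_upper = np.percentile(sims, [2.5, 97.5], axis=0)
```

Whether this is still "Monte Carlo simulation" or a bootstrap-like variant is exactly the question raised earlier in the thread; the sketch only shows the mechanics being proposed.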