Re: [task #7882] Implement Monte-Carlo simulation, where errors are generated with width of standard deviation of residuals


Posted by Edward d'Auvergne on January 19, 2015 - 18:48:
Hi Troels,

> I have used this regression book:
> http://www.graphpad.com/faq/file/Prism4RegressionBook.pdf
>
> I like the language (fun and humorous), and it goes into great detail.
> For example, there is quite a list of weighting methods.
> I find the comments on page 28 a little disturbing.
> (See also pages 86 and 87.)

We already weight by the individual data point error.  As we are using
non-linear least squares fitting of the chi2 value, the weights
described on pages 86 and 87 are not relevant.  For dispersion data
they are also not relevant, as the NMR spectrum is so complicated that
none of these weighting schemes will reliably replicate the
field-dependent effects, the peak height effects, etc.
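
Just to be explicit about what weighting by the individual data point
error means here, a minimal numpy sketch of the chi-squared target
function (an illustration only, not the relax internals):

    import numpy as np

    def chi2(back_calc, measured, errors):
        # Chi-squared with each R2eff point weighted by its own
        # measured error sigma_i.
        return np.sum(((measured - back_calc) / errors)**2)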


> The Monte-Carlo simulation is described on page 104.
>
> 1) Create an ideal data set.
> -> Check.  This is done with relax:
> monte_carlo.create_data(method='back_calc')

This is correct.
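
For completeness, the standard Monte Carlo user function sequence in a
relax script is roughly the following (a sketch from memory, so please
check the relax manual for the exact minimisation call of your
analysis):

    monte_carlo.setup(number=500)                # number of simulations
    monte_carlo.create_data(method='back_calc')  # ideal data + random scatter
    monte_carlo.initial_values()                 # starting values per simulation
    # ... re-run the minimisation here, exactly as for the real data ...
    monte_carlo.error_analysis()                 # errors from the parameter spread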


> 2) Add random scatter.
> relax now adds random scatter per individual data point.  The random
> scatter is drawn from the measured error of the data point.

Point 2) is correct, it is how you perform Monte Carlo simulations for
non-linear regression.  The reason is twofold - because in regression
you don't know the error sigma_i, and because the residuals can be
used as an estimator of sigma_i.  But there is a much better error
estimator - using the residuals in bootstrapping to estimate the
data errors (as mentioned on page 108).  This would be far superior!
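
To illustrate the page 108 idea, a minimal sketch of bootstrapping the
residuals (the fit_model function and its interface are hypothetical
stand-ins, not relax code):

    import numpy as np

    def bootstrap_residual_errors(x, y, fit_model, n_sim=500, rng=None):
        # fit_model(x, y) -> (params, y_back_calc); hypothetical interface.
        rng = np.random.default_rng() if rng is None else rng
        params, y_fit = fit_model(x, y)
        residuals = y - y_fit
        sims = []
        for _ in range(n_sim):
            # Resample the residuals with replacement and add them to
            # the back-calculated curve.
            y_sim = y_fit + rng.choice(residuals, size=len(y), replace=True)
            sims.append(fit_model(x, y_sim)[0])
        # The parameter errors are the SDs over all simulations.
        return np.std(np.array(sims), axis=0, ddof=1)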

However, we are using non-linear least squares optimisation, not
regression.  Therefore we do not need an estimator of the errors, as
we already know the errors.  The residual or error bootstrapping
techniques are only there to estimate the measured errors, which we
already have.  If the residual or error bootstrapping has been
successful, the estimates should converge to the measured errors.  So
this is unlikely to be the source of your too low kex error.
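
For contrast, a sketch of the Monte Carlo scheme of point 2) above,
where the scatter is drawn from the known measured errors rather than
estimated from the residuals (same hypothetical fit_model interface as
in the sketch above, with the chi-squared weighting assumed to happen
inside fit_model):

    import numpy as np

    def monte_carlo_errors(x, y, errors, fit_model, n_sim=500, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        params, y_back_calc = fit_model(x, y)
        sims = []
        for _ in range(n_sim):
            # Randomise the ideal (back-calculated) data with the
            # measured per-point errors sigma_i.
            y_sim = y_back_calc + rng.normal(scale=errors)
            sims.append(fit_model(x, y_sim)[0])
        return np.std(np.array(sims), axis=0, ddof=1)

If the residual-based techniques are working, the two sketches should
give matching parameter errors.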


> But the book suggests adding errors described by the variance of the
> residuals.  This is described on page 33:
> "If you chose to weight the values and minimize the relative distance
> squared (or some other weighting function), goodness-of-fit is
> quantified with the weighted sum-of-squares."
>
> Sy.x = sqrt(SS/dof) = sqrt(chi2/dof)
>
> The question is, of course, whether SS should be the sum of squared
> errors for the weighted points or for the non-weighted R2eff points.

Note that the variance of the residuals is an error estimator.  All
estimators also have an error of their own (this is the error of an
error, or the sigma of sigma_i).  In the case of a single spin system,
as there are not many R2eff points, the sigma of sigma_i will be big
(you need 500+ points before the sigma of sigma_i is reasonably
small).  Hence your error estimates from such methods will be very
noisy.
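
To put a rough number on the sigma of sigma_i claim: for Gaussian data
the standard error of a sample standard deviation is approximately
sigma/sqrt(2(n-1)) (a standard result, quoted here as an illustration):

    import numpy as np

    # Relative uncertainty of a sample SD, ~1/sqrt(2(n-1)).
    for n in (10, 50, 500):
        print(n, 1.0 / np.sqrt(2.0 * (n - 1)))
    # 10  -> ~0.24 (a 24% error on the error estimate!)
    # 50  -> ~0.10
    # 500 -> ~0.03

Hence the 500+ points figure.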

In any case, because the chi-squared value has been optimised, this is
not the same solution as a regression minimising the SSE.  The two
minima are not necessarily the same.  They only converge under certain
strong conditions in the regression problem (Gaussian errors, no
outliers, and identical errors for all points at all field strengths).
Because the two solutions are not the same, you cannot use the SSE
value, which would have to be calculated from the base data and the
back-calculated data, for the error estimate.  It can strictly only be
used under these convergence conditions.
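
A toy demonstration that the two minima differ when the errors are
unequal (np.polyfit with w=1/sigma minimises chi-squared, without
weights it minimises the SSE; pure illustration, nothing
dispersion-specific):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.3, 3.6])
    errors = np.array([0.1, 0.1, 1.0, 1.0])   # strongly unequal errors

    sse_fit  = np.polyfit(x, y, 1)                 # unweighted SSE minimum
    chi2_fit = np.polyfit(x, y, 1, w=1.0/errors)   # chi-squared minimum

    print(sse_fit)    # the two slopes and intercepts differ, so an
    print(chi2_fit)   # SSE-based error estimate is inconsistent here

With equal errors the two fits coincide, which is exactly the
convergence condition above.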

I don't know of any error estimate for the non-linear least squares
optimisation problem.  But one probably has been derived for the cases
when the errors are not known.  This would require a different
reference, as the GraphPad Prism 4 book only covers regression and not
least squares and hence cannot give the correct answer.  The Numerical
Optimisation book by Nocedal and Wright
(https://books.google.de/books?id=VbHYoSyelFcC&lpg=PP1&dq=numerical%20optimisation%20nocedal%20wright&pg=PP1#v=onepage&q=numerical%20optimisation%20nocedal%20wright&f=false)
also doesn't seem to cover this.  Do you know the Numerical Recipes
books (http://www.nr.com/)?  Maybe there is something in there.  Monte
Carlo simulations are described very clearly in section 15.6 of the
second edition.  There might be something in chapter 15, "Modeling of
Data", detailing the correct error estimate for this problem.

As a side note, for the non-linear least squares problem when errors
are unknown and Monte Carlo simulations are not an option, the
covariance matrix might be the best error estimate.
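
As a sketch of that covariance matrix estimate, scipy's curve_fit
returns it directly (the exponential model here is just a stand-in for
a dispersion model):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a * np.exp(-b * x)   # stand-in two-parameter model

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 1.0, 20)
    errors = np.full_like(x, 0.05)
    y = model(x, 2.0, 3.0) + rng.normal(scale=errors)

    # absolute_sigma=True uses the known errors as-is; with unknown
    # errors, curve_fit instead rescales the covariance matrix by the
    # reduced chi-squared.
    popt, pcov = curve_fit(model, x, y, p0=[1.0, 1.0], sigma=errors,
                           absolute_sigma=True)
    param_errors = np.sqrt(np.diag(pcov))   # 1-sigma parameter errors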

Regards,

Edward


