Re: [bug #23644] monte_carlo.error_analysis() does not update the mean value/expectation value from simulations


Posted by Edward d'Auvergne on June 15, 2015 - 15:45:
On 15 June 2015 at 15:33, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
On 15 June 2015 at 15:28, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
Hi Edward.

What do you think about this bug report?

I added some figures showing that the parameter values do not represent
the expectation value of the Monte-Carlo simulation distribution.

Did you see my response at ...  Oh, it was not reply-to-all and it
went to the <NO-REPLY.INVALID-ADDRESS@xxxxxxx> email address only!  My
email from 3 hours ago was:

"""
This is actually the definition of Monte Carlo simulations.  The
parameter value is the optimised value and the parameter error is the
standard deviation of the back-calculated distribution.  There are two
opposite and very much related values which do not have a great
statistical meaning.  These are the mean of the back-calculated
distribution and the standard deviation of the non-back-calculated
distribution.  They are unused for good reason.  You can create the
non-back-calculated distribution by using the bootstrapping in relax -
the mean of this will equal the optimised parameter value, but the
standard deviation will not match the MC standard deviation.  I
suggest looking at the Numerical Recipes books as they have a great
diagram of the Monte Carlo simulation setup and how the parameter
value and error are calculated.
"""
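
To make the procedure described above concrete, here is a minimal
sketch in plain Python.  It is not relax code.  The model (a straight
line through the origin, y = a*x), the data values and the function
names fit_slope() and monte_carlo_error() are all hypothetical and
chosen only for illustration.  The reported parameter value is the
optimised one, and the reported error is the standard deviation of the
parameters refitted to the noisy back-calculated data.

```python
import random
import statistics


def fit_slope(x, y, sigma):
    """Weighted least-squares slope for the hypothetical model y = a * x."""
    num = sum(xi * yi / sigma**2 for xi, yi in zip(x, y))
    den = sum(xi * xi / sigma**2 for xi in x)
    return num / den


def monte_carlo_error(x, y, sigma, n_sim=500, seed=0):
    """Return the optimised slope, its Monte Carlo error and the simulated slopes."""
    rng = random.Random(seed)

    # 1. Optimise against the measured data: this is the parameter value.
    a_opt = fit_slope(x, y, sigma)

    # 2. Back-calculate the data from the optimised parameter.
    back_calc = [a_opt * xi for xi in x]

    # 3. Randomise the back-calculated data with the measurement error
    #    and refit each simulated data set.
    sims = []
    for _ in range(n_sim):
        noisy = [b + rng.gauss(0.0, sigma) for b in back_calc]
        sims.append(fit_slope(x, noisy, sigma))

    # 4. The parameter error is the standard deviation of the simulated
    #    parameters.  Their mean is not used as the parameter value.
    return a_opt, statistics.stdev(sims), sims


# Hypothetical data, assumed to share a single known error sigma.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
sigma = 0.2

a_opt, a_err, sims = monte_carlo_error(x, y, sigma)
print("optimised value: %.4f" % a_opt)
print("Monte Carlo error: %.4f" % a_err)
print("mean of simulations (not reported): %.4f" % statistics.mean(sims))
```

Note that the optimised value is never overwritten by anything
computed from the simulations.  Only the spread of the simulated
parameters is kept.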

In essence, you have stumbled upon a very important statistics
concept.  You'll see this written up in my PhD thesis (
https://minerva-access.unimelb.edu.au/handle/11343/39174 ),
specifically the section "2.2.1 Model selection theory for NMR
relaxation", and the paragraph "The four relaxation data sets".  I'll
reproduce the text for reference:

"""
For a single nucleus four different types of relaxation data sets
exist, the true set Rtrue, the sample set R, the true back calculated
set Rtrue(θ), and the back calculated set R(θ). A relaxation data set
is defined as the collection of all the relaxation values which
influence the model. θ is the vector whose elements are the parameters
of the model. The true set is the true relaxation data underlying the
measured data. It can never be observed due to noise. The sample set
is the experimentally available or measured relaxation data set and is
the true set plus noise. The true back calculated and back calculated
sets are determined from the model-free parameters which are fitted
using the true or sample sets respectively. The differences between
the models are reflected in the two back calculated sets whereas the
true and sample sets remain constant. For each of the four data sets
there is a corresponding error set with the same dimension. By
assuming Gaussian errors the data and error sets together describe a
set of normal probability distribution functions (pdfs) with one
normal pdf for each data point. It is assumed that all four error sets
are identical and therefore the one error set σ will be used in
association with all four data sets.
"""
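
As a rough illustration of the four data sets in the quoted text, the
following sketch builds each of them for a toy problem.  Everything in
it is an assumption made for the example: the "true" data, the
one-parameter model y = a*x standing in for a model-free model, and
the names fit_slope(), back_calculate() and four_data_sets().  In a
real experiment the true set and the true back calculated set can
never be observed.

```python
import random


def fit_slope(x, y):
    """Least-squares slope for the hypothetical model y = a * x."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def back_calculate(x, a):
    """Data predicted by the model for the parameter a."""
    return [a * xi for xi in x]


def four_data_sets(x, r_true, sigma, seed=0):
    """Build the true, sample, true back calculated and back calculated sets."""
    rng = random.Random(seed)

    # The sample set is the true set plus Gaussian noise.
    r_sample = [r + rng.gauss(0.0, sigma) for r in r_true]

    # Each back calculated set comes from the parameter fitted to the
    # corresponding data set.
    r_true_bc = back_calculate(x, fit_slope(x, r_true))
    r_sample_bc = back_calculate(x, fit_slope(x, r_sample))

    return {
        "true": r_true,
        "sample": r_sample,
        "true_back_calc": r_true_bc,
        "back_calc": r_sample_bc,
    }


# Hypothetical true data (2*x + 0.5), which the model y = a*x cannot
# reproduce exactly, so the back calculated sets differ from the data.
x = [1.0, 2.0, 3.0, 4.0]
r_true = [2.5, 4.5, 6.5, 8.5]

sets = four_data_sets(x, r_true, sigma=0.1)
for name, values in sets.items():
    print(name, ["%.3f" % v for v in values])
```

Changing the model changes only the two back calculated sets, while
the true and sample sets stay the same, which is the point made in
the quoted paragraph.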

You are seeing two of these four distributions!  These are the sample
set and the back calculated set, not the true set or the true back
calculated set.
Keep reading the "Maximum likelihood" and the full text of
"Discrepancies", and then hopefully you'll be a master of these
frequentist statistics concepts ;)

Regards,

Edward


