Re: Curve fitting


Posted by Edward d'Auvergne on October 16, 2008 - 11:07:
On Thu, Oct 16, 2008 at 7:02 AM, Chris MacRaild <macraild@xxxxxxxxxxx> wrote:
On Thu, Oct 16, 2008 at 3:11 PM, Sébastien Morin
<sebastien.morin.1@xxxxxxxxx> wrote:
Hi,

I have a general question about curve fitting within relax.

Let's say I proceed to curve fitting for some relaxation rates
(exponential decay) and that I have a duplicate delay for error estimation.

========
delays

0.01
0.01
0.02
0.04
...
========

Will the mean value (for delay 0.01) be used for curve fitting and rate
extraction?
Or will both values at delay 0.01 be used during curve fitting, hence
giving more weight to delay 0.01?

In other words, will the fit only use both values at delay 0.01 for
error estimation, or also for rate extraction, giving more weight to
this duplicate point?

How is this handled in relax?

Instinctively, I would guess that the mean value must be used for
fitting, as we don't want the points that are not duplicated to count
less in the fitting procedure... Am I right?


I would argue not. If we have gone to the trouble of measuring
something twice (or, equivalently, measuring it with greater
precision) then we should weight it more strongly to reflect that.

So either we should include both duplicate points in our fit, or we
should use just the mean value, but weight it to reflect the greater
certainty we have in its value.

As I type this I realise this is likely the source of the sqrt(2)
factor Tyler and Edward have been debating on a parallel thread - the
uncertainty in height of any one peak is equal to the RMS noise, but
the std error of the mean of duplicates is less by a factor of
sqrt(2).
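
The sqrt(2) factor follows directly from the standard error of the mean; a minimal Python sketch (the noise value here is illustrative, not taken from any real spectrum):

```python
import math

# RMS baseplane noise: the uncertainty of a single peak height measurement
# (hypothetical value, for illustration only).
sigma = 1000.0

def std_error_of_mean(sigma, n):
    """Standard error of the mean of n replicate measurements of equal noise."""
    return sigma / math.sqrt(n)

# For duplicate spectra (n = 2), the uncertainty of the averaged peak height
# is smaller than the single-measurement RMS noise by a factor of sqrt(2).
duplicate_error = std_error_of_mean(sigma, 2)
reduction = sigma / duplicate_error  # equals sqrt(2), about 1.414
```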

At the moment, relax simply uses the mean value in the fit.  Despite
the higher quality of the duplicated data, all points are given the
same weight.  This is only because of the low data quantity.  As for
dividing the sd of differences between duplicate spectra by sqrt(2),
this is not done in relax anymore.  Because some people have collected
triplicate spectra (although this is rare), relax calculates the error
from replicated spectra differently.  I'm prepared to be told that this
technique is incorrect though.  The procedure relax uses is to apply
the formula:

sd^2 = sum({Ii - Iav}^2) / (n - 1),

where n is the number of spectra, Ii is the intensity in spectrum i,
Iav is the average intensity, sd is the standard deviation, and sd^2
is the variance.  This is for a single spin.  The sample number is so
low that this per-spin value is completely meaningless.  Therefore the
variance is averaged across all spins (well, due to a current bug, the
standard deviation is averaged instead).  Then another averaging takes
place if not all spectra are duplicated.  The variances across all
duplicated spectra are averaged to give a single error value for all
spins across all spectra (again the sd averaging bug affects this).
using this approach is that you are not limited to duplicate spectra.
It also means that the factor of sqrt(2) is not applicable.  If only
single spectra are collected, then relax's current behaviour of not
using sqrt(2) seems reasonable.
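
The procedure described above can be sketched in Python as follows. The data and function names are illustrative, not relax's actual API, and the sketch averages variances, i.e. the intended behaviour rather than the sd-averaging bug:

```python
def spin_variance(intensities):
    """Variance of one spin's replicate intensities:
    sd^2 = sum((Ii - Iav)^2) / (n - 1)."""
    n = len(intensities)
    iav = sum(intensities) / n
    return sum((i - iav) ** 2 for i in intensities) / (n - 1)

def pooled_variance(all_spins):
    """Average the per-spin variances across all spins, giving a single
    error estimate, since one spin's n is far too small on its own."""
    variances = [spin_variance(i) for i in all_spins]
    return sum(variances) / len(variances)

# Duplicate peak intensities for three hypothetical spins.
spins = [
    [10500.0, 10300.0],
    [8200.0, 8400.0],
    [15000.0, 14800.0],
]

# Single pooled error (standard deviation) applied to all spins.
sd = pooled_variance(spins) ** 0.5
```

Note that nothing here is specific to duplicates: the (n - 1) denominator handles triplicates or higher equally well, which is why the sqrt(2) factor never enters.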

Regards,

Edward


P.S.  The idea for the 1.3 line is to create a new class of user
functions, 'spectrum.read_intensities()', 'spectrum.set_rmsd()',
'spectrum.error_analysis()', etc. to make all of this independent of
the analysis type.  See
https://mail.gna.org/public/relax-devel/2008-10/msg00029.html for
details.



Powered by MHonArc, Updated Fri Oct 17 02:00:27 2008