Just wait, I got that wrong! First some definitions:
n = number of replicated spectra,
i = each spin system or peak,
k = total number of peaks.
The sum at the bottom of the formula would mean that if n-1=1, then the
denominator would be a large sum. As n-1 would be the same for all i,
then this becomes the averaged variance divided by k! Therefore as
you have more and more peaks in the spectrum, the smaller and smaller
the estimator will be. Taking this to the extreme, as you approach
infinite peaks in the spectrum, the error approaches zero. That seems
absurd. Maybe in this case, the unbiased estimator is absurd ;) I
think I should read that 3rd link you posted.
Regards,
Edward
On 19 June 2013 15:15, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi,
I'm quite aware of this. Another useful link is:
http://en.wikipedia.org/wiki/Pooled_variance
This has also been pointed out to me by Robert Schneider (but not on
the mailing lists). I am wondering if it is worth it as the number of
users who would benefit is quite low. The reason for this is that
most users will only have duplicate spectra. Therefore n-1 ends up
being 1, as n is the number of replicated spectra, and this collapses
down to the currently used variance averaging. In the case where you
have collected spectra in triplicate, then implementing this makes
sense. But the number of people using relax with triplicate spectra
in the last 12 years is probably 1 or 2. So it would be good to
implement this, but its priority is very low. In any case, both
averaged variances and pooled variances from a large collection of
2-point sets are horrible statistics, but that's all we've got.
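As a quick check of the duplicate-spectra case, here is a minimal sketch
that compares the pooled variance with the plain average. It assumes the
pooled variance definition from the Wikipedia page linked above, i.e.
sum((n_i - 1) * s_i^2) / sum(n_i - 1). The function name and the numbers
are made up for illustration and are not part of relax.

```python
def pooled_variance(variances, counts):
    """Pooled variance: sum((n_i - 1) * s_i^2) / sum(n_i - 1)."""
    # Degrees of freedom for each peak (n_i - 1).
    dof = [n - 1 for n in counts]
    return sum(d * v for d, v in zip(dof, variances)) / sum(dof)


# Hypothetical per-peak variances estimated from replicated peak heights.
s2 = [4.0, 9.0, 1.0, 6.0]

# Plain averaging of the variances.
avg = sum(s2) / len(s2)

# Duplicate spectra only (n_i = 2 for every peak): every weight is 1.
dup = pooled_variance(s2, [2, 2, 2, 2])

# Mixed duplicate and triplicate data: triplicate peaks get twice the weight.
mixed = pooled_variance(s2, [2, 3, 2, 3])

print(avg, dup, mixed)
```

With these made-up numbers the duplicate-only pooled value matches the
plain average, and the two only differ once the replicate counts are
mixed.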
Note also that there are two averaging steps. The first is to average
the variance for all peaks in the spectrum. The variance for a single
peak is the dirty estimate from 2 points. Then, if some spectra are
only measured once, the variances for all spectra are averaged.
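The two averaging steps could be sketched roughly as follows. This is
only an illustration and not the relax code: the spectrum IDs, the peak
heights and the data layout are invented, and assigning the averaged
variance only to the spectra recorded once (while replicated spectra
keep their own value) is an assumption of this sketch.

```python
from statistics import mean, variance

# Hypothetical peak heights: spectrum ID -> one list of replicate heights per peak.
heights = {
    "T1": [[100.0, 104.0], [50.0, 48.0], [75.0, 75.0]],  # duplicated
    "T2": [[80.0, 86.0], [40.0, 40.0], [60.0, 62.0]],    # duplicated
    "T3": [[65.0], [30.0], [45.0]],                       # recorded once
}


def spectrum_variance(peaks):
    """Step 1: average the per-peak sample variances within one spectrum."""
    return mean(variance(replicates) for replicates in peaks)


# Step 1 is only possible where every peak has at least two replicates.
replicated = {
    sid: spectrum_variance(peaks)
    for sid, peaks in heights.items()
    if all(len(replicates) > 1 for replicates in peaks)
}

# Step 2: average the variances over all replicated spectra.
averaged = mean(replicated.values())

# Assumption: spectra recorded once fall back to the averaged variance.
errors = {sid: replicated.get(sid, averaged) for sid in heights}

print(errors)
```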
Regards,
Edward
On 19 June 2013 14:50, Troels E. Linnet
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
URL:
<http://gna.org/support/?3045>
Summary: Support for pooled standard deviation for: Peak
heights with partially replicated spectra
Project: relax
Submitted by: tlinnet
Submitted on: Wed 19 Jun 2013 12:50:08 PM GMT
Category: None
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Operating System: None
_______________________________________________________
Details:
According to the manual,
http://www.nmr-relax.com/manual/spectrum_error_analysis.html,
the variances for the replicated datasets are averaged and used as the
variance for spectra that were recorded only once.
This is a very reasonable assumption, but I wonder if a pooled standard
deviation should be used instead.
If we look at the definition in the IUPAC Gold Book:
http://goldbook.iupac.org/P04758.html
"""
Results from various series of measurements can be combined in the following
way to give a pooled relative standard deviation $s_{r,p}$:
$$
s_{r,p}=\sqrt{\frac{\sum(n_i-1)s_{r,i}^2}{\sum n_i -1}} =
\sqrt{\frac{\sum(n_i-1)s_i^2x_i^{-2}}{\sum n_i -1}}
$$
"""
It is not an easy subject, and the discussion can get "hot". See for example
these guys and gals: http://www.physicsforums.com/showthread.php?t=268377
So my question is: is averaging the variances the right way to
estimate the variance for a data point that was recorded only once?
And should another way be implemented?
_______________________________________________
relax (http://www.nmr-relax.com)
This is the relax-devel mailing list
relax-devel@xxxxxxx