Re: [sr #3045] Support for pooled standard deviation for: Peak heights with partially replicated spectra




Posted by Troels Emtekær Linnet on June 19, 2013 - 16:04:
Can we take an example?

Can I easily loop over the replicates and extract the intensity from just one spin?

cdp.replicates
[['0_2', '7_2', '14_2'], ['15_14', '16_14', '20_14'], ['3_30', '8_30', '17_30'], ['9_46', '19_46', '22_46']]
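
As a rough sketch of what such a loop could look like: the snippet below walks over the replicate groups printed above and collects the intensities for one spin. It assumes the peak heights of that spin are available as a plain dictionary keyed by spectrum ID. The name spin_intensities, the helper replicate_intensities() and the numbers are all hypothetical placeholders for illustration, not the relax data structures.

```python
# The replicate groups exactly as printed from cdp.replicates above.
replicates = [['0_2', '7_2', '14_2'], ['15_14', '16_14', '20_14'],
              ['3_30', '8_30', '17_30'], ['9_46', '19_46', '22_46']]


def replicate_intensities(replicates, spin_intensities):
    """Return one list of intensities per replicate group for a single spin.

    spin_intensities is a hypothetical dict mapping spectrum ID -> peak height.
    """
    return [[spin_intensities[spec_id] for spec_id in group]
            for group in replicates]


# Made-up placeholder peak heights for one spin, keyed by spectrum ID.
spin_intensities = {}
for group_index, group in enumerate(replicates):
    for rep_index, spec_id in enumerate(group):
        spin_intensities[spec_id] = 1000.0 - 100.0 * group_index + rep_index

# Loop over the replicate groups and show the heights of this one spin.
for group, heights in zip(replicates, replicate_intensities(replicates, spin_intensities)):
    print(group, heights)
```

Each inner list then holds the repeated measurements of the same peak, which is the input needed for a per-peak variance.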


Troels Emtekær Linnet


2013/6/19 Edward d'Auvergne <edward@xxxxxxxxxxxxx>
And again I got it wrong :)  It's not the averaged variance divided by
k!  It's the sum of variances divided by k, i.e. the average variance.
Therefore, as all peaks are either duplicated, triplicated, etc., in
all cases the pooled variance collapses down to the average
variance.  So as I see it, the pooled variance for replicated
spectra is redundant.  Or am I wrong again?
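
A small numerical check of this argument, as a sketch only: it assumes the pooled variance in the form given on the Wikipedia page linked in this thread, sum((n_i - 1) * s_i^2) / sum(n_i - 1), and uses made-up intensities. The function names are hypothetical and not part of relax.

```python
from statistics import mean, variance


def pooled_variance(groups):
    """Pooled variance, sum((n_i - 1) * s_i^2) / sum(n_i - 1) (assumed form)."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return numerator / denominator


def average_variance(groups):
    """Plain average of the per-peak sample variances."""
    return mean(variance(g) for g in groups)


# Made-up data: every peak measured in triplicate (the same n for all peaks).
equal_n = [[1.0, 2.0, 3.0], [2.0, 4.0, 9.0]]
# Made-up data: one peak in triplicate, one in duplicate (n differs).
mixed_n = [[1.0, 2.0, 3.0], [2.0, 4.0]]

print(pooled_variance(equal_n), average_variance(equal_n))
print(pooled_variance(mixed_n), average_variance(mixed_n))
```

With the same n for every peak the weights (n_i - 1) cancel and the two estimates agree. They only differ when the number of replicates varies between peaks.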

Regards,

Edward



On 19 June 2013 15:27, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
> Just wait, I got that wrong!  First some definitions:
>
> n = number of replicated spectra,
> i = each spin system or peak,
> k = total number of peaks.
>
> The sum at the bottom of the formula would mean that if n-1=1, then the
> denominator would be a large sum.  As n-1 would be the same for all i,
> then this becomes the averaged variance divided by k!  Therefore as
> you have more and more peaks in the spectrum, the smaller and smaller
> the estimator will be.  Taking this to the extreme, as you approach
> infinite peaks in the spectrum, the error approaches zero.  That seems
> absurd.  Maybe in this case, the unbiased estimator is absurd ;)  I
> think I should read that 3rd link you posted.
>
> Regards,
>
> Edward
>
>
>
>
> On 19 June 2013 15:15, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
>> Hi,
>>
>> I'm quite aware of this.  Another useful link is:
>>
>> http://en.wikipedia.org/wiki/Pooled_variance
>>
>> This has also been pointed out to me by Robert Schneider (but not on
>> the mailing lists).  I am wondering if it is worth it as the number of
>> users who would benefit is quite low.  The reason for this is that
>> most users will only have duplicate spectra.  Therefore n-1 ends up
>> being 1, as n is the number of replicated spectra, and this collapses
>> down to the currently used variance averaging.  In the case where you
>> have collected spectra in triplicate, then implementing this makes
>> sense.  But the number of people using relax with triplicate spectra
>> in the last 12 years is probably 1 or 2.  So it would be good to
>> implement this, but its priority is very low.  In any case, both
>> averaged variances and pooled variances from a large collection of 2
>> point sets are horrible statistics, but that's all we've got.
>>
>> Note also that there are two averaging steps.  The first is to average
>> the variance for all peaks in the spectrum.  The variance for a single
>> peak is the dirty estimate from 2 points.  Then if some spectra are
>> only measured once, then the variances for all spectra are averaged.
>>
>> Regards,
>>
>> Edward
>>
>>
>>
>>
>>
>>
>> On 19 June 2013 14:50, Troels E. Linnet
>> <NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
>>> URL:
>>>   <http://gna.org/support/?3045>
>>>
>>>                  Summary: Support for pooled standard deviation for: Peak
>>> heights with partially replicated spectra
>>>                  Project: relax
>>>             Submitted by: tlinnet
>>>             Submitted on: Wed 19 Jun 2013 12:50:08 PM GMT
>>>                 Category: None
>>>                 Priority: 5 - Normal
>>>                 Severity: 3 - Normal
>>>                   Status: None
>>>                  Privacy: Public
>>>              Assigned to: None
>>>         Originator Email:
>>>              Open/Closed: Open
>>>          Discussion Lock: Any
>>>         Operating System: None
>>>
>>>     _______________________________________________________
>>>
>>> Details:
>>>
>>> According to the manual,
>>> http://www.nmr-relax.com/manual/spectrum_error_analysis.html,
>>> the variances for the replicated datasets are averaged and used as the
>>> variance for a spectrum recorded only once.
>>>
>>> This is a very reasonable assumption, but I wonder if a pooled standard
>>> deviation should be used instead.
>>>
>>> If we look at the definition in the IUPAC Gold Book:
>>> http://goldbook.iupac.org/P04758.html
>>>
>>> """
>>> Results from various series of measurements can be combined in the following
>>> way to give a pooled relative standard deviation $s_{r,p}$:
>>>
>>> $$
>>> s_{r,p}=\sqrt{\frac{\sum(n_i-1)s_{r,i}^2}{\sum n_i -1}} =
>>> \sqrt{\frac{\sum(n_i-1)s_i^2x_i^{-2}}{\sum n_i -1}}
>>> $$
>>> """
>>>
>>> It is not an easy subject, and the discussion can get "hot". See for example
>>> these guys and gals: http://www.physicsforums.com/showthread.php?t=268377
>>>
>>>
>>> So my question is: is using the average of the variances the right way to
>>> estimate the variance for a data point recorded only once?
>>> And should another way be implemented?
>>>
>>>
>>>
>>>
>>>     _______________________________________________________
>>>
>>> Reply to this item at:
>>>
>>>   <http://gna.org/support/?3045>
>>>
>>> _______________________________________________
>>>   Message sent via/by Gna!
>>>   http://gna.org/
>>>
>>>
>>> _______________________________________________
>>> relax (http://www.nmr-relax.com)
>>>
>>> This is the relax-devel mailing list
>>> relax-devel@xxxxxxx
>>>
>>> To unsubscribe from this list, get a password
>>> reminder, or change your subscription options,
>>> visit the list information page at
>>> https://mail.gna.org/listinfo/relax-devel

