Follow-up Comment #1, task #7180 (project relax):
The attached files are a plot of the distribution of error estimates, and
the script used to generate it. I am using a Gaussian centred at 20 with a
standard deviation of 1. I take 2 points (peak intensities from duplicated
spectra) randomly from the distribution and calculate the difference. This
is then repeated N times (N being the number of duplicated spectra) and the
maximum difference is taken as the error estimate. I repeat this M times to
collect statistics, where M = 1e6. The average error estimates are:
N = 2, ave(error) = 1.594
N = 3, ave(error) = 1.875
N = 4, ave(error) = 2.070
N = 5, ave(error) = 2.221
N = 6, ave(error) = 2.339
N = 100, ave(error) = 3.884
The standard deviations of the error estimates are:
N = 2, sd(error) = 0.852
N = 3, sd(error) = 0.829
N = 4, sd(error) = 0.806
N = 5, sd(error) = 0.786
N = 6, sd(error) = 0.770
N = 100, sd(error) = 0.566
These all appear to be roughly log-normal distributions. As you can see, the
error is on average always overestimated, and the more duplicated spectra N
there are, the worse this becomes. The spread of values is also a worry.
With 3 duplicated spectra, the resultant errors are 1.875 +/- 0.829, which
means that the error estimates are all over the place. The perfect estimate
would be 1.000 +/- 0.000, as the true sd of the distribution is exactly 1.
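
The procedure described above can be sketched in a few lines of Python. This
is not the attached sampling_test.py. It is a minimal reconstruction which
assumes that the difference between the 2 points is taken as an absolute
value, uses only the standard library, and defaults to a much smaller M
(20000) so that it runs quickly. The function names max_difference_error and
simulate are hypothetical.

```python
import random
import statistics


def max_difference_error(n_spectra, rng, mean=20.0, sd=1.0):
    """Return the largest absolute difference over n_spectra duplicated pairs.

    Each pair is 2 points drawn from the same Gaussian (assumption: the
    difference is used as an absolute value).
    """
    return max(
        abs(rng.gauss(mean, sd) - rng.gauss(mean, sd))
        for _ in range(n_spectra)
    )


def simulate(n_spectra, n_trials=20000, seed=0):
    """Repeat the error estimate n_trials times; return (average, sd)."""
    rng = random.Random(seed)
    errors = [max_difference_error(n_spectra, rng) for _ in range(n_trials)]
    return statistics.mean(errors), statistics.stdev(errors)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 6):
        ave, sd = simulate(n)
        print(f"N = {n}, ave(error) = {ave:.3f}, sd(error) = {sd:.3f}")
```

With M this small, the printed values should only agree with the numbers
above to within a few hundredths. Raising n_trials towards 1e6 tightens the
agreement at the cost of run time.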
(file #11436, file #11437, file #11438)
_______________________________________________________
Additional Item Attachment:
File name: sampling_test.py Size:1 KB
File name: sampling_test_good.agr Size:8 KB
File name: sampling_test_good.ps Size:15 KB
_______________________________________________________
Reply to this item at:
<http://gna.org/task/?7180>