Re: Addition of PDC support to relax 1.3.10 and an inconsistency in the error calculation. -- February 23, 2011

Title: E-Mail Signatur Bruker Coporation (Bruker BioSpin) V0.2

Dear Edward,

I will provide the other output file examples as soon as the format is accepted by you. We still
have a few points open.

The relation between dialog window and ouput:
dialog                                    explanation                                                             appears in output as
error estimation by fit all errors are unknown but the same for all y      "from fit using arbitray y uncertainties"
error estimation by weighted fit errors as listed in the output are used    "from fit using calculated y uncertainties"
error estimation by Monte Carlo N fits with synthetically modified y values          "from Monte Carlo simulations"

errorScale:
Matlab and other software packages provide the fit results within confidence bounds which depend on a confidence
level and number of degrees of freedom. The formula is c = f +/- t* fErr. f would be the fitted parameter value,
fErr is the fit parameter error as coming out of Marquardt or MC, t is calculated.

The pdf output of the PDC shows f and (t*fErr) but you only want f and fErr. Therefore I have added an extra column
for you, called it errScale and put t into it. The column contains always the same number since at the moment
all peaks are fitted to the same model function and there is one user defined confidence level.

This whole thing has nothing to do with NC_Proc. NC_proc for the scaling of all datapoints of spectra. This is because
Bruker originally did the FTT in integer arithmetic. To use as many bits as possible in the integers the data are scaled up
by 2**NC_proc. The parameter NC_proc is stored and any software can do the reverse calculation to get different spectra
comparable.

Monte Carlo:
We had a first initial internal meeting and there were some shaking heads here. I have furthermore searched
the internet for other software packages and articles and what we offer as Monte Carlo is indeed what many
others do as well offer as Monte Carlo. We agreed to go into some more details within the next few days, one
collegue will do some Matlab simulations and we will have another meeting then. So for the moment I would
not change anything in the PDC related to this topic.

People here accept that Marquardt's covariance method might cause problems depending on function and
actual data but for the relaxation functions it worked well and gave very similar results to what you obtained.
So it can't be that bad. It might be a more serious problem for other functions of course.

So, tell me if you want to other output files at current status or are there small things that should be changed
first.

Best regards,
Peter

On 2/22/2011 4:32 PM, Edward d'Auvergne wrote:

Hi,

Cheers.  Would you have new NOE and R2 data file examples as well?  If
you could, please try to attach the files to the task
(https://gna.org/task/?7180) rather than attaching it to a message, as
big attachments are quite a drain on the infrastructure.  I will
comment more below.


On 22 February 2011 11:31, Dr. Klaus-Peter Neidig
<peter.neidig@xxxxxxxxxxxxxxxxx> wrote:

Dear Edward,

below is the document I had recently sent. I have added 3 pictures and marked those options that you
recommend, i.e. peak intensities, variance averaging in case of replicates and fit using the determined
integral (or intensity) errors.

The choice the height or volume of the peak isn't too bad, but the
field most often uses heights (and almost always use the terminology
'heights' ;) as the measure as volumes are problematic when peaks
overlap. Does the variance averaging include using the baseplane RMSD
measure? Or is this only for when spectra are duplicated? Does the
PDC make a distinction between all spectra replicated and a few
replicated? These points are not clear from the 3 figures, though it
might be clearer when using the PDC.

For the error estimation method, the options "error estimation by fit"
or "error estimation by weighted fit" is not clear. Does this mean
using the low quality covariance matrix estimate (the estimated errors
are from the estimated matrix)? What are the weights? Is the
technique behind the unweighted and weighted "error estimation by fit"
explained to the user? Are the gold standard Monte Carlo simulations
not the default because of the longer computation time?

One last thing, I still really cannot understand the error in the
'results' section and the scale. Why does the scale even exist? Does
this have something to do with the Bruker NC_PROC parameter? Errors
are simply errors, they are the standard deviation defined very
precisely as the square root of the variance. When presenting the
details about a normal distribution, you give the mean and variance,
not a scaled variance. There is no need to scale the error. In the
results section with the header:

Peak name F1 [ppm] F2 [ppm] T1 [s] error errorScale

The actual real error is equal to the presented error divided by
errorScale. Why is a scaled error, which is more than double the real
error, presented to the user? I can only imagine that this will cause
user confusion.

That picture still contains the word Monte Carlo. There will be an internal meeting here at Bruker about
this topic and I have to wait what comes out. Your statement is that the PDC is rather doing bootstrapping
and that this is not useful to determine the errors of the fit parameters. I will put this into the discussion.

It most definitely is Bootstrapping and not Monte Carlo simulations.
To the statistical field, it is well known that Bootstrapping does not
produce an error estimate.  Whether the numbers are similar in certain
situations is not relevant.

In the output I changed "used integral errors" just into "integrals" not as I proposed yesterday "experimental
integral errors" which I think sounds stupid. In fact I now write integrals and integral errors for the 2 tables.
Strictly spoken integrals could be intensities of course.

Sorry, my confusion was not from the name but from the subsequent use
of those values by the PDC.  All of the terminology made it clear that
it was the errors.

I also attached  a new output example and this time I adjusted "variance averaging" instead of "worst case.."
Variance averaging results in one number that is added to the base plane RMSD. We then have identical
numbers in each column. Yesterday we used the "worst case.." option produces a number for each peak
that was added to the base plane RMSD. All error numbers in the table then were different.

I've made a comparison of relax's fit and the PDC values below.  The
error estimates are not identical, but they are very close.  The PDC
looks to be underestimating more then overestimating the errors (when
comparing to the gold standard).  This is probably because of the
difference between the error estimate using the itself estimated
covariance matrix verses Monte Carlo simulations.

Cheers,

Edward




[edward@localhost pdc_files]$ relax convert_data_r1.py



                                     relax repository checkout

                              Molecular dynamics by NMR data analysis

                             Copyright (C) 2001-2006 Edward d'Auvergne
                         Copyright (C) 2006-2011 the relax development team

This is free software which you are welcome to modify and redistribute
under the conditions of the
GNU General Public License (GPL).  This program, including all
modules, is licensed under the GPL
and comes with absolutely no warranty.  For details type 'GPL' within
the relax prompt.

Assistance in using the relax prompt and scripting interface can be
accessed by typing 'help' within
the prompt.

script = 'convert_data_r1.py'
----------------------------------------------------------------------------------------------------
#! /usr/bin/env python

from string import split, strip


# The file data.
file = open('testT1.txt')
lines = file.readlines()
file.close()

# The data - PDC T2, PDC T2 err, PDC scale factor, relax R2, relax R2
err (MC sim).
data = [
['Gln',  2, None, None, None,    2.1931543671,      0.0150953094197],
['Ile',  3, None, None, None,   2.33146407173,      0.0160577446857],
['Phe',  4, None, None, None,    2.3482964848,       0.019713488727],
['Val',  5, None, None, None,    2.2490812645,      0.0164275028474],
['Lys',  6, None, None, None,   2.27941978889,       0.015118434456],
['Thr',  7, None, None, None,   2.21973220186,      0.0142724888389],
['Leu',  8, None, None, None,   2.24024670209,      0.0220723067644],
['Thr',  9, None, None, None,    1.9682398599,      0.0320202880659],
['Gly', 10, None, None, None,   2.08801972269,      0.0133512466715],
['Lys', 11, None, None, None,   1.99776427143,      0.0119683402446],
['Thr', 12, None, None, None,   2.16448623389,      0.0174891646325],
['Ile', 13, None, None, None,   2.27255388183,      0.0203045621019],
['Thr', 14, None, None, None,   2.19928611083,      0.0137179789481],
['Leu', 15, None, None, None,   2.27943290864,        0.01841703596],
['Glu', 16, None, None, None,   2.08711900135,      0.0110867843226],
['Val', 17, None, None, None,   2.28472626854,      0.0149813716965],
['Glu', 18, None, None, None,   2.10803348086,      0.0165871439247],
['Ser', 20, None, None, None,   2.24335721396,      0.0136423661701],
['Asp', 21, None, None, None,    2.3491468761,      0.0123466109271],
['Thr', 22, None, None, None,   2.27508355307,      0.0179254994426],
['Ile', 23, None, None, None,   2.46729511309,       0.022032166148],
['Asn', 25, None, None, None,   2.39562318634,      0.0182835550536],
['Val', 26, None, None, None,   2.13340147848,      0.0124491345714],
['Lys', 27, None, None, None,   2.36258684463,       0.021452496764],
['Ala', 28, None, None, None,   2.38480717279,      0.0167515948037],
['Lys', 29, None, None, None,   2.32696831127,      0.0186312168945],
['Thr', 30, None, None, None,   2.42063807895,       0.016486280768],
['Gln', 31, None, None, None,   2.37225963882,      0.0145645530678],
['Asp', 32, None, None, None,   2.34327154101,      0.0141218929004],
['Lys', 33, None, None, None,    2.1927627806,      0.0133120293255],
['Glu', 34, None, None, None,   2.23809535599,      0.0144347657445],
['Gly', 35, None, None, None,    2.2461346515,      0.0122274121117],
['Ile', 36, None, None, None,   1.99122925973,       0.010910732682],
['Asp', 39, None, None, None,   2.29368330103,      0.0126561091892],
['Gln', 40, None, None, None,   2.25464084772,      0.0146681195393],
['Gln', 41, None, None, None,   2.22441709558,      0.0180042863108],
['Arg', 42, None, None, None,   2.28040886598,      0.0226407473358],
['Leu', 43, None, None, None,   2.17231047053,      0.0237437949291],
['Ile', 44, None, None, None,   2.27954369008,      0.0201909508149],
['Phe', 45, None, None, None,   2.26618944243,      0.0208950941528],
['Ala', 46, None, None, None,   2.17293180472,      0.0496094013897],
['Gly', 47, None, None, None,   2.19171718851,      0.0143718164149],
['Lys', 48, None, None, None,   2.20180904452,      0.0159731114836],
['Gln', 49, None, None, None,   2.11417079383,      0.0137897460469],
['Leu', 50, None, None, None,   2.24457803209,      0.0193730177104],
['Glu', 51, None, None, None,   2.13348726698,      0.0178416908891],
['Asp', 52, None, None, None,    2.0460014256,      0.0130027410002],
['Arg', 54, None, None, None,   2.19214984513,      0.0173101801326],
['Thr', 55, None, None, None,   2.27509231624,      0.0229093320248],
['Leu', 56, None, None, None,   2.36912374644,      0.0178696073777],
['Ser', 57, None, None, None,   2.29814984784,      0.0129737044328],
['Asp', 58, None, None, None,   2.37723175535,      0.0159530364094],
['Tyr', 59, None, None, None,   2.22560065336,      0.0147756564606],
['Asn', 60, None, None, None,   2.28460144064,      0.0121672447449],
['Ile', 61, None, None, None,   2.34228508004,      0.0180155615894],
['Gln', 62, None, None, None,   1.97375590837,      0.0127517602123],
['Lys', 63, None, None, None,   2.16513054478,     0.00997271942879],
['Glu', 64, None, None, None,   2.31915705344,      0.0199426460098],
['Ser', 65, None, None, None,   2.18919504068,      0.0149076372426],
['Thr', 66, None, None, None,    2.1815681959,      0.0155283232121],
['Leu', 67, None, None, None,   2.26786646063,      0.0200115697764],
['His', 68, None, None, None,   2.24651038599,      0.0204872753534],
['Leu', 69, None, None, None,   2.24847954259,      0.0171435701978],
['Val', 70, None, None, None,   2.31549239579,      0.0204085987616],
['Leu', 71, None, None, None,   2.17566168322,      0.0141512899583],
['Arg', 72, None, None, None,    2.1637210197,      0.0124881076678],
['Leu', 73, None, None, None,   1.85663248042,     0.00972554000058],
['Arg', 74, None, None, None,   1.61255064461,      0.0109491376192],
['Gly', 75, None, None, None,   1.16260116356,      0.0127191083108],
['Gly', 76, None, None, None,  0.763062241201,     0.00312485609259]]

# Get the data.
in_data = False
index = 0
for line in lines:
    # Split the line.
    row = split(line, "\t")

    # Strip the rubbish.
    for j in range(len(row)):
        row[j] = strip(row[j])

    # Empty line.
    if len(row) == 0:
        continue

    # The section.
    if row[0] == 'SECTION:' and row[1] == 'results':
        in_data = True
        continue

    # Not in the data section.
    if not in_data:
        continue

    # The header line.
    if row[0] == 'Peak name':
        continue

    # The values.
    data[index][2] = float(row[3])
    data[index][3] = float(row[4])
    data[index][4] = float(row[5])

    # Increment the residue index.
    index += 1

r1 = []
r1_err = []
print("%-4s %-3s %-15s %-15s %-15s %-15s" % ('Name', 'Num', 'r1',
'err', 'r1_relax_ratio', 'err_relax_ratio'))
for i in range(len(data)):
    r1.append(1.0/data[i][2])
    r1_err.append((data[i][3] / data[i][4]) / data[i][2]**2)

    print("%-4s %-3s %15.10f %15.10f %15.10f %15.10f" % (data[i][0],
data[i][1], r1[-1], r1_err[-1], r1[-1]/data[i][5],
r1_err[-1]/data[i][6]))

----------------------------------------------------------------------------------------------------
Name Num r1              err             r1_relax_ratio
err_relax_ratio
Gln  2      2.1929824561    0.0148809999    0.9999216147
0.9858029042
Ile  3      2.3315458149    0.0163053899    1.0000350609
1.0154221671
Phe  4      2.3485204321    0.0203675004    1.0000953659
1.0331758464
Val  5      2.2492127755    0.0166899526    1.0000584732
1.0159762412
Lys  6      2.2794620470    0.0157196001    1.0000185390
1.0397637516
Thr  7      2.2197558269    0.0145190072    1.0000106432
1.0172722778
Leu  8      2.2401433692    0.0218844554    0.9999538743
0.9914892740
Thr  9      1.9681165125    0.0315532109    0.9999373311
0.9854130882
Gly  10     2.0881186051    0.0135167223    1.0000473570
1.0123940199
Lys  11     1.9976028765    0.0124515847    0.9999192122
1.0403769011
Thr  12     2.1645021645    0.0175877778    1.0000073600
1.0056385321
Ile  13     2.2727272727    0.0193061408    1.0000762978
0.9508277371
Thr  14     2.1992522542    0.0137104168    0.9999846056
0.9994487429
Leu  15     2.2794620470    0.0184387216    1.0000127831
1.0011774750
Glu  16     2.0872469213    0.0117003083    1.0000612902
1.0553383197
Val  17     2.2846698652    0.0148687303    0.9999753129
0.9924812359
Glu  18     2.1079258010    0.0157358966    0.9999489193
0.9486802945
Ser  20     2.2431583670    0.0139071770    0.9999113619
1.0194109174
Asp  21     2.3490721165    0.0119404908    0.9999681759
0.9671067493
Thr  22     2.2753128555    0.0182975360    1.0001007886
1.0207545988
Ile  23     2.4673081668    0.0226133071    1.0000052907
1.0263769318
Asn  25     2.3957834212    0.0180897942    1.0000668865
0.9894024526
Val  26     2.1335609132    0.0116003422    1.0000747326
0.9318191703
Lys  27     2.3623907394    0.0204662691    0.9999169956
0.9540273708
Ala  28     2.3849272597    0.0166220009    1.0000503550
0.9922637881
Lys  29     2.3272050268    0.0189131753    1.0001017270
1.0151336546
Thr  30     2.4207213750    0.0167409705    1.0000344108
1.0154485866
Gln  31     2.3724792408    0.0148096778    1.0000925708
1.0168302292
Asp  32     2.3430178069    0.0144732222    0.9998917180
1.0248783413
Lys  33     2.1929824561    0.0131268532    1.0001001821
0.9860895665
Glu  34     2.2381378693    0.0140511562    1.0000189953
0.9734246076
Gly  35     2.2461814915    0.0118392398    1.0000208536
0.9682539250
Ile  36     1.9912385504    0.0108451543    1.0000046658
0.9939895492
Asp  39     2.2935779817    0.0127101007    0.9999540829
1.0042660471
Gln  40     2.2547914318    0.0147550484    1.0000667885
1.0059263816
Gln  41     2.2241992883    0.0182240447    0.9999020834
1.0122058949
Arg  42     2.2805017104    0.0217261363    1.0000407139
0.9596033213
Leu  43     2.1724961981    0.0234451234    1.0000854977
0.9874210682
Ile  44     2.2794620470    0.0205899990    0.9999641844
1.0197637155
Phe  45     2.2660321777    0.0217755697    0.9999306039
1.0421379092
Ala  46     2.1729682747    0.0483898292    1.0000167837
0.9754165112
Gly  47     2.1915406531    0.0148651034    0.9999194534
1.0343232176
Lys  48     2.2016732717    0.0155452463    0.9999383358
0.9732134099
Gln  49     2.1141649049    0.0142897383    0.9999972145
1.0362582636
Leu  50     2.2446689113    0.0197525364    1.0000404883
1.0195900640
Glu  51     2.1335609132    0.0185638164    1.0000345192
1.0404740540
Asp  52     2.0458265139    0.0132912614    0.9999145105
1.0221892002
Arg  54     2.1920210434    0.0176455608    0.9999412441
1.0193747658
Thr  55     2.2753128555    0.0229745618    1.0000969364
1.0028472997
Leu  56     2.3691068467    0.0178614867    0.9999928667
0.9995455566
Ser  57     2.2983222248    0.0131015178    1.0000750068
1.0098517268
Asp  58     2.3769907297    0.0162427580    0.9998986108    1.0181609029
Tyr  59     2.2256843980    0.0154779709    1.0000376279    1.0475318627
Asn  60     2.2846698652    0.0125098898    1.0000299503    1.0281612720
Ile  61     2.3424689623    0.0183247678    1.0000785055    1.0171632834
Gln  62     1.9739439400    0.0128640189    1.0000952659    1.0088033912
Lys  63     2.1649707729    0.0098056514    0.9999262068    0.9832475002
Glu  64     2.3191094620    0.0194617967    0.9999794790    0.9758883889
Ser  65     2.1891418564    0.0151022978    0.9999757060    1.0130577769
Thr  66     2.1815008726    0.0153661351    0.9999691400    0.9895553365
Leu  67     2.2680880018    0.0202677453    1.0000976871    1.0128013689
His  68     2.2466861379    0.0203686835    1.0000782333    0.9942114404
Leu  69     2.2487069935    0.0157971135    1.0001011576    0.9214599619
Val  70     2.3153507756    0.0204338316    0.9999388380    1.0012363831
Leu  71     2.1758050479    0.0144021081    1.0000658947    1.0177240510
Arg  72     2.1635655560    0.0124230435    0.9999281499    0.9947899129
Leu  73     1.8566654289    0.0098446747    1.0000177464    1.0122496780
Arg  74     1.6126431221    0.0110867953    1.0000573486    1.0125724704
Gly  75     1.1626555052    0.0130808608    1.0000467414    1.0284416584
Gly  76     0.7627765065    0.0029557520    0.9996255421    0.9458842109
.

Dr. Klaus-Peter Neidig Software Development / Head of Analysis Group
Bruker BioSpin GmbH Silberstreifen 76287 Rheinstetten, Germany	Phone: +49 721 5161-6447 Fax: +49 721 5161-6480	Peter.Neidig@xxxxxxxxxxxxxxxxx www.bruker.com
Bruker BioSpin GmbH: Sitz der Gesellschaft/Registered Office: Rheinstetten, HRB 102368 Amtsgericht Mannheim Geschäftsführer/Managing Directors: Jörg Laukien, Dr. Bernd Gewiese, Dr. Gerhard Roth Diese E-Mail und alle Anlagen können Betriebs- oder Geschäftsgeheimnisse, oder sonstige vertrauliche Informationen enthalten. Sollten Sie diese E-Mail irrtümlich erhalten haben, ist Ihnen eine Kenntnisnahme des Inhalts, eine Vervielfältigung oder Weitergabe der E-Mail und aller Anlagen ausdrücklich untersagt. Bitte benachrichtigen Sie den Absender und löschen/vernichten Sie die empfangene E-Mail und alle Anlagen. Vielen Dank. This message and any attachments may contain trade secrets or privileged, undisclosed or otherwise confidential information. If you have received this e-mail in error, you are hereby notified that any review, copying or distribution of it and its attachments is strictly prohibited. Please inform the sender immediately and delete/destroy the original message and any copies. Thank you.

Re: Addition of PDC support to relax 1.3.10 and an inconsistency in the error calculation.

Header

Content

Related Messages