Re: Model-free analysis error -- March 26, 2014

Hi Ivanhoe.

I remember that Edward added a section to the manual about running time, and the question has probably been answered somewhere on the relax mailing list.

But it depends on the number of spins, the computer power, and the number of CPUs.

But I think that anything between a day and a week is to be expected (my best guess is a week).

Especially for model-free analysis.

Best

Troels

2014-03-26 15:55 GMT+01:00 Ivanhoe Leung <ivanhoe.leung@xxxxxxxxxxxxx>:

Dear Edward and Troels,

Thanks for all your suggestions - they have all been really helpful. I have now added the errors onto the T1/T2/NOE lists (I use CCPNmr Analysis).

Just one more question (maybe a stupid one): how long does a full model-free analysis take (I just use the default settings in the GUI)? It's been running for more than 6 hours now and the 'Incremental progress' is still showing less than 30% completion.

Thanks

Ivan

________________________________________
From: edward.dauvergne@xxxxxxxxx [edward.dauvergne@xxxxxxxxx] on behalf of Edward d'Auvergne [edward@xxxxxxxxxxxxx]

Sent: 25 March 2014 18:03

To: Ivanhoe Leung
Cc: relax-users@xxxxxxx
Subject: Re: Model-free analysis error

Hi Ivan and Troels,

Continuing from
http://thread.gmane.org/gmane.science.nmr.relax.devel/5263/focus=5268,
the default error of 1.0 is important. This causes the chi-square
value to collapse into the sum of squared errors (SSE), making the
analysis still possible. However the real errors should really be
provided at all times. A model-free analysis requires the greatest
precision and accuracy of all NMR analysis types. Not providing
errors, i.e. using crude errors of 1.0, for all data will really fubar
your results (https://en.wikipedia.org/wiki/Military_slang#FUBAR). If
these are not available, please use the relaxation curve-fitting and
steady-state NOE analyses in relax to properly calculate relaxation
data errors from either replicated spectra or the base plane noise
(see the 'rm' command in Sparky or the equivalent in other software).
These peak intensity errors do not map linearly onto the relaxation
data, which is why having individual errors for all data points is
important.
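To make the point about the default error concrete, here is a minimal Python sketch of the standard chi-squared weighting. The function names are hypothetical and this is not relax's internal code: with every error set to 1.0 the chi-squared value reduces to the SSE, so every data point is weighted equally regardless of its real precision.

```python
def chi_squared(observed, back_calculated, errors):
    """Chi-squared: squared residuals, each weighted by 1/error**2."""
    return sum(((o - c) / e) ** 2
               for o, c, e in zip(observed, back_calculated, errors))


def sse(observed, back_calculated):
    """Sum of squared errors (unweighted residuals)."""
    return sum((o - c) ** 2 for o, c in zip(observed, back_calculated))


# Illustrative values only, not real relaxation data.
obs = [1.0, 2.0]
calc = [1.5, 2.0]

# With the default error of 1.0 everywhere, chi-squared and SSE coincide.
print(chi_squared(obs, calc, [1.0, 1.0]), sse(obs, calc))  # 0.25 0.25

# With a real (smaller) error on the first point, its residual counts 4x more.
print(chi_squared(obs, calc, [0.5, 1.0]))  # 1.0
```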

Regards,

Edward

On 25 March 2014 18:35, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
> Cheers! I can confirm the problem - the identification of the @N and
> @H atom pairs takes a long, long time. However it does successfully
> complete after a few minutes. There are two problems I can identify.
> Solving the first will be the easiest. From the saved state file that
> you uploaded, I can see that you have loaded all atoms of the PDB file
> into relax as nuclear spins. For your analysis this is not necessary,
> just load the @N and @H atoms. Then the interatom.define user
> function will operate much faster as then it will not need to loop
> over all these 1231 atoms (1231^2 times) but only the 296 @H and @N
> atoms. See http://www.nmr-relax.com/manual/d_Auvergne_protocol_GUI_mode_setting_up_spin.html
> and http://www.nmr-relax.com/manual/GUI_mode_spins_from_structural_data.html
> for more details (the PDF version of the manual at
> http://download.gna.org/relax/manual/relax.pdf is of higher quality).
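As an illustration of how much smaller the spin set becomes, the following Python sketch filters PDB ATOM/HETATM records down to the backbone N and H atoms using the fixed-column atom name field (columns 13-16). It is a stand-alone example under stated assumptions, not a relax user function; in relax itself the selection is made when loading the spins, as described in the manual pages linked above. Some files name the amide proton HN rather than H, so the `names` tuple may need adjusting.

```python
def select_atoms(pdb_lines, names=("N", "H")):
    """Keep only ATOM/HETATM records whose atom name is in `names`.

    The atom name occupies columns 13-16 of a PDB record (line[12:16]).
    """
    selected = []
    for line in pdb_lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, TER, CONECT, etc.
        if line[12:16].strip() in names:
            selected.append(line)
    return selected


# Tiny hand-made example: a glycine-like residue with N, CA, C, O and H.
demo = ["ATOM  %5d %-4s GLY A   1" % (i, " " + name)
        for i, name in enumerate(["N", "CA", "C", "O", "H"], start=1)]
print([line[12:16].strip() for line in select_atoms(demo)])  # ['N', 'H']
```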
>
> The second problem is in relax. There must be an inefficiency
> somewhere in the relax code which causes this to take so long. I'll
> look and see if this function can be sped up, but it might require
> modifying the internal PDB reader in relax to automatically determine
> connected atoms for the standard protein/DNA/RNA residues. Fixing
> this is a lot of work, so the first option might be fastest for you.
> As connected atoms in the protein were not automatically detected by
> the relax PDB reader, relax must first loop over each atom to check
> for connections, and for each it must then loop over all other atoms
> and determine if those atoms are within a certain short distance. If
> so, it will consider the atoms to be bonded. Because of these two
> nested loops, for 1231 atoms there would be 1231^2 = 1515361
> interatomic distance checks. This is why it is slow. For just the @N
> and @H spins there would be 296^2 = 87616 checks, roughly 17 times fewer.
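The cost described above can be seen in a simplified Python sketch of a distance-based bond search. The 1.2 Å cutoff is a hypothetical value chosen so that only X-H bonds such as N-H (about 1.0 Å) are caught; it is not the cutoff relax uses, and relax's real implementation differs in detail.

```python
import math


def count_pair_checks(n_atoms):
    """Distance checks made by two full nested loops over n_atoms atoms."""
    return n_atoms * n_atoms


def find_bonds(coords, cutoff=1.2):
    """Naive O(n^2) search: atoms closer than `cutoff` (Angstrom) are bonded.

    Every atom is compared with every other atom, which is what makes this
    slow for large spin sets.
    """
    bonds = []
    for i, a in enumerate(coords):
        for j, b in enumerate(coords):
            if j <= i:
                continue  # skip self-pairs and pairs already seen
            if math.dist(a, b) < cutoff:
                bonds.append((i, j))
    return bonds


print(count_pair_checks(1231))  # 1515361
print(count_pair_checks(296))   # 87616
# An N at the origin, its amide H 1.02 A away, and a distant atom.
print(find_bonds([(0.0, 0.0, 0.0), (1.02, 0.0, 0.0), (5.0, 0.0, 0.0)]))  # [(0, 1)]
```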
>
> Regards,
>
> Edward
>
>
> P. S. For any relax developers out there, the fix is to support the
> standard PDB atom naming in the Chemical Component Dictionary, as
> described in http://www.wwpdb.org/documentation/format33/sect9.html#ATOM
> and found at ftp://ftp.wwpdb.org/pub/pdb/data/monomers/. The ATOM
> records in a PDB file must conform to this nomenclature and the given
> CONECT records. All the definitions are in the single file
> ftp://ftp.wwpdb.org/pub/pdb/data/monomers/het_dictionary.txt. For
> example to see glycine, search for "HET GLY". The number of spaces
> is essential here. We could add the standard amino acid HET
> dictionaries to relax and use the CONECT records in these to bond all
> atoms together. One problem is that in X-ray structures certain
> atoms will randomly be missing, and another is that non-standard or
> modified amino acids are not uncommon. Therefore the distance-based
> algorithm would always be needed as a fallback if the relax PDB reader
> does not find connected atoms for a given atom.
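The template-plus-fallback idea in the P.S. could look roughly like the following Python sketch. The `BOND_TEMPLATES` table is a hypothetical, heavily truncated stand-in for the CONECT records in het_dictionary.txt (it is not parsed from that file), and the 1.7 Å fallback cutoff is likewise an illustrative assumption.

```python
import math

# Hypothetical, truncated templates standing in for the Chemical Component
# Dictionary CONECT records; a real version would parse het_dictionary.txt.
BOND_TEMPLATES = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O"), ("N", "H")],
}


def connect_residue(res_name, atoms, cutoff=1.7):
    """Return bonded atom-name pairs for one residue.

    atoms: dict mapping atom name -> (x, y, z) coordinates.
    Template bonds are used where both atoms are present; any atom the
    template leaves unconnected falls back to a distance check.
    """
    bonds = set()
    connected = set()
    for a, b in BOND_TEMPLATES.get(res_name, []):
        if a in atoms and b in atoms:  # X-ray structures may lack atoms
            bonds.add(tuple(sorted((a, b))))
            connected.update((a, b))

    # Distance-based fallback for non-standard residues or orphaned atoms.
    names = sorted(atoms)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in connected and b in connected:
                continue  # already handled by the template
            if math.dist(atoms[a], atoms[b]) < cutoff:
                bonds.add((a, b))
    return bonds
```

For a standard glycine every bond comes from the template, while an unknown residue name is handled entirely by the distance check.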
>
>
>
> On 25 March 2014 16:34, Ivanhoe Leung <ivanhoe.leung@xxxxxxxxxxxxx> wrote:
>> Dear Edward,
>>
>> I have submitted a bug report as requested.
>>
>> https://gna.org/bugs/index.php?21862
>>
>> Thanks
>>
>> Ivan
>>
>>
>> ________________________________________
>> From: edward.dauvergne@xxxxxxxxx [edward.dauvergne@xxxxxxxxx] on behalf of Edward d'Auvergne [edward@xxxxxxxxxxxxx]
>> Sent: 25 March 2014 14:43
>> To: Ivanhoe Leung
>> Cc: relax-users@xxxxxxx
>> Subject: Re: Model-free analysis error
>>
>> Hi Ivan,
>>
>> If you could create a bug report using the link
>> https://gna.org/bugs/?func=additem&group=relax, that would be
>> appreciated. Please try to include as much information as possible.
>> The best would be if you could attach a truncated data set, the
>> minimum required to trigger the bug (slightly randomised if you would
>> like to keep it private). If I can reproduce the bug myself, I can
>> normally have a fix for it within 5-10 minutes. Oh, it would be good
>> to check that you are using the latest version of relax - currently
>> 3.1.6 (http://www.nmr-relax.com/download.html) - just in case the bug
>> has already been fixed.
>>
>> Cheers,
>>
>> Edward
>>
>>
>>
>> On 25 March 2014 15:33, Ivanhoe Leung <ivanhoe.leung@xxxxxxxxxxxxx> wrote:
>>> Dear Edward,
>>>
>>> I have now conducted measurements (T1, T2 and NOE) at two separate fields (600 and 700 MHz) as suggested in your previous email.
>>>
>>> I have uploaded six different files onto the "Relaxation Data List". The data is in the following format:
>>>
>>> 2 ALA 5.49631746729691 N
>>> 3 ASP 3.74279511939516 N
>>> 4 ASP 6.12594952217594 N
>>> 6 SER 6.75812664729337 N
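The four columns above appear to be residue number, residue name, relaxation value and atom name, with no error column. A minimal Python sketch for reading such a file and flagging rows without errors follows; the optional fifth error column is an assumed layout for illustration, not a format prescribed by relax.

```python
def parse_relaxation_data(lines):
    """Parse 'res_num res_name value atom [error]' lines into dicts."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        res_num, res_name, value, atom = fields[:4]
        records.append({
            "res_num": int(res_num),
            "res_name": res_name,
            "atom": atom,
            "value": float(value),
            # None means no error was supplied for this data point.
            "error": float(fields[4]) if len(fields) > 4 else None,
        })
    return records


data = """2 ALA 5.49631746729691 N
3 ASP 3.74279511939516 N
4 ASP 6.12594952217594 N
6 SER 6.75812664729337 N""".splitlines()

missing = [r["res_num"] for r in parse_relaxation_data(data) if r["error"] is None]
print("Residues without errors:", missing)  # Residues without errors: [2, 3, 4, 6]
```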
>>>
>>> However, after I click the "Dipolar relaxation" button, the programme freezes up when I press "Apply" or "Next".
>>>
>>> I encountered no problem with the other three buttons (CSA relaxation / X isotope / H isotope)
>>>
>>> I wonder whether I am not supplying the right type of data to the software, or whether this is a Python problem.
>>>
>>> Thanks and I hope to hear back from you soon!
>>>
>>> Ivan
>>>
>>>
>>>
>>> ________________________________________
>>> From: edward.dauvergne@xxxxxxxxx [edward.dauvergne@xxxxxxxxx] on behalf of Edward d'Auvergne [edward@xxxxxxxxxxxxx]
>>> Sent: 10 February 2014 16:09
>>> To: Ivanhoe Leung
>>> Cc: relax-users@xxxxxxx
>>> Subject: Re: Model-free analysis error
>>>
>>> Hi Ivan,
>>>
>>> To continue:
>>>
>>>> On another note, I wonder if it is possible to modify the nmr-relax programme so that I can do model-free analysis with data from only one field strength? Alternatively, do you know of any programme (that can be installed on Windows) that can do such an analysis? My work has focused mainly on small molecule and ligand-based NMR and I have only very recently started looking into protein dynamics, so I am still experimenting with different software and data treatment etc.
>>>
>>> Firstly, the subject of single field strength data has been discussed
>>> numerous times on this mailing list. I would recommend you read my
>>> previous responses to questions relating to single field strength
>>> data, and look at the other messages in those threads. You will find
>>> these discussions quite informative and highly detailed:
>>>
>>> - Martin Ballaschk:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1409/focus=1438
>>> - Shantanu Bhattacharyya:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1367/focus=1369
>>> - Mengjun Xue:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1276/focus=1277
>>> - Fernando Amador: http://article.gmane.org/gmane.science.nmr.relax.user/84
>>> - Shantanu Bhattacharyya:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1086/focus=1087
>>> - Dhanasekaran Muthu:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1152/focus=1153
>>> - Vitaly Vostrikov:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1147/focus=1150
>>> - Aldino Viegas:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/1127/focus=1128
>>> - Pierre-Yves Savard:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/724/focus=725
>>> - Keith Constantine:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/513/focus=517
>>> - Clare-Louise Evans:
>>> http://thread.gmane.org/gmane.science.nmr.relax.user/326/focus=332
>>> - Hongyan Li: http://thread.gmane.org/gmane.science.nmr.relax.devel/694/focus=701
>>>
>>> These will have lots of additional information. This is just a
>>> selection of possibly the most useful messages.
>>>
>>>
>>> You will soon see that this is a complicated topic. Note that relax
>>> is capable of performing 100% of the functionality of Modelfree4 (with
>>> or without the Fast-Modelfree GUI interface), Dasha, Tensor2, and
>>> DYNAMICS. If you play with the optimisation settings you can even
>>> find identical results to within machine precision - relax can mimic
>>> these other programs.
>>>
>>> The key is that the full analysis protocol is rather complicated -
>>> many people don't understand this - and that these programs do not
>>> implement the full iterative protocol. Therefore you either have to
>>> perform it manually or write a script to perform all of the steps.
>>> The protocol is described in the relax manual in figure 7.2
>>> (http://www.nmr-relax.com/manual/diffusion_seeded_paradigm.html). In
>>> summary:
>>>
>>> a) Find an initial diffusion tensor estimate (you can do this in
>>> relax by only using model m0). This requires all mobile residues
>>> and side chain spins to be excluded, and this can be problematic. See
>>> the d'Auvergne and Gooley, 2008b paper at
>>> http://dx.doi.org/10.1007/s10858-007-9213-3 for an example of the
>>> catastrophic failure that this initial estimate can result in. Or the
>>> bacteriorhodopsin fragment of Korzhnev et al., 1999
>>> (http://dx.doi.org/10.1023/a:1008356809071) where this complete
>>> failure was earlier demonstrated.
>>>
>>> b) Optimise all of the model-free models from m0 to m9. This
>>> requires high-precision optimisation. For a comparison of all the
>>> programs, see the d'Auvergne and Gooley, 2008a model-free optimisation
>>> paper at http://dx.doi.org/10.1007/s10858-007-9214-2. Only relax and
>>> Dasha implement the full range of model-free models, though the models
>>> m6, m7, and m8 cannot be used if only single field strength data is
>>> used (m6 is the original 2-time scale motion model of Clore et al.,
>>> 1990).
>>>
>>> c) Eliminate failed models (this is only available in relax, see the
>>> d'Auvergne and Gooley, 2006 model elimination paper at
>>> http://dx.doi.org/10.1007/s10858-006-9007-z).
>>>
>>> d) Select the best model-free model for each spin system. This again
>>> requires precise, modern techniques, with the best being AIC model
>>> selection (see the d'Auvergne and Gooley, 2003 model-free model selection
>>> paper at http://dx.doi.org/10.1023/A:1021902006114). If you are
>>> unaware that ANOVA statistics for model selection (hypothesis testing
>>> via chi-squared, F- and t-tests) were abandoned by the field of model
>>> selection over 100 years ago (a field which makes the NMR field look
>>> very, very small), then you should really look at that paper.
>>>
>>> e) Optimise the global model. This is the diffusion tensor plus the
>>> model-free models for all spin systems.
>>>
>>> f) Check for convergence (identical chi-squared values to a previous
>>> iteration, and not necessarily the last one). If not, then go back to
>>> b) and repeat. Note that the chi-squared value can go up
>>> significantly between iterations, but this is because the model is
>>> simplifying itself at a much faster rate by losing parameters - it's
>>> Occam's razor at work. Again see the d'Auvergne and Gooley, 2008b
>>> paper at http://dx.doi.org/10.1007/s10858-007-9213-3 for figures
>>> demonstrating this. The concept as to what is happening during this
>>> combined model-free optimisation and model selection algorithm is
>>> described in the d'Auvergne and Gooley, 2007 MolBiosyst paper at
>>> http://dx.doi.org/10.1039/b702202f. It can take up to 20 iterations
>>> or more to reach convergence, depending upon the quality of the
>>> relaxation data and the 3D structure of the system under study.
>>>
>>> g) Once steps a-f have been completed for all global models
>>> (characterised by the sphere, prolate spheroid, oblate spheroid, and
>>> ellipsoid diffusion tensors), then model selection between the
>>> different global models needs to be performed.
>>>
>>> h) Monte Carlo simulations for error analysis must be performed at the end.
>>>
>>> i) Elimination of failed Monte Carlo simulations is essential for
>>> keeping the errors to reasonable values for certain spin systems.
>>> This is also a relax-only feature (see the d'Auvergne and Gooley, 2006
>>> model elimination paper at
>>> http://dx.doi.org/10.1007/s10858-006-9007-z).
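Two of the decision rules in steps d) and f) are simple enough to sketch in Python. This assumes AIC in its usual form, chi-squared + 2k with k the number of model parameters, and a convergence test that compares the latest chi-squared value against every earlier iteration, as described in f). The function names and the (chi2, k) input format are hypothetical; a real implementation may also need to compare the selected models and parameter values between iterations.

```python
def aic(chi2, k):
    """Akaike's Information Criterion: chi-squared plus 2 per parameter."""
    return chi2 + 2 * k


def select_model(fits):
    """fits maps model name -> (chi2, k); return the model with the lowest AIC."""
    return min(fits, key=lambda name: aic(*fits[name]))


def converged(chi2_history, tol=1e-10):
    """True if the latest chi-squared matches that of ANY earlier iteration.

    Comparing against all previous iterations, not just the last one, also
    catches the protocol cycling between states.
    """
    if len(chi2_history) < 2:
        return False
    latest = chi2_history[-1]
    return any(abs(latest - prev) <= tol for prev in chi2_history[:-1])


# Illustrative numbers only.
print(select_model({"m1": (20.0, 1), "m2": (15.0, 2), "m4": (14.5, 3)}))  # m2
print(converged([10.0, 8.0, 9.0]))       # False
print(converged([10.0, 8.0, 9.0, 8.0]))  # True
```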
>>>
>>> These steps must be implemented independently of which software you
>>> use, as NONE implement the full protocol. Note however that the
>>> protocol I developed (in the d'Auvergne and Gooley, 2007 theory paper
>>> at http://dx.doi.org/10.1039/b702202f and the d'Auvergne and Gooley,
>>> 2008b paper at http://dx.doi.org/10.1007/s10858-007-9213-3) is fully
>>> implemented in relax, however this requires multiple field strength
>>> data. This is a rather large script located at
>>> auto_analyses/dauvergne_protocol.py. This protocol is used by the
>>> GUI. So one option would be to copy this
>>> auto_analyses/dauvergne_protocol.py script and modify it for the
>>> figure 7.2 protocol.
>>>
>>>
>>> *** Note *** I must warn you about using single field strength data.
>>> It is now quite difficult to publish a model-free analysis with only
>>> single field strength data as most in the field know about the
>>> catastrophic analysis failures resulting in large amounts of
>>> artificial motion. These failures can also be much more subtle. Many
>>> reviewers will ask for such data to be collected as the results cannot
>>> be trusted otherwise. For a model-free analysis, it is almost
>>> essential to collect data at multiple field strengths, otherwise it
>>> can sometimes be impossible to distinguish between the anisotropic
>>> part of the Brownian tumbling of the molecule and internal motion -
>>> specifically due to the NH vectors in secondary structure elements all
>>> pointing in a similar direction. I have a much better explanation, as
>>> well as citations to all the relevant literature in:
>>>
>>> d'Auvergne E. J., Gooley P. R. (2007). Set theory formulation of the
>>> model-free problem and the diffusion seeded model-free paradigm. Mol.
>>> Biosyst., 3(7), 483-494. (http://dx.doi.org/10.1039/b702202f)
>>>
>>> In this paper, you will see reviewed both the artificial nanosecond
>>> motions of the Schurr 1994 paper and the artificial Rex motions of the
>>> Tjandra 1995 paper.
>>>
>>> Finally, you will probably find it much easier to spend the 7-8 days
>>> collecting data at another field strength than to implement the
>>> protocol of steps a-i in a relax, Modelfree4, or Dasha script (or via
>>> multiple iterations of the GUI programs), as well as study all of the
>>> relevant literature to understand all of the types of failures that
>>> only occur with single field strength data. With multiple field
>>> strength data you can perform Sebastien Morin's consistency testing
>>> analysis in relax (http://dx.doi.org/10.1007/s10858-009-9381-4 and
>>> http://www.nmr-relax.com/manual/Consistency_testing.html). That way
>>> you can see if your per-experiment temperature calibration and
>>> per-experiment temperature control techniques have worked sufficiently
>>> well (http://www.nmr-relax.com/manual/Temperature_control_calibration.html)
>>> and if you have used long enough recycle delays. Collecting data at a
>>> second field would probably save you significant amounts of time, and
>>> has the additional benefit that it would guarantee that the dynamics
>>> you see at the end will be real. I cannot emphasize enough how
>>> important it is to collect data at multiple fields, most importantly
>>> the NOE and R2 data.
>>>
>>> Regards,
>>>
>>> Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-users mailing list
relax-users@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users
