Re: Model-free analysis error -- March 03, 2014

Hi Troels,

Thank you for the Wiki page, it'll come in handy for future reference!

To implement and properly test a protocol, you should probably plan
for 1-2 months' worth of solid work.  The protocol in figure 7.2 would
be the best to implement.  The original protocol of the Mandel, Akke
and Palmer 1995 paper would take much longer as chi-squared and
F-tests would need to be implemented in relax.  This can be rather
complicated as the F-test distribution must be simulated, in a similar
way to the Monte Carlo simulations, and this takes a lot of
computation time.
Also, the model selection used in that Mandel et al. paper should
rather be avoided.  See my model-free model selection paper at
http://dx.doi.org/10.1023/A:1021902006114 for why this is no good.  In
summary, hypothesis tests or ANOVA statistics for choosing between
competing mathematical models were abandoned by the field in the 1930s
and many more advanced techniques have been developed since (note that
the field of mathematical model selection makes the field of NMR look
small).  ANOVA statistics for model selection are easy to manipulate
to obtain the result you want: simply change between step-up,
step-down, and step-wise techniques, create a complex flow diagram, and
change the alpha critical levels to suit.  This is actually the origin
of the expression that you can prove anything with statistics, and is
the reason ANOVA was abandoned for model selection.
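
As a rough illustration of the alternative, here is a minimal Python
sketch of AIC model selection (the technique recommended in the quoted
message further down).  It is not relax code.  The function names are
hypothetical, and the chi-squared values are invented toy numbers.  It
assumes Gaussian errors, so that AIC reduces to chi-squared plus twice
the number of parameters.

```python
def chi_squared(observed, back_calculated, errors):
    """Sum of squared, error-weighted residuals."""
    return sum(((o - b) / e) ** 2
               for o, b, e in zip(observed, back_calculated, errors))


def aic(chi2, k):
    """Akaike's Information Criterion for Gaussian errors: chi2 + 2k."""
    return chi2 + 2 * k


def select_by_aic(fits):
    """Return the model name with the lowest AIC.

    fits maps a model name to a tuple (chi2, number of parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Invented toy values for one spin: m4 fits best, but not by enough to
# justify its extra parameter, so the criterion picks m2.
toy_fits = {"m1": (12.4, 1), "m2": (9.1, 2), "m4": (8.9, 3)}
best_model = select_by_aic(toy_fits)
```

There are no step-up or step-down orderings and no alpha levels to
tune here: every candidate model is scored once and the lowest score
wins.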

For the script itself, it can be located in the auto_analyses directory.
But it should have a different name, as it is a different protocol.
There are a number of protocols in the field, for example see some of
the Brüschweiler papers.  So the protocol should be named after the
author.  The protocol in figure 7.2 is not published yet, well apart
from in my PhD thesis
(http://eprints.infodiv.unimelb.edu.au/archive/00002799/), but that
may change one day soon.  You could start with the
dauvergne_protocol.py script, copy it to something like
'diffusion_seeded_protocol.py', and then make modifications.  Copying
the script would be a good idea as most parts are the same.  Note that
this protocol is not specific to single field strength data, but you
would need to limit the model-free models used based on the amount of
data loaded.
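
One possible way to express that limit is sketched below in plain
Python.  This is not taken from relax.  The parameter sets for m0 to m9
are listed as I understand them from the relax manual, so please check
them against the manual.  The rule itself (keep a model only if it has
no more parameters than there are relaxation data points for the spin)
is an assumption, as is counting three data points (R1, R2 and NOE) for
a single field.  The names MODEL_PARAMS and usable_models are made up.

```python
# Model-free parameter sets per model (assumed, verify against the manual).
MODEL_PARAMS = {
    "m0": (),
    "m1": ("S2",),
    "m2": ("S2", "te"),
    "m3": ("S2", "Rex"),
    "m4": ("S2", "te", "Rex"),
    "m5": ("S2", "S2f", "ts"),
    "m6": ("S2", "tf", "S2f", "ts"),
    "m7": ("S2", "S2f", "ts", "Rex"),
    "m8": ("S2", "tf", "S2f", "ts", "Rex"),
    "m9": ("Rex",),
}


def usable_models(n_data_points):
    """List the models with no more parameters than data points."""
    return [name for name, params in MODEL_PARAMS.items()
            if len(params) <= n_data_points]


# Single field, assuming R1, R2 and NOE were measured: 3 data points.
single_field = usable_models(3)

# Two fields, assuming the same three experiments at each: 6 data points.
two_fields = usable_models(6)
```

With three data points this drops m6, m7 and m8, which agrees with the
statement in the quoted message below that these models cannot be used
with single field strength data.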

You will need to implement the starting step from scratch - the
initial diffusion tensor estimate.  This can be performed in relax.
Actually it must be performed in relax!  The problem is that all the
different model-free software or diffusion tensor estimating software
use different Euler angle and spherical angle conventions.  In
Modelfree4, Dasha, and Tensor, these are not documented enough to
allow the values to be taken from one program into another.  There are
2304 Euler angle sets for diffusion tensors (see
http://article.gmane.org/gmane.science.nmr.relax.user/1383), so
converting between each without all the information is impossible.
Anyway, this can be performed if the user gives a subset of spins
which are supposed to be rigid, and then the m0 model can be used to
optimise the diffusion tensor.  I've talked about this before, for
example see
http://thread.gmane.org/gmane.science.nmr.relax.user/1152/focus=1153.
More can be found on the list.  That post is very useful as it goes
into some of the details of implementing the protocol.

Note that I probably would not approve of implementing the protocol
in the GUI, or of having a sample script available to users.  It is
extremely bad practice to only use data at a single field.  See my
review at http://dx.doi.org/10.1039/b702202f to understand why, as
well as Dmitry Korzhnev's review that I point to
(http://dx.doi.org/10.1016/S0079-6565(00)00028-5).  I am ok with having
the protocol implemented and sitting there for power users to have
access to, but it should not be advertised to normal users.  Single
field strength analyses are a setback and a detriment for the field.
NMR has an advantage over X-ray crystallography as we can study
motions - but the inherent failures and resultant artificial motions,
which cannot be distinguished from reality, are a blight on the whole
field of NMR.

Regards,

Edward



On 3 March 2014 10:31, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:

Dear Edward.

Thank you for your lengthy explanation on this subject.

I have put it at the wiki, as a lookup reference:
http://wiki.nmr-relax.com/Model-free_analysis_single_field

I have, though, been met with a request to try to analyse some old data,
recorded at only one field.

If I were to copy and modify the auto_analyses/dauvergne_protocol.py
script, would you accept that copy residing inside the trunk, or would
it be best to keep such a copy local and not part of the relax source
repository?

Best
Troels


2014-02-10 17:09 GMT+01:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:


Hi Ivan,

To continue:

On another note, I wonder if it is possible to modify the nmr-relax
programme so that I can do model-free analysis with data from only one
field strength? Alternatively, do you know of any programme (that can be
installed on Windows) that can do such analysis? My work focused mainly
on small molecule and ligand-based NMR and I have only just very
recently started looking into protein dynamics, so I am still
experimenting with different software and data treatment etc.


Firstly, the subject of single field strength data has been discussed
numerous times on this mailing list.  I would recommend you read my
previous responses to questions relating to single field strength
data, and look at the other messages in those threads.  You will find
these discussions quite informative and highly detailed:

- Martin Ballaschk:
http://thread.gmane.org/gmane.science.nmr.relax.user/1409/focus=1438
- Shantanu Bhattacharyya:
http://thread.gmane.org/gmane.science.nmr.relax.user/1367/focus=1369
- Mengjun Xue:
http://thread.gmane.org/gmane.science.nmr.relax.user/1276/focus=1277
- Fernando Amador:
http://article.gmane.org/gmane.science.nmr.relax.user/84
- Shantanu Bhattacharyya:
http://thread.gmane.org/gmane.science.nmr.relax.user/1086/focus=1087
- Dhanasekaran Muthu:
http://thread.gmane.org/gmane.science.nmr.relax.user/1152/focus=1153
- Vitaly Vostrikov:
http://thread.gmane.org/gmane.science.nmr.relax.user/1147/focus=1150
- Aldino Viegas:
http://thread.gmane.org/gmane.science.nmr.relax.user/1127/focus=1128
- Pierre-Yves Savard:
http://thread.gmane.org/gmane.science.nmr.relax.user/724/focus=725
- Keith Constantine:
http://thread.gmane.org/gmane.science.nmr.relax.user/513/focus=517
- Clare-Louise Evans:
http://thread.gmane.org/gmane.science.nmr.relax.user/326/focus=332
- Hongyan Li:
http://thread.gmane.org/gmane.science.nmr.relax.devel/694/focus=701

These will have lots of additional information.  This is just a
selection of possibly the most useful messages.


You will soon see that this is a complicated topic.  Note that relax
is capable of performing 100% of the functionality of Modelfree4 (with
or without the Fast-Modelfree GUI interface), Dasha, Tensor2, and
DYNAMICS.  If you play with the optimisation settings you can even
find identical results to within machine precision - relax can mimic
these other software packages.

The key is that the full analysis protocol is rather complicated -
many people don't understand this - and that these software packages
do not implement the full iterative protocol.  Therefore you either
have to perform it manually or write a script to perform all of the
steps.
The protocol is described in the relax manual in figure 7.2
(http://www.nmr-relax.com/manual/diffusion_seeded_paradigm.html).  In
summary:

a)  Find an initial diffusion tensor estimate (you can do this in
relax by only using model m0).  This requires all mobile residues
and side chain spins to be excluded, and this can be problematic.  See
the d'Auvergne and Gooley, 2008b paper at
http://dx.doi.org/10.1007/s10858-007-9213-3 for an example of the
catastrophic failure that this initial estimate can result in.  Or the
bacteriorhodopsin fragment of Korzhnev et al., 1999
(http://dx.doi.org/10.1023/a:1008356809071) where this complete
failure was earlier demonstrated.

b)  Optimise all of the model-free models from m0 to m9.  This
requires high precision optimisation; for a comparison of all the
software packages see the d'Auvergne and Gooley, 2008a model-free
optimisation paper at http://dx.doi.org/10.1007/s10858-007-9214-2.
Only relax and
Dasha implement the full range of model-free models, though the models
m6, m7, and m8 cannot be used if only single field strength data is
used (m6 is the original 2-time scale motion model of Clore et al.,
1990).

c)  Eliminate failed models (this is only available in relax, see the
d'Auvergne and Gooley, 2006 model elimination paper at
http://dx.doi.org/10.1007/s10858-006-9007-z).

d)  Select the best model-free model for each spin system.  This again
requires precise modern techniques, with the best being AIC model
selection (see the d'Auvergne and Gooley, 2003 model-free model
selection paper at http://dx.doi.org/10.1023/A:1021902006114).  If you
are unaware that ANOVA statistics for model selection (hypothesis
testing via chi-squared, F- and t-tests) were abandoned by the field of
model selection over 100 years ago (a field which makes the NMR field
look very, very small), then you should really look at that paper.

e)  Optimise the global model.  This is the diffusion tensor plus the
model-free models for all spin systems.

f)  Check for convergence (identical chi-squared values to a previous
iteration, and not necessarily the last one).  If no, then go back to
b) and repeat.  Note that the chi-squared value can go up
significantly between iterations, but this is because the model is
simplifying itself at a much faster rate by losing parameters - it's
Occam's razor at work.  Again see the d'Auvergne and Gooley, 2008b
paper at http://dx.doi.org/10.1007/s10858-007-9213-3 for figures
demonstrating this.  The concept as to what is happening during this
combined model-free optimisation and model selection algorithm is
described in the d'Auvergne and Gooley, 2007 MolBiosyst paper at
http://dx.doi.org/10.1039/b702202f.  It can take up to 20 iterations
or more to reach convergence, depending upon the quality of the
relaxation data and the 3D structure of the system under study.

g)  Once steps a-f have been completed for all global models
(characterised by the sphere, prolate spheroid, oblate spheroid, and
ellipsoid diffusion tensors), then model selection between the
different global models needs to be performed.

h)  Monte Carlo simulations for error analysis must be performed at the
end.

i)  Elimination of failed Monte Carlo simulations is essential for
keeping the errors to reasonable values for certain spin systems.
This is also a relax-only feature (see the d'Auvergne and Gooley, 2006
model elimination paper at
http://dx.doi.org/10.1007/s10858-006-9007-z).
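
The iterative part of the list above (steps b to f) can be sketched in
plain Python as follows.  This is only an outline under assumptions and
not the relax implementation.  The callable optimise_iteration is a
hypothetical stand-in for one pass through steps b to e, and the
tolerance, the iteration cap and the replayed chi-squared values are
invented for illustration.

```python
def run_global_model(optimise_iteration, initial_state,
                     max_iterations=30, tol=1e-10):
    """Repeat steps b-e until the chi-squared value of step f repeats.

    optimise_iteration(state) must return (new_state, chi2).
    Returns (final_state, chi2, number_of_iterations).
    """
    history = []
    state = initial_state
    for iteration in range(1, max_iterations + 1):
        state, chi2 = optimise_iteration(state)
        # Step f: compare against ALL earlier iterations, not only the
        # last one, so that a repeating cycle also ends the loop.
        if any(abs(chi2 - previous) <= tol for previous in history):
            return state, chi2, iteration
        history.append(chi2)
    raise RuntimeError("no convergence in %d iterations" % max_iterations)


def make_toy_step(chi2_sequence):
    """Build a fake optimiser that replays a fixed chi-squared sequence."""
    values = iter(chi2_sequence)

    def step(state):
        return state + 1, next(values)

    return step


# Invented sequence: the value rises between iterations 2 and 3, which
# is not treated as a failure, and then repeats at iteration 5.
toy = make_toy_step([120.0, 95.0, 101.5, 98.2, 98.2])
final_state, final_chi2, n_iterations = run_global_model(toy, 0)
```

The loop would be run once per global model (step g), with Monte Carlo
simulations (steps h and i) only on the finally selected model.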

These steps must be implemented independently of which software you
use, as NONE implement the full protocol.  Note that the protocol I
developed (in the d'Auvergne and Gooley, 2007 theory paper at
http://dx.doi.org/10.1039/b702202f and the d'Auvergne and Gooley,
2008b paper at http://dx.doi.org/10.1007/s10858-007-9213-3) is fully
implemented in relax, but it requires multiple field strength
data.  This is a rather large script located at
auto_analyses/dauvergne_protocol.py.  This protocol is used by the
GUI.  So one option would be to copy this
auto_analyses/dauvergne_protocol.py script and modify it for the
figure 7.2 protocol.


*** Note *** I must warn you about using single field strength data.
It is now quite difficult to publish a model-free analysis with only
single field strength data as most of the field knows about the
catastrophic analysis failures resulting in large amounts of
artificial motion.  These failures can also be much more subtle.  Many
reviewers will ask for data at additional field strengths to be
collected as the results cannot be trusted otherwise.  For a
model-free analysis, it is almost essential to collect data at
multiple field strengths, otherwise it can sometimes be impossible to
distinguish between the anisotropic
part of the Brownian tumbling of the molecule and internal motion -
specifically due to the NH vectors in secondary structure elements all
pointing in a similar direction.  I have a much better explanation, as
well as citations to all the relevant literature in:

d'Auvergne E. J., Gooley P. R. (2007). Set theory formulation of the
model-free problem and the diffusion seeded model-free paradigm. Mol.
Biosyst., 3(7), 483-494. (http://dx.doi.org/10.1039/b702202f)

In this paper, you will see reviewed both the artificial nanosecond
motions of the Schurr 1994 paper and the artificial Rex motions of the
Tjandra 1995 paper.

Finally, you will probably find it much easier to spend the 7-8 days
collecting data at another field strength than to implement the
protocol of steps a-i in a relax, Modelfree4, or Dasha script (or via
multiple iterations of the GUI programs), as well as study all of the
relevant literature to understand all of the types of failures that
only occur with single field strength data.  With multiple field
strength data you can perform Sebastien Morin's consistency testing
analysis in relax (http://dx.doi.org/10.1007/s10858-009-9381-4 and
http://www.nmr-relax.com/manual/Consistency_testing.html).  That way
you can see if your per-experiment temperature calibration and
per-experiment temperature control techniques have worked sufficiently
well
(http://www.nmr-relax.com/manual/Temperature_control_calibration.html)
and if you have used long enough recycle delays.  Collecting data at a
second field would probably save you significant amounts of time, and
has the additional benefit that it would guarantee that the dynamics
you see at the end will be real.  I cannot emphasize enough how
important it is to collect data at multiple fields, most importantly
the NOE and R2 data.

Regards,

Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-users mailing list
relax-users@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users
