Re: relax vs. 'traditional' modelfree -- December 21, 2006

Hi Edward,

Many thanks for taking the time for the detailed and informative response. Enjoy your time off!

Doug


On Dec 21, 2006, at 11:15 AM, Edward d'Auvergne wrote:

That summarises the differences between the use of the
'full_analysis.py' script and Modelfree4 using the FAST-Modelfree
interface quite concisely.  I'll just expand or explain a few of those
points.  There are really four important differences here:  model-free
model selection; model-free model elimination; model-free
optimisation; and the strategy for obtaining the global description of
the Brownian rotational diffusion tensor together with all model-free
models and parameters.
1  Model selection
In the 'full_analysis.py' script, AIC model selection is employed.  This
criterion is used because the solution to the global problem is sought
by minimising the Kullback-Leibler discrepancy (more about this later).
In the FAST-Modelfree interface to Modelfree4, the ANOVA
step-up hypothesis testing of Mandel et al., 1995 is used.  I've shown
in d'Auvergne and Gooley, JBNMR, 2003, 25(1), 25-36 that there are
significant deficiencies in the hypothesis testing model selection.
Specifically there are two flaws:  not selecting a model when one
ought to be selected and under-fitting.  If no model is selected (when
one should be!) then there will be segments of the macromolecule which
cannot be dynamically described (but which should be).  The
consequences of under-fitting are that S2 is overestimated and te and
Rex parameters are underestimated by being dropped from the final
model.  These two flaws cause the molecule to appear more rigid than
reality.  This is what you are seeing, Doug, with the higher proportion
of models m3 to m5.
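A minimal sketch of AIC model selection for a single spin system follows. It assumes the least-squares form AIC = chi2 + 2k, where k is the number of parameters in the model-free model, and the chi-squared values are hypothetical. It is an illustration of the criterion, not the relax implementation.

```python
# Number of parameters in each of the standard model-free models m1-m5:
# m1 {S2}, m2 {S2, te}, m3 {S2, Rex}, m4 {S2, te, Rex}, m5 {S2f, S2, ts}.
N_PARAMS = {"m1": 1, "m2": 2, "m3": 2, "m4": 3, "m5": 3}


def aic(chi2: float, k: int) -> float:
    """Akaike's Information Criterion in its least-squares form."""
    return chi2 + 2.0 * k


def select_model(chi2_by_model: dict[str, float]) -> str:
    """Return the model with the lowest AIC value."""
    return min(chi2_by_model, key=lambda m: aic(chi2_by_model[m], N_PARAMS[m]))


# Hypothetical optimised chi-squared values for one spin system.
chi2_values = {"m1": 25.0, "m2": 12.0, "m3": 11.5, "m4": 10.9, "m5": 10.0}
print(select_model(chi2_values))  # prints m3
```

The 2k penalty is what stops m4 and m5 from winning purely on their lower chi-squared values.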
2  Model elimination
This may or may not be causing differences between the results.
Essentially if a model-free model has failed, the 'full_analysis.py'
script will kick it out prior to model selection.  See d'Auvergne and
Gooley, JBNMR, 2006, 35(2), 117-135 for more details.
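The elimination step can be sketched as a filter applied before model selection. The single rule used here, where a model counts as failed once an internal correlation time (te, tf or ts) reaches twice the global correlation time tm, is an assumption for illustration. Consult the 2006 paper for the actual criteria. The tm value and parameter values are hypothetical.

```python
def has_failed(params: dict, tm: float, factor: float = 2.0) -> bool:
    """Hypothetical rule: flag a model whose internal correlation time
    (te, tf or ts) has reached factor * tm."""
    return any(
        params.get(name) is not None and params[name] >= factor * tm
        for name in ("te", "tf", "ts")
    )


def eliminate(results: dict, tm: float) -> dict:
    """Drop failed models so that they never reach model selection."""
    return {model: p for model, p in results.items() if not has_failed(p, tm)}


tm = 8.0e-9  # hypothetical global correlation time (s)
results = {
    "m1": {"s2": 0.85},
    "m2": {"s2": 0.80, "te": 50e-12},
    "m5": {"s2f": 0.90, "s2": 0.70, "ts": 30e-9},  # ts has run off to 30 ns
}
print(sorted(eliminate(results, tm)))  # prints ['m1', 'm2']
```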
3  Optimisation
This point will make a major difference to the results.  For the
optimisation of the model-free models (ignore the optimisation of the
diffusion tensor for now) there are 4 optimisation issues:
optimisation precision; failure of the Levenberg-Marquardt
minimisation (in relax, Modelfree4, and Dasha); failure of the limits
algorithm; and a bug in Modelfree4.  I have a paper in submission
which fully explores each of these issues.
The difference in the precision of optimisation between the default in relax and the values hard-coded into Modelfree4 is 20 orders of magnitude! For more details see the archived post located at https://mail.gna.org/public/relax-devel/2006-10/msg00122.html (Message-id: <7f080ed10610200333sba40cb8qe6f9e025185bedfe@xxxxxxxxxxxxxx>). There are a few more details interspersed in the thread to which that post belongs, starting at https://mail.gna.org/public/relax-devel/2006-10/msg00114.html (Message-id: <7f080ed10610190804w5681fafav843718f50f985f40@xxxxxxxxxxxxxx>). relax can easily be set to the low precision of Modelfree4; however, I wouldn't recommend it. Because the model-free space is so convoluted, early termination of optimisation due to low precision will result in parameter values far from the true values.
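The following toy sketch roughly illustrates why the termination tolerance matters in a shallow landscape. It runs fixed-step steepest descent on a hypothetical one-dimensional function with a very flat basin. The run is repeated with two function tolerances that are 20 orders of magnitude apart. The function, step size and tolerances are assumptions chosen for illustration and are not the settings of relax or Modelfree4.

```python
def steepest_descent(func, grad, x0, step, func_tol, max_iter=1_000_000):
    """Fixed-step steepest descent that stops once the decrease in the
    function value between two iterations drops below func_tol."""
    x = x0
    f_old = func(x)
    for _ in range(max_iter):
        x = x - step * grad(x)
        f_new = func(x)
        if abs(f_old - f_new) < func_tol:
            break
        f_old = f_new
    return x


# Toy function with a very shallow basin; the true minimum is at x = 3.
def func(x):
    return 1e-3 * (x - 3.0) ** 2


def grad(x):
    return 2e-3 * (x - 3.0)


loose = steepest_descent(func, grad, 0.0, step=1.0, func_tol=1e-5)
tight = steepest_descent(func, grad, 0.0, step=1.0, func_tol=1e-25)
print(abs(loose - 3.0), abs(tight - 3.0))
```

Each step only shrinks the distance to the minimum by 0.2%. With the loose tolerance the run stops roughly 1.6 units short of the minimum, while the tight tolerance ends within about 1e-10 of it.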
The Levenberg-Marquardt algorithm, which is the only optimisation
algorithm in Modelfree4, one of two in Dasha, and one of many in relax,
is also an issue.  The problem is described in the fine print of the
algorithm: the singular matrix failure of the Levenberg-Marquardt
matrix.  This failure is often described as rarely encountered, yet in
model-free analysis it is actually quite common.  It occurs whenever
an internal correlation time parameter becomes undefined, i.e. when
the corresponding order parameter is equal to one.  In this case
changing the correlation time has no effect.  Two things amplify the
issue: both the grid search and the limits algorithm significantly
increase the probability of having an S2 value of 1.  The problem is
also hidden, because the models in which the Levenberg-Marquardt
algorithm has failed are often not selected by the model selection
algorithm, as their optimised chi-squared value is overestimated.
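The sketch below shows the singularity directly for model m2. It computes analytic derivatives of the original Lipari-Szabo spectral density J(w) with respect to (S2, te) and builds the Levenberg-Marquardt matrix with Marquardt's diagonal scaling, J^T J + lambda diag(J^T J). The derivative with respect to te carries a factor (1 - S2), so at S2 = 1 that column is exactly zero, and so is the scaled damping term for te. The R1, R2 and NOE residuals are linear combinations of J(w), so the same zero column appears in their Jacobian.

The tm, te, lambda and 600 MHz Larmor frequencies are assumed values in nanosecond units. A plain lambda*I damping term would avoid the exact singularity but would still leave te undetermined. The rank test simply stands in for the solver failure.

```python
import numpy as np

# Assumed illustrative values in nanosecond units: tm = 10 ns, te = 50 ps,
# and 15N/1H Larmor frequencies (rad/ns) for a hypothetical 600 MHz magnet.
TM = 10.0
TE = 0.05
W_N = 2.0 * np.pi * 0.0608
W_H = 2.0 * np.pi * 0.600
OMEGAS = np.array([0.0, W_N, W_H - W_N, W_H, W_H + W_N])


def jacobian(s2, te=TE, tm=TM, omegas=OMEGAS):
    """Analytic derivatives of the Lipari-Szabo J(omega) w.r.t. (S2, te)."""
    t = te * tm / (te + tm)  # effective correlation time
    lor_m = tm / (1.0 + (omegas * tm) ** 2)
    lor_t = t / (1.0 + (omegas * t) ** 2)
    d_s2 = 0.4 * (lor_m - lor_t)
    # dJ/dte carries a factor (1 - S2): it vanishes identically when S2 = 1.
    d_te = (0.4 * (1.0 - s2) * (1.0 - (omegas * t) ** 2)
            / (1.0 + (omegas * t) ** 2) ** 2 * (t / te) ** 2)
    return np.column_stack([d_s2, d_te])


def marquardt_matrix(s2, lam=1e-3):
    """J^T J with Marquardt's diagonal scaling: J^T J + lam * diag(J^T J)."""
    jac = jacobian(s2)
    jtj = jac.T @ jac
    return jtj + lam * np.diag(np.diag(jtj))


print(np.linalg.matrix_rank(marquardt_matrix(0.8)))  # prints 2 (solvable)
print(np.linalg.matrix_rank(marquardt_matrix(1.0)))  # prints 1 (singular)
```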
The limits algorithm used in Modelfree4 is another point of failure.
This can be pictured as follows (taken from a submitted paper).  Say
minimisation is constrained within a cube arbitrarily placed within a
space. Let there be a single minimum located towards one face of the
cube. It is simultaneously a local and global minimum within the cube.
If the minimum is much narrower than the length between points of the
grid search it is conceivable that a moderate curvature of the space
will cause the grid search algorithm to select a position distant from
the minimum. This often occurs within the model-free space because of
the shallow, curved valley which starts at infinite correlation times
and heads down to the minimum. Assuming only one minimum within the
entire space, optimisation without constraints will follow a
trajectory determined by the curvature of the space from the initial
position to the minimum. If the trajectory is contained within the
cube, constraints should not influence optimisation. However, if part
of the trajectory lies outside the cube, the constraint algorithm will
influence whether the minimum is found. Where the trajectory crosses
the surface of the cube, if there is a downhill path between the exit
and re-entry points along which the gradient is always negative, then
this path should be followed to allow the minimum to be found. The
constrained trajectory should be similar to the
unconstrained trajectory for those parts within the cube. The parts
outside the cube should be replaced by a trajectory along the face of
the cube between the exit and entry points. Within the model-free
space this hypothetical situation does occur due to the convoluted
nature of the space.  However Modelfree4 does not follow the downhill
path along the constraint and optimisation is terminated far from the
minimum.
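The contrast described above can be sketched with a toy two-parameter problem in a unit box. The objective function, step size and the "stop at the boundary" behaviour are hypothetical stand-ins chosen for illustration. Neither function reproduces Modelfree4's limits algorithm or relax's constraint handling. Projected gradient descent, which clips each step back onto the box, is used as one simple way of continuing along a face rather than stopping there.

```python
import numpy as np

LOWER = np.array([0.0, 0.0])
UPPER = np.array([1.0, 1.0])


def grad(p):
    """Gradient of the toy f(x, y) = 50 (x - 0.95)^2 + (y - 0.5)^2, whose
    minimum (0.95, 0.5) lies inside the box, close to the x = 1 face."""
    return np.array([100.0 * (p[0] - 0.95), 2.0 * (p[1] - 0.5)])


def projected_descent(p0, step=0.015, n_iter=500):
    """Clip every step back onto the box so the search continues along a
    face instead of stopping when the trajectory leaves the box."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        p = np.clip(p - step * grad(p), LOWER, UPPER)
    return p


def stop_at_boundary(p0, step=0.015, n_iter=500):
    """Hypothetical naive scheme: terminate as soon as a step leaves the box."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        trial = p - step * grad(p)
        if np.any(trial < LOWER) or np.any(trial > UPPER):
            break
        p = trial
    return p


start = [0.2, 0.1]
print(projected_descent(start))  # ends next to the minimum (0.95, 0.5)
print(stop_at_boundary(start))   # stuck at the starting point (0.2, 0.1)
```

In this toy the first unconstrained step overshoots to x = 1.325. The naive scheme terminates there, far from the minimum, while the projected version lands on the x = 1 face and keeps descending.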
The last difference is caused by a bug in the Modelfree4
Levenberg-Marquardt algorithm whereby optimisation is terminated
early.  In a submitted paper, I've shown that between 13% and 45% of
residues or spin systems are affected by this issue, depending on the
model-free model.
4  Optimisation of the global model
This one is quite complex and is in another manuscript I have
submitted for publication.  Essentially in Modelfree4 using the
FAST-Modelfree interface you are forced to follow the paradigm of
starting the analysis using an initial estimate of the diffusion
tensor first used in Kay et al., Biochem, 1989, 28(23), 8972-8979.
Using this estimate you then optimise the model-free models.
The 'full_analysis.py' script takes a completely different approach to solving the simultaneous optimisation and model selection global problem (the diffusion tensor + all model-free models for all spin systems). For details, see the post at https://mail.gna.org/public/relax-users/2006-10/msg00009.html (Message-id: <7f080ed10610041011o60a666d8maf317714ef1dec01@xxxxxxxxxxxxxx>) and all the other messages following from Sebastien Morin's post at https://mail.gna.org/public/relax-users/2006-10/msg00007.html (Message-id: <4523D86D.8060005@xxxxxxxxx>).
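Both paradigms couple the diffusion tensor to the model-free models. One generic way to couple them, assumed here purely for illustration, is to alternate between the two steps until nothing changes. First, the model-free models are optimised and selected with the tensor held fixed. Then the tensor is optimised with the models held fixed. This is not the 'full_analysis.py' protocol itself; the linked posts describe that. The toy callables, the scalar "tensor" (a single tm) and the starting estimate are hypothetical.

```python
def alternate_until_converged(tensor, optimise_models, optimise_tensor,
                              max_rounds=50):
    """Alternate between (a) optimising/selecting model-free models with the
    diffusion tensor fixed and (b) optimising the tensor with the models
    fixed, until neither changes between rounds."""
    models = optimise_models(tensor)
    for round_number in range(1, max_rounds + 1):
        new_tensor = optimise_tensor(models)
        new_models = optimise_models(new_tensor)
        if new_tensor == tensor and new_models == models:
            return tensor, models, round_number
        tensor, models = new_tensor, new_models
    raise RuntimeError("no convergence within max_rounds")


def toy_models(tm):
    # Toy stand-in for fixed-tensor model-free optimisation and selection.
    return ("m2", "m2", "m4") if tm > 9.0 else ("m4", "m4", "m4")


def toy_tensor(models):
    # Toy stand-in for optimising tm with the selected models held fixed.
    return 10.0 if models.count("m2") >= 2 else 9.5


print(alternate_until_converged(8.0, toy_models, toy_tensor))
# prints (10.0, ('m2', 'm2', 'm4'), 3)
```

Starting from a poor tensor estimate, the selected models and the tensor both shift before settling. This is why a fixed initial estimate can lock in the wrong models.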

I hope that that sufficiently describes the differences in the results!
Cheers,
Edward
On 12/21/06, Chris MacRaild <c.a.macraild@xxxxxxxxxxx> wrote:
Hi Doug,
I've done similar comparisons and come to similar results.
There are a few things to keep in mind when trying to rationalise these differences. First, the approach coded in full_analysis.py makes a serious attempt to optimise both the rotational diffusion tensor and the local dynamic parameters. Modelfree, on the other hand, relies on you having a good estimate of the tensor before you start. So the first thing to check is whether the diffusion tensor relax gets agrees with the one you gave Modelfree; if not, all bets are off with respect to the dynamic parameters. Second, the model selection used by relax is different to that used by Modelfree, so relax will in some cases pick different models, even with everything else being equal. Edward can elaborate on why the relax approach is superior, I'm sure... Third, the optimisation code in relax is much more up-to-date, so it is better at finding the true best fit of any given model to your data. Finally, it's worth keeping in mind that in many cases dynamic parameters are poorly defined, even by good data. Even very big differences in tau_e, for example, are not always significant.

The difference that would concern me is if there are dramatic differences in order parameters - S2 is generally fairly robust to the above issues, within reason.
Cheers,
Chris
On Wed, 2006-12-20 at 16:18 -0500, Douglas Kojetin wrote:
> Hi All,
>
> Has anyone compared runs of relax (m1 through m5; full_analysis.py
> script) vs. a traditional fastmodelfree/modelfree run using the
> binary provided by the Palmer group? I have ... I think I'm using
> similar parameters for both runs, and I'm seeing a drastic difference
> in results (models chosen).
>
> Thanks in advance for the input,
> Doug
_______________________________________________
relax (http://nmr-relax.com)
This is the relax-users mailing list
relax-users@xxxxxxx
To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users
