Re: Model selection and local_tm -- April 03, 2008

On Wed, Apr 2, 2008 at 4:23 PM, Carl Diehl <Carl.Diehl@xxxxxxxxx> wrote:

 On Mon, 2008-03-31 at 11:28, Edward d'Auvergne wrote:
 > On Fri, Mar 28, 2008 at 2:59 PM, Carl Diehl <Carl.Diehl@xxxxxxxxx> wrote:
 > > Hi.
 > >  I used the full_analysis.py script for testing and evaluating relax in
 > >  comparison with published data.
 > >
 > >  The system was calcium-loaded Calbindin D9k, with R1 & NOE at 600 MHz
 > >  and R2, R1 & NOE at 500 MHz. The relaxation data is of very high
 > >  quality. The model selection was done using home-written software, so
 > >  no ordinary model selection à la Modelfree.
 >
 > What do you mean by model selection?  Did you use a different
 > technique from the statistical field of mathematical modelling and
 > model selection?  Did you use a frequentist method, a Bayesian method,
 > or hypothesis testing methods (the last of which is described in
 > textbooks from this field of knowledge as being very, very bad)?  Did
 > you use this to select between model-free models, diffusion models, or
 > the combined global model (model-free + diffusion)?
 >
 >
 To clarify, in the example above I've only used published relaxation
 values for D9k (Johan Kördel et al, Biochemistry 1992, 31, 4856-4866).
 This is a pretty old article and I'm mainly using the data for comparing
 and validating relax vs Modelfree. In other words none of the original
 model selection was done by me.
 The diffusion model was isotropic.
 The model selection was done by first optimising the global sum of
 squares vs model m2.
 Residues with a R2/R1 ratio below the average value were optimised with
 model m5.


Ah, that could be why the local_tm model is being selected.  Because
you have followed the construction of the isotropic model as per the
paper, this model is quite possibly much worse than the local_tm
model.  Well, that's precisely what the model selection is saying.

 > >  After running full_analysis.py (removing excess models),
 >
 > By 'removing excess models', do you mean eliminating failed models?
 > What is an excess of models?
 Models with more parameters than the number of experimental values.
 MF_MODELS = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7', 'm8', 'm9']
 LOCAL_TM_MODELS = ['tm0', 'tm1', 'tm2', 'tm3', 'tm4', 'tm5', 'tm6',
 'tm7', 'tm9']


This makes sense; over-fitting must be avoided.
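
As a rough illustration of this filtering, the sketch below drops any model with more parameters than there are relaxation data points per residue. The parameter sets are my reading of the standard relax model-free definitions and should be checked against the relax manual; MF_PARAMS, local_tm_variants() and allowed_models() are hypothetical helper names, not part of relax.

```python
# Assumed parameter sets for the model-free models m0-m9
# (to be verified against the relax manual).
MF_PARAMS = {
    'm0': [],
    'm1': ['S2'],
    'm2': ['S2', 'te'],
    'm3': ['S2', 'Rex'],
    'm4': ['S2', 'te', 'Rex'],
    'm5': ['S2f', 'S2', 'ts'],
    'm6': ['S2f', 'tf', 'S2', 'ts'],
    'm7': ['S2f', 'S2', 'ts', 'Rex'],
    'm8': ['S2f', 'tf', 'S2', 'ts', 'Rex'],
    'm9': ['Rex'],
}


def local_tm_variants(mf_params):
    """Build the tm0-tm9 models: each adds one local correlation time."""
    return {'t' + name: ['local_tm'] + params
            for name, params in mf_params.items()}


def allowed_models(params_by_model, n_data):
    """Keep models whose parameter count does not exceed the data count."""
    return [name for name, params in params_by_model.items()
            if len(params) <= n_data]


# Five data points per residue: R1 and NOE at 600 MHz; R1, R2 and NOE at 500 MHz.
N_DATA = 5
MF_MODELS = allowed_models(MF_PARAMS, N_DATA)
LOCAL_TM_MODELS = allowed_models(local_tm_variants(MF_PARAMS), N_DATA)
```

Under these assumptions only tm8 (six parameters) is removed, which agrees with the two lists quoted above.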

 > > model selection gives me local_tm as the best model for the diffusion
 > > tensor.  Previous calculations (both 15N and 13C relaxation data)
 > > indicate that the diffusion tensor is, to a good approximation,
 > > isotropic (only a very slight anisotropy).

 > What does 'good', 'slight', etc. really mean?  From a mathematical
 > modelling perspective, I don't understand these.  Is a Da of
 > 1.001+/-0.001 slight, or 1.2+/-0.01, or 1.6+/-0.3?  From my
 > experience, my opinion is that unless all the bond vectors point in
 > the same direction, isotropy will never be statistically significant
 > over the spheroids and ellipsoids.
 Dpar/Dper = 1.08 +/- 0.001. The axially symmetric diffusion model is
 statistically significant compared to the isotropic model, as you say
 (after digging up the paper).
 An anisotropy of 1.08 gives only a small effect on the relaxation rates
 (R1 +/- 0.05, R2 +/- ~0.25, NOE +/- ~0.01); an isotropic diffusion
 tensor is therefore a good approximation for most calculations.


Although the effect on the rates is small, the effects on the
model-free model parameters can be very significant.  That's because
relaxation is dominated by the global rotational tumbling of the
molecule whereas the internal motions only exhibit a small effect.
That is why the diffusion tensor has to be precisely determined,
otherwise artificial motions appear (Schurr 94, Tjandra 96).  This, of
course, depends on the bond vector orientation relative to the
diffusion frame.
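
To make the orientation dependence concrete, here is a minimal sketch using the textbook (Woessner-type) expressions for a rigid bond vector in an axially symmetric diffusor. It is not code from relax, and the function names are hypothetical. The weighted sum of the three correlation times is proportional to J(0) for a rigid vector, so it is only a crude summary of the tumbling, not a full spectral density.

```python
import math


def spheroid_weights(theta):
    """Weights of the three correlation times for a bond vector at
    angle theta (radians) to the unique diffusion axis."""
    c2 = math.cos(theta) ** 2
    s2 = 1.0 - c2
    return ((3.0 * c2 - 1.0) ** 2 / 4.0, 3.0 * s2 * c2, 0.75 * s2 ** 2)


def spheroid_taus(d_par, d_per):
    """The three correlation times of an axially symmetric diffusor."""
    return (1.0 / (6.0 * d_per),
            1.0 / (5.0 * d_per + d_par),
            1.0 / (2.0 * d_per + 4.0 * d_par))


def effective_tau(theta, d_par, d_per):
    """Weighted mean correlation time (proportional to J(0) for a rigid vector)."""
    return sum(w * t for w, t in zip(spheroid_weights(theta),
                                     spheroid_taus(d_par, d_per)))


# Dpar/Dper = 1.08 as quoted above; Dper is set to 1 in arbitrary units.
tau_parallel = effective_tau(0.0, 1.08, 1.0)
for degrees in (0, 30, 60, 90):
    ratio = effective_tau(math.radians(degrees), 1.08, 1.0) / tau_parallel
    print(degrees, round(ratio, 4))
```

With these assumptions a vector perpendicular to the unique axis sees an effective correlation time a few percent shorter than a parallel one, which is the kind of difference that can be absorbed into artificial internal motions if the tensor is forced to be isotropic.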

 > >  Looking at the local_tm values for the secondary structure, most of
 > >  them have a local_tm which is similar to the isotropic tensor.
 >
 > If you used AIC or BIC model selection between the two models, what
 > are the chi-squared values, criteria values, and parameter numbers for
 > each model?  In the test, did you compare the local_tm model to the
 > isotropic model?  Or did you compare the local_tm model to the
 > isotropic, 2 spheroids, and ellipsoid simultaneously?  Again, what is
 > the qualifier most?  And how do the non-conforming residues not
 > conform?
 >
 I used the full_analysis.py script and ran all diffusion models until
 convergence according to protocol within the script.
 I will have to get back to you with the exact values of the model
 selection.
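
For reference, the quantities asked about (chi-squared, number of parameters, criterion value) can be tabulated with a sketch like the one below. It assumes the common chi-squared forms AIC = chi2 + 2k and BIC = chi2 + k ln(n); the function names are hypothetical and the numbers are made-up placeholders, not results from the D9k analysis.

```python
import math


def aic(chi2, k):
    """Akaike's Information Criterion in its chi-squared form."""
    return chi2 + 2.0 * k


def bic(chi2, k, n):
    """Bayesian Information Criterion; n is the number of data points."""
    return chi2 + k * math.log(n)


def select_model(models, n, criterion="AIC"):
    """models maps a name to (chi2, k); returns the best name and all scores."""
    scores = {}
    for name, (chi2, k) in models.items():
        if criterion == "AIC":
            scores[name] = aic(chi2, k)
        else:
            scores[name] = bic(chi2, k, n)
    best = min(scores, key=scores.get)
    return best, scores


# Placeholder values only: (global chi-squared, total number of parameters).
example = {"local_tm": (120.0, 250), "sphere": (700.0, 180)}
best, scores = select_model(example, n=350)
```

The model with the lowest criterion value is the one selected, so a large enough drop in chi-squared can outweigh the penalty for the extra local_tm parameters.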


Did you run all diffusion models?  Was full_analysis.py modified in any way?

 > >  Is the local_tm model always correct? For a well folded protein, one
 > >  would expect that the local_tm model should be invalid?
 >
 > As this is mathematical modelling, there is no such thing as an
 > invalid model.  By definition of the term model, a model is an
 > approximation of something far more complex.  Therefore there is only
 > a grey scale of how well a model approximates reality (of course we
 > can never know what is reality).  For a folded, single domain,
 > globular protein (with no significant, floppy loops), then the sphere,
 > spheroids, or ellipsoid should be a good description and the local_tm
 > model will not be selected.  Could you reproduce the model selection
 > statistics for all 4?  If the local_tm model is chosen, then it is an
 > indication that something is not normal - quite possibly interesting
 > dynamics.  Unfortunately, there's not enough information in your post
 > for me to tell you exactly what happened.  One point that concerns me
 > is that you only have an R2 measurement at a single field strength.
 > Note that to differentiate between chemical exchange effects (which
 > are scaled quadratically with field strength) and internal nanosecond
 > motions (constant at different fields) and anisotropic tumbling of the
 > molecule (again constant), you really need the R2 collected at 2
 > fields.  But this may not necessarily be the reason you're seeing the
 > local_tm model being selected.  I'm sorry that I am not yet able to
 > give you a clear answer.
 >
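
A small sketch of the quadratic field scaling mentioned above, under the assumption that the exchange is fast enough for Rex to scale with the square of the field strength. The function name and the 1.0 s^-1 input are hypothetical.

```python
def scale_rex(rex, field_from_mhz, field_to_mhz):
    """Scale a chemical exchange contribution between two field strengths,
    assuming it grows with the square of the field."""
    return rex * (field_to_mhz / field_from_mhz) ** 2


# A hypothetical Rex of 1.0 s^-1 at 500 MHz, rescaled to 600 MHz.
rex_600 = scale_rex(1.0, 500.0, 600.0)
```

Contributions from nanosecond internal motions or anisotropic tumbling do not follow this scaling, which is why R2 at a second field helps to separate them from exchange.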
 Chemical exchange is not an issue for D9k. D9k has floppy ends and a
 loop in the middle, the rest of the sequence is rigid (S2 ~ 0.8-0.9).
 Instead of measuring R2 at two fields, one option is to measure 1H-15N
 dipolar/CSA transverse and longitudinal crossrelaxation and extract R2
 without chemical exchange. The Kay group also published an article
 recently which allows measurements of R2 without chemical exchange.

 BTW, is it possible to incorporate the above types of relaxation rates
 as another experimental value to fit against?


Measuring R2 without chemical exchange is a very useful thing, as it
simplifies the model-free problem considerably (the number of
universes encompassing the entire problem is nevertheless still huge).
The two cross-correlated relaxation rates (interference) would also
be very useful but, as of yet, these have not been incorporated into
relax.  This has been discussed before a number of times.  For example
see the thread starting at
https://mail.gna.org/public/relax-devel/2006-10/msg00063.html.  I do
plan to eventually add these rates to the 1.3 relax development line
(but not the 1.2 stable line).

Cheers,

Edward
