Re: Full analysis issue -- May 09, 2008

Hi Ed,

Fine !

That'll give me time to think a bit more about all that...

Cheers !


Séb  :)



Edward d'Auvergne wrote:

Hi,

The help wasn't a problem, and don't worry about the length of the
message - the details will be very useful.  Unfortunately you may need
to wait a little while as I'll be taking holidays for a week or so,
starting in 5 minutes time.  I'll have to look at the details after
that.

Regards,

Edward



On Fri, May 9, 2008 at 5:45 PM, Sébastien Morin
<sebastien.morin.1@xxxxxxxxx> wrote:

Hi Ed,

First, thanks a lot for this help !

Second, I have to apologize for the length of this mail...


Ok...


My system is a 271 residue globular protein (230 residues with data at 3
fields = 2070 observables). A homologous protein is being studied in
the lab, and analysing its relaxation data using either the diffusion
seeded approach in ModelFree or the new protocol of the full_analysis
script yields similar results, with a high mean S2 (~0.90) and a few Rex
(15-20) throughout the protein. Thus, the problem here with my system is
probably external to the approaches and the user...


Ok...


I tried using ModelFree with relax (script palmer.py : ModelFree as an
engine for optimization, but relax for automating and AIC model
selection) and got results similar to those from the full_analysis.py
approach... For the two situations tested (see below), no oscillation
occurred. Here are some stats :

=======================================================================
Approach        Diff     Iter  Chi2    AIC     Nb_Rex  <Rex>_+-_StdDev
==============  =======  ====  ======  ======  ======  ===============
palmer          prolate  15    ~12990  ~14060  182     1.602_+-_0.770

palmer_hybrid   prolate  12    ~ 2715  ~ 3660  129     0.902_+-_0.571

full            prolate   5    ~13090  ~14125  181     1.671_+-_0.782

full_hybrid     prolate   7    ~ 2750  ~ 3720  145     2.431_+-_1.546
=======================================================================

It seems that the new protocol is not the source of the problem.
Moreover, it is obvious from the AIC value (and also from the diffusion
tensor details, not shown here) that the hybrid (without the highly
flexible C-terminus) is a better description of the system. However, as
is seen here, the Rex values seem quite small and there are way too many
Rex (> 50 % of all residues)... These may thus not be significant, but
then, how can one exclude such "artifacts" when doing iterative
optimization (with either approach)..? How can one decide to choose
a model without Rex when iterating to find the best diffusion
tensor..?
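
As a rough check on the table above, here is a minimal Python sketch
that backs out the number of parameters implied by each chi2/AIC pair.
It assumes the usual definition for chi-squared fits, AIC = chi2 + 2k,
with k the total number of optimised parameters. The function names are
only illustrative, and since the table values are rounded ("~") the
resulting k is approximate.

```python
# AIC for a chi-squared fit, assuming AIC = chi2 + 2k,
# where k is the total number of optimised parameters.

def aic(chi2: float, k: float) -> float:
    """Akaike's Information Criterion for a chi-squared fit."""
    return chi2 + 2.0 * k


def implied_params(chi2: float, aic_value: float) -> float:
    """Back out the parameter count k from (rounded) chi2 and AIC values."""
    return (aic_value - chi2) / 2.0


# Rounded (chi2, AIC) values copied from the table above.
runs = {
    "palmer":        (12990.0, 14060.0),
    "palmer_hybrid": (2715.0, 3660.0),
    "full":          (13090.0, 14125.0),
    "full_hybrid":   (2750.0, 3720.0),
}

implied = {name: implied_params(c, a) for name, (c, a) in runs.items()}

for name, k in implied.items():
    print(f"{name:14s} implied k ~ {k:.0f}")
```

With these rounded numbers the hybrid runs use somewhat fewer
parameters while chi2 drops by roughly a factor of five, which is why
the AIC favours them so strongly.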


Ok...


Maybe, as you proposed, the problem arises because of the crystal
structure being inappropriate for describing the solution structure...
The crystal structure I use has a resolution of 1.95 A. Protons were not
visible but were added using CHARMM.  Moreover, different snapshots from
molecular mechanics in CHARMM were also tested to see if fluctuations in
NH bond orientation could yield better optimizations... It was not the 
case.

I'll try to assess this issue of the crystal structure by running tests
(with palmer.py and also full_analysis.py approaches) using a different
structure (a point mutant) also from crystallography... The
resolution of this structure is also quite low (1.75 A). Anyway, I don't
have a choice since no solution structure exists, nor any better crystal
structure... If ever the crystal structure is the cause of this
problem, what can one do ? Is one obliged to do the analysis with a
local_tm or a sphere diffusion tensor ? Is it a waste if one does so with
good quality data at three fields ???


Ok...


What about the AIC for the local_tm model VS the ellipsoid in the
full_analysis approach ? Here are some stats :

=======================================================================
Approach     Models  Diff       AIC
===========  ======  =========  ======
full         m1-m5   local_tm   ~ 4510
full         m1-m5   ellipsoid  ~12710

full         m0-m9   local_tm   ~ 4410
full         m0-m9   ellipsoid  ~ 5210

full_hybrid  m1-m5   local_tm   ~ 4510
full_hybrid  m1-m5   ellipsoid  ~ 4720 *

full_hybrid  m0-m9   local_tm   ~ 4410
full_hybrid  m0-m9   ellipsoid  ~ 4570 **
=======================================================================
*  not converged after 35 rounds (oscillates)
** not converged after 26 rounds (oscillates)

As said before, the hybrid improves the description of the diffusion,
however, there is still a problem : first, the local_tm diffusion is
still selected over the ellipsoid (even if the difference is now
smaller), second, the ellipsoid optimizations don't converge and
oscillate...
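
To make the comparison explicit, here is a small Python sketch that
computes the AIC difference of each diffusion model relative to the
best one in the same run. The numbers are the approximate values from
the table above, the helper name delta_aic is just illustrative, and
the two full_hybrid ellipsoid values come from runs that had not
converged, so they should be read with caution.

```python
# Approximate AIC values copied from the table above.
# Keys are (approach, model set); the full_hybrid ellipsoid entries
# come from runs that were still oscillating.
aic_values = {
    ("full", "m1-m5"): {"local_tm": 4510.0, "ellipsoid": 12710.0},
    ("full", "m0-m9"): {"local_tm": 4410.0, "ellipsoid": 5210.0},
    ("full_hybrid", "m1-m5"): {"local_tm": 4510.0, "ellipsoid": 4720.0},
    ("full_hybrid", "m0-m9"): {"local_tm": 4410.0, "ellipsoid": 4570.0},
}


def delta_aic(models):
    """Return AIC differences relative to the lowest AIC in `models`."""
    best = min(models.values())
    return {name: value - best for name, value in models.items()}


deltas = {run: delta_aic(models) for run, models in aic_values.items()}

for (approach, model_set), d in deltas.items():
    print(f"{approach:12s} {model_set:6s} ellipsoid - local_tm = "
          f"{d['ellipsoid'] - d['local_tm']:.0f}")
```

The gap shrinks from about 8200 (full, m1-m5) to about 160
(full_hybrid, m0-m9), but local_tm still has the lowest AIC in every
case.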

Now, what about the Rex and slow motions (ts) in the local_tm diffusion
? Here are some stats :

=======================================================================
Approach     Models  Diff       Nb_Rex  Nb_ts
===========  ======  =========  ======  =====
full         m1-m5   local_tm    58      30
full         m1-m5   ellipsoid  171      21

full         m0-m9   local_tm    63      41
full         m0-m9   ellipsoid  144      49

full_hybrid  m1-m5   local_tm    58      30
full_hybrid  m1-m5   ellipsoid  142 *    28

full_hybrid  m0-m9   local_tm    64      41
full_hybrid  m0-m9   ellipsoid  145 **   50
=======================================================================
*  not converged after 35 rounds (oscillates)
** not converged after 26 rounds (oscillates)

As you can see, there are way more Rex in the ellipsoid, which probably
means that there is a problem with the diffusion tensor... For the slow
ns motions, there doesn't seem to be significantly more in the ellipsoid
description... Moreover, the sphere diffusion tensor, which is not
NH-vector-orientation-dependent, also has a high number of Rex, similar
ns motions and AIC values similar (just a bit higher) to what is
observed for the ellipsoid :

=======================================================================
Approach     Models  Diff       Nb_Rex  Nb_ts  AIC
===========  ======  =========  ======  =====  ======
full         m1-m5   sphere     191      20    ~15200

full         m0-m9   sphere     155      47    ~ 5640

full_hybrid  m1-m5   sphere     145      31    ~ 5190

full_hybrid  m0-m9   sphere     153      47    ~ 5030
=======================================================================

Should the sphere diffusion tensor yield similar results to the local_tm
? If there is a major difference between those two, does it mean that
concerted motions may be present and that a hybrid model could solve
the issue ?


Ok...


Now, are there concerted motions apparent from the local_tm results..? I
plotted the results from the local_tm run after AIC model selection
(Would it be better if I looked at the local_tm run for model 1 or 2
only ? Can model selection here bias the results ?) and couldn't find
any obvious link between different parts of the protein for one or more
parameters among S2, S2f, S2s, Rex, te, tf, ts, chi2.

However, a small relation seems to exist for the local_tm distribution
and the domain (The inverse is seen for the S2, but to a lesser extent.
When looking at the tm1 run, the local_tm is also a bit smaller in the
same domain [a small difference of 0.5-1.0 ns for values of ~13 ns], but
the S2 are similar, which points to a difference for the two domains).

My protein is globular, but has two structural domains side by side, an
all alpha domain and an alpha/beta domain. In the homologous protein,
there seems to exist Rex at the interface (which spans a surface of four
10 residue beta strands, which is big and is expected to be quite
rigid). Maybe the two domains are a bit different in my system which
could cause the problems I encounter. I'll try to assess this by running
full_analysis runs on the different domains alone...


Ok...


Well, I'm out of ideas now... If you have any ideas that could help,
they will be more than welcome !

I hope this discussion can also help other people solve difficulties
encountered in their analysis or help them get more information out of
their system...

Thanks a lot once more !

Cheers !



Sébastien


P.S. Again, sorry for the length of the mail...











Edward d'Auvergne wrote:

Hi,

I've been thinking about this one for a while, but I don't know
exactly what the problem is.  I have a few ideas that may help though.
 This could either be some type of interesting dynamics, or be caused
by something a bit more sobering.

Firstly though, it is worth comparing the local tm model to the best
of the global diffusion tensor models (the ellipsoid).  It could be
that if the AIC values are similar, then the local tm model and the
global diffusion model are statistically similar and that it would be
safe to go with either.  In this case, it is worth very carefully
comparing the description of the internal dynamics.  For this, do not
compare selected models - that is not what is of interest.  It should
be the overall picture of the dynamics reported by the parameters.
For example if Rex is statistically close to zero then, from the
perspective of the internal motions, models m2 and m4 are the same.
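
One way to picture this comparison is the following minimal Python
sketch. It assumes that each Rex value comes with an error estimate
(for example from Monte Carlo simulations) and uses a simple
n-sigma cut-off, which is an arbitrary choice made here for
illustration and not a rule from relax. The function names and the
residue values are hypothetical.

```python
def rex_is_significant(rex, rex_err, n_sigma=2.0):
    """True if Rex exceeds n_sigma times its error (an assumed cut-off)."""
    return rex_err > 0.0 and rex > n_sigma * rex_err


# Model pairs that differ only by the Rex term:
# m3 = m1 + Rex and m4 = m2 + Rex.
WITHOUT_REX = {"m3": "m1", "m4": "m2"}


def effective_model(model, rex, rex_err, n_sigma=2.0):
    """Map a Rex model to its Rex-free partner when Rex is ~ zero."""
    if model in WITHOUT_REX and not rex_is_significant(rex, rex_err, n_sigma):
        return WITHOUT_REX[model]
    return model


# Hypothetical residues: number -> (selected model, Rex, Rex error).
residues = {
    10: ("m4", 0.5, 0.4),   # Rex within 2 sigma of zero
    11: ("m4", 2.4, 0.5),   # Rex clearly non-zero
    12: ("m2", 0.0, 0.0),   # no Rex in the model
}

for num, (model, rex, err) in residues.items():
    print(num, model, "->", effective_model(model, rex, err))
```

Comparing the two global models through such "effective" descriptions,
rather than through the raw model labels, keeps the focus on the
dynamics the parameters actually report.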

Assuming that the local tm global model is significantly better than
the other models, another option could be that you have very
interesting global concerted dynamics occurring in the molecule.  This
would mean that the standard single global diffusion model (sphere,
spheroid, or ellipsoid) is insufficient to describe these motions.
This is what the hybrid models in relax were designed for, but maybe
these don't describe certain large scale motions well enough (hence
your use of these didn't resolve the problem).  These aren't a proper
mathematical solution to the complex physics of coupled diffusion
processes and hence may be insufficient.

It might be worth trying the normal model-free analysis of starting
with the diffusion tensor, rather than my new technique which starts
with the internal dynamics, to see if you end up with a different
result.  It could be that the new technique in the full_analysis.py
script is somehow failing, although I doubt that will be the case.
The oscillation you see in point 3 is found by using Art Palmer's
Modelfree program as well with a standard analysis - this was one of
the motivators for me to start looking into and fixing problems with
model-free analysis - but it is inherent to the iterative procedure
required for convergence.  Have you tried the analysis with Modelfree
or Dasha?  And if so, how do the chi-squared and AIC values compare?

Alternatively, the reason could be quite simple.  It could possibly be
that the structure you have used in the analysis is not accurate
enough.  If it is a crystal structure, maybe it doesn't represent the
solution structure well.  The analysis is highly dependent upon the XH
bond vector orientations, and if this is slightly out it could cause a
bias and the introduction of artificial motions (either Rex or
nanosecond motions).  It will also affect the determination of the
diffusion tensor.  These artificial motions are unlikely to be present
in the local tm model though, so this is a good check.

The Rex in the ellipsoid model is an indication that something could
be wrong with the global model.  Whether it is interesting large scale
motions which are insufficiently described by the ellipsoid, whether
the technique cannot find the real solution, or whether this is caused
by structural inaccuracies, that I cannot tell.  Is the structure of
the protein released?  What is the system which is being studied?
What are the AIC values like for each global model?  Anyway, hopefully
one of these ideas may be of help in sorting out the problem.

Regards,

Edward





On Mon, May 5, 2008 at 9:23 PM, Sébastien Morin
<sebastien.morin.1@xxxxxxxxx> wrote:

Hi,

 I am currently using relax with the full_analysis.py script.

 I face several problems for which I can't find any solution...

 1.
 With all my data (230 residues at 3 fields, for a total of 2070
 observables), the best diffusion model is the local tm. This is not
 normal as this protein is globular. The C-terminus residues have
 really high chi2 values... However, even when excluding the C-terminus,
 the best diffusion model is still the local tm. Maybe some other
 residues are highly flexible and should be rejected... Maybe also some
 residues have bad data... What is a good strategy to find residues I
 should exclude from my analysis ?


 2.
 When I look at optimized results from the ellipsoid runs (second best
 choice after local tm), I see lots (~ 50 % residues) of Rex, which is a
 bit annoying... The diffusion tensor may not be well optimized... This
 may be related to problem 1...


 3.
 In different situations, some runs (prolate or ellipsoid, i.e. the
 diffusion tensor that should best describe my system) never converge and
 oscillate between 2 or more AIC values. Some residues oscillate between
 2 or more models, but these residues are not special as to their
 relaxation data or position in the protein...
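
The kind of oscillation described in point 3 can be flagged
automatically. The Python sketch below assumes that each round of the
iterative procedure is summarised only by the model selected for each
residue (a simplification, since the diffusion tensor parameters also
change between rounds) and reports whether the sequence has converged
or returned to an earlier state. The function name and the example
data are hypothetical.

```python
def classify_iterations(states):
    """Classify a list of per-round {residue: model} dictionaries.

    Returns ("converged", round), ("oscillating", period) or
    ("running", None).  A state identical to the previous round counts
    as convergence; a state identical to an older round is a cycle.
    """
    seen = {}
    for i, state in enumerate(states):
        key = tuple(sorted(state.items()))
        if key in seen:
            period = i - seen[key]
            if period == 1:
                return ("converged", i)
            return ("oscillating", period)
        seen[key] = i
    return ("running", None)


# Hypothetical example: residue 2 flips between m4 and m5.
rounds = [
    {1: "m2", 2: "m4"},
    {1: "m2", 2: "m5"},
    {1: "m2", 2: "m4"},
]
status = classify_iterations(rounds)
print(status)
```

Listing the residues whose model differs between the rounds of a
detected cycle then gives the set of residues that drive the
oscillation.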


 Consistency testing and reduced spectral density mapping show that my
 data are of good quality and are consistent with each other...

 I tried with different structures (crystal structure with added protons,
 MM snapshots), but always got the same kind of results...

 I tried several hybrids (with no C-ter, with no C-ter and several loops,
 etc), but always got the same kind of results...

 Also, chi2 values are quite high for most residues (5-20 on average)...

 What should I do now ? Do you have any idea ?

 Thanks a lot for any help or idea !!!!!!!


 Exhausted Séb

 _______________________________________________
 relax (http://nmr-relax.com)

 This is the relax-users mailing list
 relax-users@xxxxxxx

 To unsubscribe from this list, get a password
 reminder, or change your subscription options,
 visit the list information page at
 https://mail.gna.org/listinfo/relax-users
