Hi,

Sorry for the delay, I am flat out preparing for the ENC conference at the moment. Something is a little strange, but unfortunately I don't know where the source of the problem is. It is worth looking at the diffusion tensor parameters and the final results for each of the analyses - the model and optimisation differences may not be statistically significant. Maybe the differences are simply from the truncation of coordinates in a PDB file. If you can track down the source of the difference (whether in relax or elsewhere), would you be able to report this information? It will be very useful.

Sorry I can't be of more help.

Thanks,

Edward

On Feb 8, 2008 8:07 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:
Hi Edward,

For the first run listed below (tensor, # runs to convergence):

  sphere,    4
  prolate,   7
  oblate,    9
  ellipsoid, 9

For the second run listed below (tensor, # runs to convergence):

  sphere,    4
  prolate,   5
  oblate,    9
  ellipsoid, 9

Doug

On Feb 7, 2008, at 4:03 AM, Edward d'Auvergne wrote:

Hi,

Something strange is happening in your analysis. Unfortunately, with the limited information in your post, I really cannot start to track it down. The fact that the local_tm and sphere runs are the same in both is a good sign. Do you have the number of iterations required for the convergence of each tensor?

The full_analysis.py script should be insensitive to the input structure, but there is one point at which this might not be exactly true. The grid search for the initial tensor parameter values prior to minimisation is orientation dependent. You can think of the grid search as a cage which stays fixed in space while the molecule spins around inside it. But the subsequent Newton optimisation should easily recover from the small differences of the increments between grid points. That might be a place to start looking though. The only point in relax where a random number generator is utilised is in the Monte Carlo simulations.

It might also be worth looking at the Dr value in the ellipsoid optimisations. If this value is close to zero, then the results from the prolate and ellipsoid diffusion tensors may actually be almost the same, in which case the differences don't matter. For understanding molecular motions, the model itself is of no interest - this is, of course, a simple exercise in the field of mathematical modelling (a mathematics library will show you the extent of this field). Note that it is what the model says about the dynamics which is of interest, not the details of the model itself. So two completely different models may actually say the same thing, but maybe with small, statistically insignificant differences. That being said, this problem should not occur.
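[Editor's note: the grid-search "cage" described above can be illustrated with a toy sketch. The 11-point grid and the use of a single theta angle are purely hypothetical simplifications, not relax's actual grid search; the two theta values are taken from the results reported later in this thread.]

```python
import math

# Toy illustration (not relax's actual grid search) of why the initial
# grid search can depend on the PDB orientation: the grid is a fixed
# "cage" of tensor orientation angles, while rotating the molecule
# shifts the true tensor orientation relative to that cage.

# Hypothetical 11-point grid over theta in [0, pi].
theta_grid = [i * math.pi / 10 for i in range(11)]

def nearest_grid_point(value):
    """The starting point handed to the minimiser: the closest grid node."""
    return min(theta_grid, key=lambda g: abs(g - value))

# The two theta values reported later in this thread (degrees -> radians).
theta_original = math.radians(11.127323614211441)
theta_rotated = math.radians(8.4006873071400197)

# The two orientations fall nearest to different grid nodes, so the
# minimiser starts from different points in the two runs.
print(nearest_grid_point(theta_original))
print(nearest_grid_point(theta_rotated))
```

The subsequent Newton minimisation should normally erase this difference, which is why Edward describes it only as a place to start looking.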
But as I cannot do anything with the limited information, you may need to hunt down where the problem lies yourself (whether in relax, the operation of relax, the use of the quadric_diffusion program, or elsewhere).

Regards,

Edward

On Jan 30, 2008 3:24 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi Edward,

As a follow-up to this, I performed two relax runs using six datasets (r1, r2 and noe at two fields) with two identical structures, but one had been rotated/translated using the quadric_diffusion program provided by the Palmer lab. For one structure a prolate tensor is chosen, whereas an ellipsoid tensor is chosen for the rotated/translated structure:

## ORIGINAL PDB

Run        Chi2       Criterion
local_tm   102.67810  870.67810
sphere     177.96407  807.96407
prolate    152.70721  796.70721
oblate     178.61058  810.61058
ellipsoid  155.78475  801.78475

The model from the run 'prolate' has been selected.

## ROTATED/TRANSLATED PDB

Run        Chi2       Criterion
local_tm   102.67810  870.67810
sphere     177.96407  807.96407
prolate    175.13432  803.13432
oblate     178.61979  810.61979
ellipsoid  155.82168  801.82168

The model from the run 'ellipsoid' has been selected.

There are no differences in the models selected for two of the three structure-dependent runs (oblate and ellipsoid tensor runs), but there are a handful of differences in the models selected for the prolate tensor runs. Is the full_analysis protocol sensitive to the orientation of the input structure, or could this be a result of different runs using something equivalent to different random number seeds?

Doug

On Jan 10, 2008, at 2:36 PM, Edward d'Auvergne wrote:

Yes, with 4 data sets you could remove tm6 to tm8. You would also need to remove m8. But in this situation, you will be significantly biasing the initial position (the starting universe will be further away from that of the universal solution). I don't know how well this new protocol will perform with 4 data sets, i.e. this is untested, but I would be highly reluctant to trust it.
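[Editor's note: in the two tables above, the diffusion model is selected simply as the run with the lowest criterion value (an AIC-style criterion: chi-squared plus a parameter-count penalty). A minimal check of the two selections:]

```python
# Model selection from the two tables in this thread: pick the run
# with the lowest criterion value.
original = {
    'local_tm': 870.67810, 'sphere': 807.96407, 'prolate': 796.70721,
    'oblate': 810.61058, 'ellipsoid': 801.78475,
}
rotated = {
    'local_tm': 870.67810, 'sphere': 807.96407, 'prolate': 803.13432,
    'oblate': 810.61979, 'ellipsoid': 801.82168,
}

print(min(original, key=original.get))  # prolate
print(min(rotated, key=rotated.get))    # ellipsoid
```

Note how close the competing criterion values are in the rotated run (803.1 vs 801.8), which is why Edward suggests the model difference may not be statistically significant.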
The relaxation data type and field strength will be very important. I would even be wary of using 5 data sets, especially if the missing data set is the higher-field NOE. So I would never recommend using 4 data sets.

Regards,

Edward

On Jan 10, 2008 8:12 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi Edward,

Thanks for the response. So, with 5 relaxation data sets, only tm8 should be removed -- no need to remove m8 as well? Also, if only 4 relaxation data sets were available, could {tm6-8 and m8} be removed to use the full_analysis.py protocol?

Thanks,

Doug

On Jan 10, 2008, at 1:31 PM, Edward d'Auvergne wrote:

Hi,

If you have 5 relaxation data sets, you can use the full_analysis.py script but you will need to remove model tm8. This is the only model with 6 parameters, and doing the analysis without it might just work (the other tm0 to tm9 models may compensate adequately).

I've looked at the script and it seems fine. I think the issue is that the model-free problem is not simply an optimisation issue. It is the simultaneous combination of global optimisation (mathematics) with model selection (statistics). You are not searching for the global minimum in one space, as in a normal optimisation problem, but for the global minimum across an enormous number of spaces simultaneously. I formulated the totality of this problem using set theory here http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b702202f or in my PhD thesis at http://eprints.infodiv.unimelb.edu.au/archive/00002799/.

In your script, the CONV_LOOP flag allows you to automatically loop over many global optimisations. Each iteration of the loop is the mathematical optimisation part, but the entire loop itself allows for the sliding between these different spaces. Note that this is a very, very complex problem involving huge numbers of spaces or universes, each of which consists of a large number of dimensions.
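[Editor's note: the CONV_LOOP idea described above can be sketched very roughly as follows. `optimise` and `model_select` are placeholder callables, not the real relax API, and reducing the diffusion tensor to a single scalar is a deliberate simplification.]

```python
# Rough sketch of the CONV_LOOP idea: alternate model selection
# (statistics) with tensor optimisation (mathematics) until the
# diffusion tensor stops changing. Placeholder functions only --
# this is NOT the relax API.

def converge(tensor, optimise, model_select, tol=1e-12, max_iter=20):
    """Iterate until the change in the tensor parameter falls below tol."""
    for i in range(1, max_iter + 1):
        models = model_select(tensor)          # pick model-free models
        new_tensor = optimise(tensor, models)  # minimise chi2 given them
        if abs(new_tensor - tensor) < tol:     # converged in this universe
            return new_tensor, i
        tensor = new_tensor
    return tensor, max_iter

# Toy usage: an "optimiser" that pulls tm halfway towards 6 ns each pass.
tm, n_iter = converge(
    1.0e-8,
    optimise=lambda t, m: (t + 6.0e-9) / 2.0,
    model_select=lambda t: None,
)
print(tm, n_iter)
```

Each pass through the loop is one global optimisation; the outer iteration is what lets the analysis slide between the spaces Edward describes.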
There was a mistake in my Molecular BioSystems paper in that the number of spaces is really equal to n*m^l, where n is the number of diffusion models, m is the number of model-free models (10 if you use m0 to m9), and l is the number of spin systems. So if you have 200 residues, the number of spaces is on the order of 10 to the power of 200. The number of dimensions for this system is on the order of 10^2 to 10^3. So the problem is to find the 'best' minimum in 10^200 spaces, each consisting of 10^2 to 10^3 dimensions (the universal solution, or the solution in the universal set). The problem is just a little more complex than most people think!!!

So, my opinion of the problem is that the starting position of one of the 2 solutions is not good. In one (or maybe both) you are stuck in the wrong universe (out of billions of billions of billions of billions....), and you can't slide out of that universe using the looping procedure in your script. That's why I designed the new model-free analysis protocol used by the full_analysis.py script (http://www.springerlink.com/content/u170k174t805r344/?p=23cf5337c42e457abe3e5a1aeb38c520&pi=3 or the thesis again). The aim of this new protocol is that you start in a universe much closer to the one with the universal solution than you can ever get with the initial diffusion tensor estimate. Then you can easily slide, in less than 20 iterations, to the universal solution using the looping procedure. For a published example of this type of failure, see the section titled "Failure of the diffusion seeded paradigm" in the previous link to the "Optimisation of NMR dynamic models II" paper.

Does this description make sense? Does it answer all your questions?

Regards,

Edward

On Jan 10, 2008 5:49 PM, Douglas Kojetin <douglas.kojetin@xxxxxxxxx> wrote:

Hi All,

I am working with five relaxation data sets (r1, r2 and noe at 400 MHz; r1 and r2 at 600 MHz), and therefore cannot use the full_analysis.py protocol.
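[Editor's note: a quick order-of-magnitude check of the n*m^l space count quoted above, using n = 5 as an example value matching the five runs in this thread.]

```python
import math

# Number of spaces = n * m**l: n diffusion models, m model-free models
# per spin, l spin systems (values match the text's 200-residue example).
n = 5     # e.g. local_tm, sphere, prolate, oblate, ellipsoid
m = 10    # model-free models m0 to m9
l = 200   # residues

# Work in log10 since m**l overflows ordinary floats.
log10_spaces = math.log10(n) + l * math.log10(m)
print(round(log10_spaces, 1))  # 200.7, i.e. ~10**200 spaces
```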
I have obtained estimates for tm, Dratio, theta and phi using Art Palmer's quadric_diffusion program. I modified the full_analysis.py protocol to optimize a prolate tensor using these estimates (attached file: mod.py). I have performed the optimization of the prolate tensor using either (1) my original structure or (2) the same structure rotated and translated by the quadric_diffusion program. It seems that relax does not converge to a single global optimum, as different values of tm, Da, theta and phi are reported.

Using my original structure:

#tm = 6.00721299718e-09
#Da = 14256303.3975
#theta = 11.127323614211441
#phi = 62.250251959733312

Using the structure rotated/translated by the quadric_diffusion program:

#tm = 5.84350638161e-09
#Da = 11626835.475
#theta = 8.4006873071400197
#phi = 113.6068898953142

The only difference between the two calculations is the orientation of the input PDB structure file. For another set of five rates (different protein), there is a >0.3 ns difference in the converged tm values. Is my modified protocol (in mod.py) set up properly? Or is this a more complex issue in the global optimization? In previous attempts, I've also noticed that separate runs with differences in the estimates for Dratio, theta and phi also converge to different optimized diffusion tensor variables.

Doug

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-users mailing list
relax-users@xxxxxxx

To unsubscribe from this list, get a password reminder, or change your subscription options, visit the list information page at https://mail.gna.org/listinfo/relax-users
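[Editor's note: one quick way to compare the two converged prolate tensors reported above is via Dratio = Dpar/Dper, the quantity quadric_diffusion works with. This sketch assumes the usual spheroid conventions Diso = 1/(6*tm) and Da = Dpar - Dper; verify these against relax's own parameter definitions before relying on them.]

```python
# Convert the (tm, Da) pairs reported above into Dratio = Dpar/Dper,
# assuming Diso = 1/(6*tm) and Da = Dpar - Dper (an assumption --
# check relax's spheroid parameter definitions).

def dratio(tm, Da):
    Diso = 1.0 / (6.0 * tm)       # isotropic part of the tensor
    Dpar = Diso + 2.0 * Da / 3.0  # component parallel to the unique axis
    Dper = Diso - Da / 3.0        # perpendicular component
    return Dpar / Dper

r_original = dratio(6.00721299718e-09, 14256303.3975)
r_rotated = dratio(5.84350638161e-09, 11626835.475)
print(r_original, r_rotated)  # roughly 1.62 vs 1.47
```

Under these conventions the two runs converge to noticeably different anisotropies, consistent with Doug's observation that the two minimisations end in different optima.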