Re: r24983 - in /branches/R1_fitting: specific_analyses/relax_disp/ test_suite/unit_tests/_specific_analyses/_relax_disp/ -- August 18, 2014

For future reference, this issue is being simultaneously discussed in
the thread for r24984 at
http://thread.gmane.org/gmane.science.nmr.relax.scm/22734.  Troels,
could you continue the discussions all in this thread?

Cheers,

Edward

On 18 August 2014 17:10, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Ah, this logic makes more sense :)  We should then use a combination
of model speed and accuracy as the decider for both nesting and order!
Note that the entire point of the nesting is for speed - by avoiding
the grid search for very slow, i.e. numeric, models.  The year the
model was published is not very useful for this.

In the CPMG data case, B14 is better than CR72 for slow exchange.  But
CR72 is far better than IT99 or TSMFK01 as these are special case
models which do not arise often (which should be identified by the
user before the analysis).  B14 and CR72 are probably not so great for
fast exchange situations due to parameter redundancy making the
optimisation problem hell for these models.  Note that for historical
reasons, and possibly speed and politics as well, many
users will choose CR72 instead of B14 (it unfortunately takes a long
time for the field to move to better solutions).  For the R1rho
models, MP05 is better than TP02, DPL94, etc.  So maybe we need a hard
coded dictionary of lists such as:

MODEL_NESTING = {
    ...
    MODEL_CR72:  [],
    MODEL_B14:  [MODEL_CR72],
    MODEL_NS_CPMG_2SITE_EXPANDED:  [MODEL_B14, MODEL_CR72],
    MODEL_NS_CPMG_2SITE_3D:  [MODEL_NS_CPMG_2SITE_EXPANDED, MODEL_B14, MODEL_CR72],
    ...
    MODEL_NS_R1RHO_2SITE:  [MODEL_MP05, MODEL_TAP03, MODEL_TP02, MODEL_DPL94],
    ...
}

So here the B14 model is used for the 'NS CPMG 2-site expanded' model
first, and then drops back to CR72 if B14 does not exist.  We will
have to create a hardcoded list of model nesting in the relax manual
for this - such a table is essential for the user.  Therefore having a
hardcoded dictionary in the variables module is not an issue.  We
could provide a set of functions in the
specific_analyses.relax_disp.model module to change this dictionary,
if you wish to have it programmatically changed.  Or, better yet,
MODEL_NESTING would be a special Python dictionary object that has
methods for changing its defaults.
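
As a rough illustration of the fall-back described above, here is a
minimal sketch only, not relax code.  The string values given to the
constants and the nested_from() helper are assumptions made for this
example, and only the CPMG entries shown in the dictionary above are
included.

```python
# Hypothetical stand-ins for the constants in relax's variables module.
MODEL_CR72 = 'CR72'
MODEL_B14 = 'B14'
MODEL_NS_CPMG_2SITE_EXPANDED = 'NS CPMG 2-site expanded'
MODEL_NS_CPMG_2SITE_3D = 'NS CPMG 2-site 3D'

# Preferred nesting sources for each model, best choice first.
MODEL_NESTING = {
    MODEL_CR72: [],
    MODEL_B14: [MODEL_CR72],
    MODEL_NS_CPMG_2SITE_EXPANDED: [MODEL_B14, MODEL_CR72],
    MODEL_NS_CPMG_2SITE_3D: [MODEL_NS_CPMG_2SITE_EXPANDED, MODEL_B14, MODEL_CR72],
}


def nested_from(model, optimised, nesting=MODEL_NESTING):
    """Return the first preferred source model that has already been
    optimised, or None if the grid search cannot be avoided."""
    for candidate in nesting.get(model, []):
        if candidate in optimised:
            return candidate
    return None
```

With only CR72 optimised, nested_from(MODEL_NS_CPMG_2SITE_EXPANDED,
{'CR72'}) falls back to 'CR72'.  Once B14 is also available it is
picked first.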

Anyway, it is very hard to pick which model should be used, as the
fast vs. slow exchange problem is a dominating factor for the nesting
for the numeric models and the alpha-value calculation is not
implemented yet (https://gna.org/task/?7800).  Nevertheless, hardcoded
good default values based on logic of accuracy and speed would be very
useful and match a hardcoded table in the manual.  But the fast vs.
slow question might make users want to change the model nesting, which
is why I mentioned earlier the idea of accepting a model nesting dictionary as
an argument for the relax_disp auto-analysis.  And if the user knows
that the best analytic model for their data is TSMFK01 (or IT99), then
they would want to use this as the starting point for the numeric
models and would not use B14 or CR72 which in these conditions would
be bad approximations (though they might use the models anyway for
comparison).
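
A minimal sketch of how a user-supplied nesting dictionary might be
layered over hardcoded defaults.  The names DEFAULT_NESTING and
resolve_nesting() are hypothetical and this is not the relax_disp
auto-analysis API, just one possible shape for the idea.

```python
import copy

# Hypothetical hardcoded defaults, keyed by model name.
DEFAULT_NESTING = {
    'CR72': [],
    'B14': ['CR72'],
    'NS CPMG 2-site expanded': ['B14', 'CR72'],
    'NS CPMG 2-site 3D': ['NS CPMG 2-site expanded', 'B14', 'CR72'],
}


def resolve_nesting(user_nesting=None, defaults=DEFAULT_NESTING):
    """Return a new nesting dictionary in which the user's entries
    replace the defaults.  The defaults themselves are not modified."""
    merged = copy.deepcopy(defaults)
    for model, sources in (user_nesting or {}).items():
        merged[model] = list(sources)
    return merged


# A user who knows TSMFK01 is the best analytic model for their data
# could start the numeric model from it instead of B14 or CR72.
custom = resolve_nesting({'NS CPMG 2-site 3D': ['TSMFK01']})
```

Returning a copy keeps the defaults in step with the table in the
manual, whatever the user passes in.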

Regards,

Edward




On 18 August 2014 16:40, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:

Hi Edward.

The good thing about the code is that a change to the nesting
needs very little work.
Only one or two lines need to be changed. That is all.

I came to realise that, when adding new models to relax,
it should not be necessary to
hard code every possibility into the auto-analysis.

There has to be a function helping out with this.

I can write up some examples, which illustrate the problem.

CPMG:
models = ['R2eff', 'No Rex', 'B14', 'NS CPMG 2-site 3D',
          'NS CPMG 2-site expanded']

To the user, this would seem to be a good list of models to test.
The user may have read that 'B14' is more precise than 'CR72', and
does not want to "waste" time on CR72.

But nothing would be nested, even though the solutions should be
expected to be quite similar.
Why not nest from 'B14', or from 'NS CPMG 2-site 3D', if the optimised
parameters are there?

models = ['R2eff', 'No Rex', 'CR72', 'B14', 'NS CPMG 2-site 3D',
          'NS CPMG 2-site expanded']
Here the user "knows" that it is essential to nest from CR72.
But if 'NS CPMG 2-site expanded' has been analysed (nested from
CR72), should 'NS CPMG 2-site 3D' then
nest from CR72 or from "the more precise" 'NS CPMG 2-site expanded'?

It quickly becomes a two-step problem.

In which order should the models be analysed?
From which model should each model nest?
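
One possible way to frame these two questions together, given as a
sketch under assumptions rather than as relax code: if each model has
a list of preferred nesting sources (the NESTING dictionary below is
hypothetical), then the analysis order can be derived with a
topological sort, and the nesting source is the first preferred model
already analysed.  'R2eff' and 'No Rex' are left out here as they are
not part of the nesting question.  graphlib needs Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical preference lists: nesting sources per model, best first.
NESTING = {
    'CR72': [],
    'B14': ['CR72'],
    'NS CPMG 2-site expanded': ['B14', 'CR72'],
    'NS CPMG 2-site 3D': ['NS CPMG 2-site expanded', 'B14', 'CR72'],
}


def plan_analysis(models, nesting=NESTING):
    """Return (model, nested_from) pairs in a valid analysis order."""
    chosen = set(models)
    # Step 1: a model must be analysed after any selected source model.
    graph = {m: [s for s in nesting.get(m, []) if s in chosen]
             for m in models}
    order = TopologicalSorter(graph).static_order()

    # Step 2: nest from the first preferred source already analysed.
    done = set()
    plan = []
    for model in order:
        source = next((s for s in nesting.get(model, []) if s in done), None)
        plan.append((model, source))
        done.add(model)
    return plan


plan = plan_analysis(['B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded'])
```

For the list without CR72, this analyses B14 first with no nesting,
then 'NS CPMG 2-site expanded' nested from B14, then
'NS CPMG 2-site 3D' nested from 'NS CPMG 2-site expanded'.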

Best
Troels



2014-08-18 16:21 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi,

You will soon see my commits about sorting the models.


I'm still catching up on your huge number of changes :)

And here we would probably have to discuss a little.


It's best to discuss ideas before implementing them, as it's then far
easier to change to the best course.  It's far more difficult once
code and time have been invested into one solution when a better
solution could have been first implemented.

There is huge time-saving potential in nesting parameters from earlier models.


This was a major point in our paper
(http://dx.doi.org/10.1093/bioinformatics/btu166, and mentioned in the
abstract) which I can now see has volume and page numbers :)

And so by common sense, it would/could be best to nest from numerical
solutions, since they are more precise.
But they are terribly slow.


This is not a good idea :S  Well, the analytic models are
approximations anyway so there is a bias in the parameter values, and
this destroys any value the higher precision of the numeric models
could give.

Unless they are these special hybrid versions.


Hybrid, I don't understand?

The word "silico", I took from:
A.J. Baldwin (2014). An exact solution for R2,eff in CPMG experiments
in the case of two site chemical exchange. J. Magn. Reson., 2014.
(10.1016/j.jmr.2014.02.023).

I think it is a very good representation.

"Silico" is "fast", and precise.


Hmmm, maybe Nikolai should be asked about this.  I believe he would
have a strong opinion about this categorisation of his model, and I
don't remember him ever using such terminology.  Andy used the text
"derived in silico" in the introduction which is quite different, as
'in silico' just means on a computer.  Most modern analytic models
would also be derived using Mathematica, Maxima, etc. on a computer
and could equally be said to be 'in silico' derived.  I bet Andy also
used symbolic computational software for his paper ;)

But "impossible" to read and understand from the code.

I think that "silico" is a good way to represent a "subset" of
numerical solutions.

I would rather describe a model as being "silico", than to add meta
data about "how fast" it is.


Why is this differential categorisation needed at all?  I know this
would invalidate a lot of work, but would it not be better to have
hard coded nested model defaults?  I.e. a dictionary structure as you
have already used in the variables module, with one entry per model.

Regards,

Edward
