Re: r24983 - in /branches/R1_fitting: specific_analyses/relax_disp/ test_suite/unit_tests/_specific_analyses/_relax_disp/



Posted by Edward d'Auvergne on August 18, 2014 - 17:55:
Hi Troels,

I will continue from your last post in the twin thread
http://thread.gmane.org/gmane.science.nmr.relax.scm/22734.  I have
copied and pasted your message below:


> I could write a section on how models are sorted and nested.
> And then add some examples.

A hardcoded table will be far more practical.  The user will know
which model they would like to optimise, and by looking it up in a
table they will be able to directly see which models it will use for
the default nesting, if any are used at all.  Programmatically
determining the model nesting is a nasty problem, as the measures for
this should be model accuracy under most conditions and speed - two
things which cannot be reliably determined programmatically.  And we
need to have hardcoded logic that certain models should not be nested
at all.


> By having a specialised function for this, the user can actually try
> out the outcome of this before starting calculations.
>
> Another way is to make it possible to order how the models are
> analysed in the GUI, with a +/- button moving the models up and down.
> Here, the initial sorting could be performed by the function.

We should not have a separate solution for the GUI and
scripting/prompt UI.  All parts of all UIs are designed so that they
are equal.  You can do everything in all interfaces (well, apart from
Python programming in the GUI).  So that is why a dictionary of model
nesting, defaulting to say MODEL_NESTING, passed into the
auto-analysis is one of the best solutions.  The user could manipulate
this Python structure in a script or the prompt UI.  And in the GUI we
could create a special GUI window for this nesting model structure.
See the spectrum.replicated user function for the best GUI element for
this purpose (this element already exists, so we can recycle it for
easier implementation).  Here the wx.ComboBoxes would list all
dispersion models.
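As a rough sketch of how a user could manipulate such a structure in a script or the prompt UI (the dictionary layout and the `nesting` keyword are illustrative assumptions for this discussion, not the current relax API; plain strings stand in for the MODEL_* variables):

```python
# A hypothetical default nesting dictionary that the auto-analysis could
# accept as an argument.  The model names are real dispersion models, but
# this structure and the 'nesting' argument are assumptions of the sketch.
MODEL_NESTING_DEFAULT = {
    'CR72': [],
    'B14': ['CR72'],
    'NS CPMG 2-site expanded': ['B14', 'CR72'],
}

# In a script, the user simply edits a copy before starting the analysis:
nesting = dict(MODEL_NESTING_DEFAULT)
nesting['NS CPMG 2-site expanded'] = ['CR72']  # e.g. skip B14 as a source

# Relax_disp(..., nesting=nesting)  # hypothetical auto-analysis argument
```

The GUI window would then just be a graphical editor for the same dictionary, keeping all UIs equal.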


> That takes care of the sorting.

The sorting should be based on the nesting, which should be based on
accuracy and speed, and should be based on which models will be
optimised.  This can be programmatically handled if a special
MODEL_NESTING structure is created.  However, a hardcoded order might
be the easiest way to resolve all dependencies, in case a user asks
for an impossible combination, e.g. using B14 for CR72 and
simultaneously using CR72 for B14.
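Such an impossible combination is simply a cycle in the nesting dictionary, so it could also be caught with a small check (a sketch only, with plain strings in place of relax's MODEL_* constants):

```python
def nesting_has_cycle(nesting):
    """Check whether a nesting dictionary contains a dependency cycle.

    The dictionary maps each model to the list of models it may nest
    from, as in the proposed MODEL_NESTING structure.
    """
    def visit(model, seen):
        # A model reappearing on its own dependency path is a cycle.
        if model in seen:
            return True
        return any(visit(src, seen | {model}) for src in nesting.get(model, []))
    return any(visit(model, set()) for model in nesting)

# The impossible combination from the text: B14 nests from CR72 while
# CR72 simultaneously nests from B14.
bad = {'CR72': ['B14'], 'B14': ['CR72']}
good = {'CR72': [], 'B14': ['CR72']}
```

A hardcoded default order sidesteps the problem entirely, but a check like this would let relax give a clean error if the user supplies a cyclic custom nesting.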


> Then comes the nesting.

I would rather say that the sorting is required for the dependency
resolution of the nesting.


> Again, for each model the user could select which model to nest from.
> Again with the initial suggestion by the function.

I think the logical pre-determination of model nesting by an
experimental scientist would be better than any programmatic solution
based on metadata.  The old nesting in the auto-analysis was not
flexible enough, but the new solution is also far from ideal and is
rather dangerous.  Designing a third solution whereby we can say what
is best for nesting each model, defaulting to lower and lower quality
models if the higher quality model is not in the list, would be best.
And allowing us to say that IT99, M61, LM63, TSMFK01, TAP03, etc.
should not have a nested model based on the unique part of the
dispersion space they model or the unique conditions they fulfil would
be best.  There is no need to delete the metadata variables as that
could be used elsewhere.

Note that the nesting is unique to the auto-analysis and is not
available for a custom user defined analysis.  But having a
MODEL_NESTING structure could be useful for such a power user.  Also
the nesting we published in
http://dx.doi.org/10.1093/bioinformatics/btu166 should be considered
quite important, as users will only just now be reading about that.

Regards,

Edward


On 18 August 2014 17:18, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
For future reference, this issue is being simultaneously discussed in
the thread for r24984 at
http://thread.gmane.org/gmane.science.nmr.relax.scm/22734.  Troels,
could you continue the discussions all in this thread?

Cheers,

Edward

On 18 August 2014 17:10, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Ah, this logic makes more sense :)  We should then use a combination
of model speed and accuracy as the decider for both nesting and order!
Note that the entire point of the nesting is for speed - by avoiding
the grid search for very slow, i.e. numeric, models.  The year the
model was published is not very useful for this.

In the CPMG data case, B14 is better than CR72 for slow exchange.  But
CR72 is far better than IT99 or TSMFK01 as these are special case
models which do not arise often (which should be identified by the
user before the analysis).  B14 and CR72 are probably not so great for
fast exchange situations due to parameter redundancy making the
optimisation problem hell for these models.  Note that because of
historical reasons, possibly speed and politics as well, that many
users will choose CR72 instead of B14 (it unfortunately takes a long
time for the field to move to better solutions).  For the R1rho
models, MP05 is better than TP02, DPL94, etc.  So maybe we need a hard
coded dictionary of lists such as:

MODEL_NESTING = {
    ...
    MODEL_CR72:  [],
    MODEL_B14:  [MODEL_CR72],
    MODEL_NS_CPMG_2SITE_EXPANDED:  [MODEL_B14, MODEL_CR72],
    MODEL_NS_CPMG_2SITE_3D:  [MODEL_NS_CPMG_2SITE_EXPANDED, MODEL_B14, MODEL_CR72],
    ...
    MODEL_NS_R1RHO_2SITE:  [MODEL_MP05, MODEL_TAP03, MODEL_TP02, MODEL_DPL94],
    ...
}

So here the B14 model is used for the 'NS CPMG 2-site expanded' model
first, and then drops back to CR72 if B14 does not exist.  We will
have to create a hardcoded list of model nesting in the relax manual
for this - such a table is essential for the user.  Therefore having a
hardcoded dictionary in the variables module is not an issue.  We
could provide a set of functions in the
specific_analyses.relax_disp.model module to change this dictionary,
if you wish to have it programmatically changed.  Or, better yet,
MODEL_NESTING would be a special Python dictionary object that has
methods for changing its defaults.
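The drop-back behaviour could be resolved with a tiny helper (illustrative only, not the relax implementation; string keys stand in for the MODEL_* variables):

```python
def nested_source(model, nesting, analysed):
    """Return the first nesting candidate that has already been analysed,
    or None, meaning the model must start from the grid search."""
    for candidate in nesting.get(model, []):
        if candidate in analysed:
            return candidate
    return None

MODEL_NESTING = {
    'CR72': [],
    'B14': ['CR72'],
    'NS CPMG 2-site expanded': ['B14', 'CR72'],
}

# If the user skipped B14, the numeric model drops back to CR72:
source = nested_source('NS CPMG 2-site expanded', MODEL_NESTING,
                       analysed={'R2eff', 'No Rex', 'CR72'})
```

The candidate lists are ordered by preference, so the first hit is always the highest-quality model available in the user's analysis.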

Anyway, it is very hard to pick which model should be used, as the
fast vs. slow exchange problem is a dominating factor for the nesting
for the numeric models and the alpha-value calculation is not
implemented yet (https://gna.org/task/?7800).  Nevertheless, hardcoded
good default values based on logic of accuracy and speed would be very
useful and match a hardcoded table in the manual.  But the fast vs.
slow question might make users want to change the model nesting, hence
why I mentioned earlier about accepting a model nesting dictionary as
an argument for the relax_disp auto-analysis.  And if the user knows
that the best analytic model for their data is TSMFK01 (or IT99), then
they would want to use this as the starting point for the numeric
models and would not use B14 or CR72 which in these conditions would
be bad approximations (though they might use the models anyway for
comparison).

Regards,

Edward




On 18 August 2014 16:40, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
Hi Edward.

The good thing about the code is that a change to the nesting only
needs very little work.  Only one or two lines need to be changed.
That is all.

I realised that, when adding new models to relax, it should not be
necessary to hardcode every possibility into the auto-analysis.

There has to be a function helping out with this.

I can write up some examples, which illustrate the problem.

CPMG:
models = ['R2eff', 'No Rex', 'B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded']

To the user, this would seem to be a good list of models to test.
The user has maybe read that 'B14' is more precise than 'CR72', and
does not want to "waste" time on CR72.

But nothing would be nested, even though the solutions should be
expected to be quite similar.  Why not nest from 'B14', or from 'NS
CPMG 2-site 3D', if the optimised parameters are there?

models = ['R2eff', 'No Rex', 'CR72', 'B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded']
Here the user "knows" that it is essential to nest from CR72.
But if 'NS CPMG 2-site expanded' has been analysed (nesting from
CR72), should 'NS CPMG 2-site 3D' then nest from CR72 or from "the
more precise" 'NS CPMG 2-site expanded'?

It quickly becomes a two-step problem.

In which order should the models be analysed?
From which model should each model nest?
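Both steps could in fact be driven by a single nesting structure: a topological sort of the dictionary yields the analysis order, and that order then guarantees every nesting source has been optimised first. A minimal sketch, assuming an acyclic MODEL_NESTING-style dictionary with string keys:

```python
def analysis_order(models, nesting):
    """Order the models so that every nesting source is analysed before
    the models that nest from it (a simple depth-first topological sort;
    assumes the nesting dictionary is acyclic)."""
    ordered = []
    def add(model):
        if model in ordered or model not in models:
            return
        for source in nesting.get(model, []):
            add(source)       # analyse the sources first
        ordered.append(model)
    for model in models:
        add(model)
    return ordered

nesting = {'B14': ['CR72'], 'NS CPMG 2-site 3D': ['B14', 'CR72']}
order = analysis_order(['NS CPMG 2-site 3D', 'B14', 'CR72'], nesting)
```

Models the user left out of the list are simply skipped, so the nesting then falls through to the next available source.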

Best
Troels



2014-08-18 16:21 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Hi,


> You will soon see my commits about sorting the models.

I'm still catching up on your huge number of changes :)


> And here we would probably have to discuss a little.

It's best to discuss ideas before implementing them, as it's then far
easier to change to the best course.  It's far more difficult once
code and time have been invested into one solution when a better
solution could have been implemented first.


> There is a huge time potential to nest parameters from earlier models.

This was a major point in our paper
(http://dx.doi.org/10.1093/bioinformatics/btu166, and mentioned in the
abstract) which I can now see has volume and page numbers :)


> And so by common sense, it would/could be best to nest from numerical
> solutions, since they are more precise.
> But they are terribly slow.

This is not a good idea :S  Well, the analytic models are
approximations anyway so there is a bias in the parameter values, and
this destroys any value the higher precision of the numeric models
could give.


> Unless they are these special hybrid versions.

Hybrid, I don't understand?


> The word "silico", I took from:
> A.J. Baldwin (2014). An exact solution for R2,eff in CPMG experiments
> in the case of two site chemical exchange. J. Magn. Reson., 2014.
> (10.1016/j.jmr.2014.02.023).
>
> I think it is a very good representation.
>
> "Silico" is "fast", and precise.

Hmmm, maybe Nikolai should be asked about this.  I believe he would
have a strong opinion about this categorisation of his model, and I
don't remember him ever using such terminology.  Andy used the text
"derived in silico" in the introduction which is quite different, as
'in silico' just means on a computer.  Most modern analytic models
would also be derived using Mathematica, Maxima, etc. on a computer
and could equally be said to be 'in silico' derived.  I bet Andy also
used symbolic computational software for his paper ;)


> But "impossible" to read and understand from the code.
>
> I think that "silico" is a good way to represent a "subset" of
> numerical solutions.
>
> I would rather describe a model as being "silico", than to add
> metadata about "how fast" it is.

Why is this differential categorisation needed at all?  I know this
would invalidate a lot of work, but would it not be better to have
hardcoded nested model defaults?  I.e. a dictionary structure as you
have already used in the variables module, with one entry per model.

Regards,

Edward


