For future reference, this issue is being simultaneously discussed in the thread for r24984 at http://thread.gmane.org/gmane.science.nmr.relax.scm/22734. Troels, could you keep all of the discussion in this thread?

Cheers,

Edward

On 18 August 2014 17:10, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Ah, this logic makes more sense :) We should then use a combination of model speed and accuracy as the decider for both nesting and order! Note that the entire point of the nesting is for speed - by avoiding the grid search for very slow, i.e. numeric, models. The year the model was published is not very useful for this.

In the CPMG data case, B14 is better than CR72 for slow exchange. But CR72 is far better than IT99 or TSMFK01 as these are special case models which do not arise often (which should be identified by the user before the analysis). B14 and CR72 are probably not so great for fast exchange situations due to parameter redundancy making the optimisation problem hell for these models. Note that for historical reasons, and possibly speed and politics as well, many users will choose CR72 instead of B14 (it unfortunately takes a long time for the field to move to better solutions). For the R1rho models, MP05 is better than TP02, DPL94, etc.

So maybe we need a hard coded dictionary of lists such as:

    MODEL_NESTING = {
        ...
        MODEL_CR72: [],
        MODEL_B14: [MODEL_CR72],
        MODEL_NS_CPMG_2SITE_EXPANDED: [MODEL_B14, MODEL_CR72],
        MODEL_NS_CPMG_2SITE_3D: [MODEL_NS_CPMG_2SITE_EXPANDED, MODEL_B14, MODEL_CR72],
        ...
        MODEL_NS_R1RHO_2SITE: [MODEL_MP05, MODEL_TAP03, MODEL_TP02, MODEL_DPL94],
        ...
    }

So here the B14 model is used for the 'NS CPMG 2-site expanded' model first, and then drops back to CR72 if B14 does not exist. We will have to create a hardcoded list of model nesting in the relax manual for this - such a table is essential for the user. Therefore having a hardcoded dictionary in the variables module is not an issue. We could provide a set of functions in the specific_analyses.relax_disp.model module to change this dictionary, if you wish to have it programmatically changed. Or, better yet, MODEL_NESTING would be a special Python dictionary object that has methods for changing its defaults.
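As a minimal sketch of the fall-back behaviour described above, the following assumes that model names are plain strings and that the set of already-optimised models is known. The NestingTable class and its set_nesting() and nested_from() methods are hypothetical names used only for illustration and are not part of relax. Only the CPMG entries spelled out above are included; the elided entries are left out rather than guessed.

```python
# Hypothetical sketch: the class and method names are not taken from relax.

# Model names as plain strings, standing in for the MODEL_* variables.
MODEL_CR72 = 'CR72'
MODEL_B14 = 'B14'
MODEL_NS_CPMG_2SITE_EXPANDED = 'NS CPMG 2-site expanded'
MODEL_NS_CPMG_2SITE_3D = 'NS CPMG 2-site 3D'


class NestingTable(dict):
    """Dictionary of nesting preferences with methods for changing the defaults."""

    def set_nesting(self, model, preferred):
        """Replace the ordered list of models that 'model' may nest from."""
        self[model] = list(preferred)

    def nested_from(self, model, optimised):
        """Return the first preferred model that has been optimised, or None."""
        for candidate in self.get(model, []):
            if candidate in optimised:
                return candidate
        return None


MODEL_NESTING = NestingTable({
    MODEL_CR72: [],
    MODEL_B14: [MODEL_CR72],
    MODEL_NS_CPMG_2SITE_EXPANDED: [MODEL_B14, MODEL_CR72],
    MODEL_NS_CPMG_2SITE_3D: [MODEL_NS_CPMG_2SITE_EXPANDED, MODEL_B14, MODEL_CR72],
})

# B14 has not been optimised here, so the lookup drops back to CR72.
source = MODEL_NESTING.nested_from(MODEL_NS_CPMG_2SITE_EXPANDED, optimised={MODEL_CR72})
print(source)
```

A user who knows that a special case model suits their data could then call set_nesting() to put that model first in the list for the numeric models, without touching the defaults of the other entries.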
Anyway, it is very hard to pick which model should be used, as the fast vs. slow exchange problem is a dominating factor for the nesting for the numeric models and the alpha-value calculation is not implemented yet (https://gna.org/task/?7800). Nevertheless, hardcoded good default values based on the logic of accuracy and speed would be very useful and would match a hardcoded table in the manual. But the fast vs. slow question might make users want to change the model nesting, which is why I mentioned earlier accepting a model nesting dictionary as an argument for the relax_disp auto-analysis. And if the user knows that the best analytic model for their data is TSMFK01 (or IT99), then they would want to use this as the starting point for the numeric models and would not use B14 or CR72, which in these conditions would be bad approximations (though they might use the models anyway for comparison).

Regards,

Edward

On 18 August 2014 16:40, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:

Hi Edward.

The good thing about the code is that a change to the nesting needs very little work. Only one or two lines need to be changed. That is all.

I came to realise that when adding new models to relax, it should not be necessary to hard code every possibility into the auto-analysis. There has to be a function helping out with this.

I can write up some examples which illustrate the problem.

CPMG:

    models = ['R2eff', 'No Rex', 'B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded']

To the user, this would seem to be a good list of models to test. The user may have read that 'B14' is more precise than 'CR72', and does not want to "waste" time on CR72. But nothing would be nested, even though the solutions should be expected to be quite similar. Why not nest from 'B14'? Or from 'NS CPMG 2-site 3D', if the optimised parameters are there?
    models = ['R2eff', 'No Rex', 'CR72', 'B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded']

Here the user "knows" that it is essential to nest from CR72. But if 'NS CPMG 2-site expanded' has been analysed (which nested from CR72), should 'NS CPMG 2-site 3D' then nest from CR72 or from "the more precise" 'NS CPMG 2-site expanded'?

It quickly becomes a two-step problem. In which order should the models be analysed? From which model should each model nest?

Best

Troels

2014-08-18 16:21 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi,

> You will soon see my commits about sorting the models.

I'm still catching up on your huge number of changes :)

> And here we would probably have to discuss a little.

It's best to discuss ideas before implementing them, as it's then far easier to change to the best course. It's far more difficult once code and time have been invested into one solution when a better solution could have been implemented first.

> There is a huge time potential to nest parameters from earlier models.

This was a major point in our paper (http://dx.doi.org/10.1093/bioinformatics/btu166, and mentioned in the abstract) which I can now see has volume and page numbers :)

> And so by common sense, it would/could be best to nest from numerical solutions, since they are more precise. But they are terribly slow.

This is not a good idea :S Well, the analytic models are approximations anyway so there is a bias in the parameter values, and this destroys any value the higher precision of the numeric models could give.

> Unless they are these special hybrid versions.

Hybrid, I don't understand?

> The word "silico", I took from: A.J. Baldwin (2014). An exact solution for R2,eff in CPMG experiments in the case of two site chemical exchange. J. Magn. Reson., 2014. (10.1016/j.jmr.2014.02.023). I think it is a very good representation. "Silico" is "fast", and precise.

Hmmm, maybe Nikolai should be asked about this.
I believe he would have a strong opinion about this categorisation of his model, and I don't remember him ever using such terminology. Andy used the text "derived in silico" in the introduction which is quite different, as 'in silico' just means on a computer. Most modern analytic models would also be derived using Mathematica, Maxima, etc. on a computer and could equally be said to be 'in silico' derived. I bet Andy also used symbolic computational software for his paper ;)

> But "impossible" to read and understand from the code. I think that "silico" is a good way to represent a "subset" of numerical solutions. I would rather describe a model as being "silico", than to add meta data about "how fast" it is.

Why is this differential categorisation needed at all? I know this would invalidate a lot of work, but would it not be better to have hard coded nested model defaults? I.e. a dictionary structure as you have already used in the variables module, with one entry per model.

Regards,

Edward
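A hard-coded nesting dictionary of this kind could also help with the ordering question raised earlier in the thread. The sketch below is only one possible reading: it runs each model after every user-selected model it may nest from. The analysis_order() function and the NESTING table are hypothetical illustrations, not relax code, and the table is assumed to contain no circular nesting.

```python
# Hypothetical sketch: analysis_order() and NESTING are illustrative only.

# Nesting preferences, using the CPMG entries proposed earlier in the thread.
NESTING = {
    'CR72': [],
    'B14': ['CR72'],
    'NS CPMG 2-site expanded': ['B14', 'CR72'],
    'NS CPMG 2-site 3D': ['NS CPMG 2-site expanded', 'B14', 'CR72'],
}


def analysis_order(models, nesting):
    """Order 'models' so each model follows every selected model it may nest from.

    Assumes the nesting table has no cycles (a cycle would recurse forever).
    """
    selected = set(models)
    ordered = []

    def visit(model):
        if model in ordered:
            return
        # Place the possible nesting sources first, skipping unselected models.
        for source in nesting.get(model, []):
            if source in selected:
                visit(source)
        ordered.append(model)

    for model in models:
        visit(model)
    return ordered


# The first model list from the thread, which leaves out CR72.
models = ['R2eff', 'No Rex', 'B14', 'NS CPMG 2-site 3D', 'NS CPMG 2-site expanded']
print(analysis_order(models, NESTING))
# → ['R2eff', 'No Rex', 'B14', 'NS CPMG 2-site expanded', 'NS CPMG 2-site 3D']
```

Models without an entry in the table, such as 'R2eff' and 'No Rex' here, simply keep their position in the user's list.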