Re: [sr #3078] Parse spin.model names to the value.write user function.



Posted by Edward d'Auvergne on September 10, 2013 - 18:20:
Hi,

Please see below:


On 10 September 2013 16:59, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> wrote:
> Well well well.
>
> That's very smart, using a list of the things that should be written out.
>
> So, I think it now boils down to a matter of convenience.
>
> ----------
> * Problem
>
> When I look in the "final" run directory, I do not readily have access
> to a file (or a plot, ref:
> http://article.gmane.org/gmane.science.nmr.relax.devel/4425)
> which tells me which model has been used for each spin.

This is a bit of an aside, but why does the model matter?  From the
biological perspective what matters is the dynamics, or how the system
is moving.  The information about the model itself does not say
anything about what is happening in the system being studied.  And the
differences between the models are only a matter of statistical
significance.  Of more interest would be a plot of the parameters onto the
3D structure in PyMOL or MOLMOL.  This is by far the best way to
extract the biological story from the results.

Note that most of the models should be removed from the analysis
unless they are considered valid for the data.  So for example if you
know that all spins are undergoing fast exchange, then the slow
exchange models should not be used, and vice versa.  Or if you are not
interested in the analytic models, then you can skip those (using the
CR72 model would speed up the calculations though).  But when the
analytic models are within their range of validity, the dynamics
parameters should be very similar to the numeric parameter values.
The list of models you use in the end should be quite limited and they
should give similar results.  The main comparison will be to the 'No
Rex' model.  You would then use this model information to plan how you
would perform a subsequent analysis with clustering.

For this point though, I think that having a 'model.out' file is
reasonable.  The information in a 'model.out' file could be useful for
planning the next analysis with clustering.
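A 'model.out' file along these lines could be as simple as one row per spin.  The following minimal Python sketch assumes the per-spin model names are already available as a plain dictionary keyed by spin ID; the function name `write_model_file`, the tab-separated layout and the example spin IDs are hypothetical and not part of relax.

```python
import csv


def write_model_file(models, path):
    """Write a tab-separated 'model.out'-style file with one row per spin.

    `models` is a hypothetical dict mapping a spin ID string to the name of
    the model selected for that spin.  Tabs are used as the delimiter because
    model names such as 'No Rex' contain spaces.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header line, marked as a comment so simple parsers can skip it.
        writer.writerow(["# spin_id", "model"])
        for spin_id, model in models.items():
            writer.writerow([spin_id, model])
```

For example, `write_model_file({":13@N": "No Rex", ":15@N": "CR72"}, "model.out")` (illustrative values only) writes a header line followed by one spin ID and model name per line.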


> I see this information as vital, and I believe it should be presented
> as easily as possible.

Why is it vital?  What does it say?  The model is simply a tool to
extract information, it is not the information itself.

The way this would normally be performed in the GUI is that, once the
auto-analysis has completed, the user would produce any additional
output they wish with the user function menu entries.  Or they would
use one of the sample scripts as described below.


> I can create such a list in relax, but it is not "easily reached".

Using the user functions and the sample scripts, once the analysis is
complete, this is relatively easy.


> I do not understand why there is a desire not to include a column with
> the spin.model information.
> There is probably a deeper-lying reason for this.

Because the information of which model was selected has zero
biological relevance.  It is a huge distraction to the user and gets
in the way of them concentrating on the truly interesting part of the
analysis - the dynamics.  When looking at the timescale of motional
processes in a biomolecule, your conclusions as to the biological
significance of this motion will not be influenced by the choice of
model.  I have deliberately prevented users from accessing this
information in the model-free part of relax because there were too
many instances of users presenting a 'plot' of the selected model and
concentrating on the per spin model rather than the timescales and
amplitudes of motion.  It really does not say anything about the
motion, it is simply a distraction for the user.  The user should be
looking at the dynamics and why that is important, and not the model.
This aspect is also a strong reason why the Kay group have abandoned
the use of analytic models - there is far too much emphasis on models
and selecting between them - but the model is of zero interest.  The
numeric models can replace all of the analytic models.  Using only the
numeric model has the equivalent effect of slapping the user in the
face and telling them to stop looking at the model.

I do not want to prevent this from being done; relax is designed to be
flexible enough to do as anyone wishes.  But the default would be not
to present this to the user.  It is better that the user concentrate
on the dynamics, not the model information.  The logs and the
'model.out' file would be the only places where a user can access that
information (apart from generating custom tables after the analysis
completes).  From that, together with the dispersion plot, they can
then judge if the analysis should be repeated without that model.


> But where should I tackle this problem?

I think that what you are after could be better served by adding a new
sample script.  Have a look at the following files:

sample_scripts/model_free/table_csv.py
sample_scripts/model_free/table_latex.py

These scripts will take the final results file and produce either a
CSV or LaTeX file containing a table of all spins with all model names
and all parameter values.  These scripts are used to generate a table
of all results to be added to supplementary material for publications.
They are deliberately not part of the auto-analyses; the user should
not be using them for trying to understand the dynamics of the system
or to answer their biological questions.  It should be possible to
create this with the value.write user function, but a custom script
like the two above might be best.  Is this what you had in mind?
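The kind of table those scripts produce can be sketched in a few lines of plain Python with the standard `csv` module.  This is not the content of `table_csv.py`; the function `results_table_csv`, the record layout (dicts with 'spin_id', 'select', 'model' and parameter keys) and the example values below are all assumptions for illustration.

```python
import csv
import io


def results_table_csv(records, params):
    """Return a CSV table with one row per spin.

    `records` is a hypothetical list of dicts, each holding a 'spin_id', a
    'select' flag, a 'model' name and the values of whichever parameters that
    model uses.  `params` fixes the order of the parameter columns; a
    parameter that a spin's model does not have (for example 'kex' under
    'No Rex') is written as an empty cell.
    """
    columns = ["spin_id", "select", "model"] + list(params)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys not listed in `columns` are silently dropped.
        writer.writerow(record)
    return buffer.getvalue()
```

Because the columns are chosen by the caller, the same function covers any combination of information a user wants side by side; error columns such as a hypothetical 'dw_err' are just extra entries in `params`.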


> 1) The spin.model is included as a column in the value.write parameter file.
> - Cons: This information is superfluous for the writing of each of the
> models.
> - Cons: Change of file formats can give unexpected problems for users.

The biggest con is that each user believes that a different piece of
information is the most important to have - especially sitting side by
side with the results they are looking at.  And for each analysis type
in relax, what is considered important would always be different.  So
we could also include arguments for the selection flag, for the
'fixed' flag, for the pseudo-contact shift value, for the chemical
shift, for the minimisation warning flags, for the chi-squared value,
for the number of data sets analysed, for the parameter list, for the
J-coupling value, etc.  The list is endless.  If we were to include
the model information, then by exactly the same argument all of the
other information should also be included.  This is not feasible.
There are also many analysis types in relax where there is no such
thing as different models.  But it is feasible for a user to generate
the exact data combination they are after with the value.write user
function or a custom sample script.


> 2) The spin.model is included as a column in the value.write parameter
> file for the "final" round.
> - Pro: This gives the overview I am looking for.
> - Cons: Change of file formats can give problems.

As above, it doesn't fit in with the other analysis types.


> 3) A "model" file is written out for the "final" round.
> - Cons: It is annoying to compare between two files.

I think this can and should be done.


> 4) Duplicates of all parameter files, where each copy includes the
> spin.model as a column.
> Well, duplicating files is not "nice".

This is not ideal, and I believe that the user should not be presented
with this by default.  They can always use a sample script to create
the massive comparison table.


> Given that Linux has the ability to merge text files:
> paste kex.out dw.out
>
> I can "survive" with option 3, and I think this is what you are hinting at.
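One caveat with `paste` is that it joins files line by line, so its output is only meaningful if every file lists the same spins in the same order.  A minimal Python sketch of an order-independent join keyed on the spin ID follows.  It assumes a simplified two-column layout (spin ID, then value) with '#' comment lines; real value.write output has more columns and would need its own parsing, and the helper names are hypothetical.

```python
def read_values(path):
    """Read a simplified whitespace-separated file: spin ID, then value.

    This layout is an assumption for illustration; real relax output files
    carry more columns and would need their own parsing.  Blank lines, lines
    with fewer than two fields and lines starting with '#' are skipped.
    """
    values = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2 or fields[0].startswith("#"):
                continue
            values[fields[0]] = fields[1]
    return values


def join_by_spin(paths, missing="-"):
    """Merge several such files on spin ID, unlike `paste`, which joins by line.

    Rows follow the order in which spin IDs are first seen across the files;
    a spin absent from a file gets `missing` in that file's column.
    """
    tables = [read_values(path) for path in paths]
    # dict.fromkeys de-duplicates while keeping first-seen order.
    spin_ids = list(dict.fromkeys(spin_id for table in tables for spin_id in table))
    return [[spin_id] + [table.get(spin_id, missing) for table in tables]
            for spin_id in spin_ids]
```

Printing each row with `print("\t".join(row))` gives a layout similar to `paste kex.out dw.out`, but with the values for a given spin guaranteed to share a line even when the files are ordered differently or a spin is missing from one of them.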

From this, I believe that a table of all results, with columns for
the spin information, the spin selection flag, the selected model,
and then all parameters and errors, would produce the file you are
after.  Would the full results tables described above cover this?

Regards,

Edward





Powered by MHonArc, Updated Tue Sep 10 18:40:06 2013