Development of the relax-disp branch. -- May 07, 2013

Hi Troels,

This sub-thread (which will appear at
http://thread.gmane.org/gmane.science.nmr.relax.devel/3833) will
hopefully be a mini-tutorial covering the development of the
relax_disp branch.  Before you can be accepted as a relax developer
with commit access to the source code repository, you should first
submit changes as patches.  This takes longer initially, but it allows
the other relax developers to see how you code and whether you are
following the coding conventions as described in the development
chapter of the relax manual
(http://www.nmr-relax.com/manual/relax_development.html).  I can give
you feedback as you go as to how to improve the code to fit into
relax.  After a few patches we, the relax developers, will hold a
private vote to accept you as a relax developer.  This is standard
practice in an open source project.  The full procedure for becoming a
developer is detailed in the 'Committers' section of the manual
(http://www.nmr-relax.com/manual/Committers.html).  The PDF version of
the manual is easier to read
(http://download.gna.org/relax/manual/relax.pdf).  Patches can be
posted to the patch tracker (https://gna.org/patch/?group=relax).

relax development begins and ends with the test suite.  The idea is
that, before any code is present, a relax system test must be created.
This allows you to develop the ideas for how the UI should work with
the analysis - i.e. which new user functions will need to be created
and which ones will need to be expanded.  A script is added to
test_suite/system_tests/scripts/relax_disp/ and then a test added to
test_suite/system_tests/relax_disp.py which executes the script and
then checks the data and results.  For example see the script
'test_suite/system_tests/scripts/relax_disp/hansen_data.py' and the
function test_hansen_cpmg_data_fast_2site() in the file
'test_suite/system_tests/relax_disp.py'.  This is obviously not
complete as only the script is executed - the results are not yet
checked (as we do not know what the result for the optimised model
should be yet).  This individual test can be executed with the
command:

$ relax -s Relax_disp.test_hansen_cpmg_data_fast_2site

This test, as well as the other Relax_disp tests, was created by
Sebastien Morin when he started the development of the relax_disp
branch.  I have renamed everything since he added it, and will
probably do so again soon.  It is best to develop for the script UI
first - the GUI will later be modified around the graphical versions
of the user functions, or directly accessing the back end of the user
function.  Due to the advanced state of the relax_disp branch, you
probably do not need to worry about new user functions.  This may be
needed if you would like to expand the analysis to new types of data
(for example off-resonance R1rho where R1 data need to be measured and
used in the analysis, H/D exchange, etc.).

The test suite is one area which can be expanded to handle the
different CPMG models.  The testing is currently not very extensive.
For example, before a new dispersion model is added to relax, it would
be good if synthetic data were to be created in an external program (a
Python script, Matlab, Mathematica, Maxima, etc.).  It is very
important that relax is not used to create the data.  Synthetic data
is very important for making sure that relax obtains the correct
result, as you know what the result should be.  With measured data you
can never really know what the true result is - this is the entire
point of the mathematical field of modelling (this field makes that of
NMR look very, very small).  Synthetic data is also useful for double
checking results against other relaxation dispersion software (for
reference: NESSY - http://home.gna.org/nessy/;  CPMGFit -
http://www.palmer.hs.columbia.edu/software/cpmgfit.html;  ShereKhan -
http://sherekhan.bionmr.org/;  CATIA -
http://www.biochem.ucl.ac.uk/hansen/catia/).  Data could also be taken
from Art Palmer's CPMGFit manual
(http://www.palmer.hs.columbia.edu/software/cpmgfit_manual.html).
This would need to be converted into peak intensities in a peak list
file, but that is easy enough by simply picking random I0 values for
the exponential curves.  The data could be passed quickly through each
of the models of the CPMGFit program and results noted.  Then the
results would be added to the checks of different relax system tests.
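
As a minimal sketch of the conversion step, the following Python turns
R2eff values into peak intensities on a two-parameter exponential
curve, I(t) = I0 * exp(-R2eff * t), with a random I0 per curve.  The
R2eff values, relaxation delays and I0 range are made-up placeholders
rather than numbers from the CPMGFit manual, and the function name is
hypothetical.

```python
import math
import random


def intensities_from_r2eff(r2eff, relax_times, i0):
    """Return I(t) = I0 * exp(-R2eff * t) for each relaxation delay t (in seconds)."""
    return [i0 * math.exp(-r2eff * t) for t in relax_times]


# Fixed seed so that the synthetic data set can be regenerated exactly.
rng = random.Random(0)

# Placeholder relaxation delays (s) and R2eff values (1/s), keyed by CPMG frequency (Hz).
relax_times = [0.0, 0.01, 0.02, 0.04, 0.08]
r2eff_values = {50.0: 15.2, 100.0: 13.1, 200.0: 11.4}

curves = {}
for nu_cpmg, r2eff in r2eff_values.items():
    i0 = rng.uniform(1e5, 1e6)  # Arbitrary initial intensity for this curve.
    curves[nu_cpmg] = intensities_from_r2eff(r2eff, relax_times, i0)

for nu_cpmg in sorted(curves):
    print(nu_cpmg, ["%.1f" % value for value in curves[nu_cpmg]])
```

The intensities would still need to be written out in a peak list
format that relax can read.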

Each different data set used in the testing process should be located
in its own directory in test_suite/shared_data/dispersion/.  That
directory can include the data and all scripts used to generate the
data and, for reference, it can also contain subdirectories for
holding the input and output for different programs (as long as the
files are not too big).

The current state of the branch is that all of the user functions are
pretty close to complete.  Each user function consists of a front end
definition in user_functions/ and a back end in either pipe_control/
or specific_analyses/.  The relaxation dispersion target function
setup for optimisation is close to complete.  You can see this in the
minimise() method of the specific_analyses/relax_disp/__init__.py
file, and then the __init__() method of the class in
target_functions/relax_disp.py.  As you will see in the model_loop()
method of the specific_analyses/relax_disp/__init__.py code,
clustering of spin systems is already part of this design - everything
handles a group of spins assuming the same parameter values.  One
missing feature that I might work on soon is the handling of missing
input data, as this affects my current work.  This is a problem
currently caught by the
test_suite/shared_data/dispersion/Hansen/relax_disp.py script, as
residue :71 is missing data at one field strength.  But once the
dispersion tests have been expanded, this can be tested properly by
deleting data for single points on the exponential curves, deleting
entire exponential curves (or dispersion points for the two-point
analysis type), or all data from a single spectrometer field strength
for a single spin.
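
The three kinds of deletion could be sketched as below.  This is only
an illustration on a plain dictionary, assuming a hypothetical key
layout of (spin ID, field strength, CPMG frequency, relaxation delay);
it is not how relax stores the data, and the helper names are made up.

```python
from itertools import product

# Hypothetical synthetic data set: every key maps to a dummy intensity.
spins = [":70", ":71"]
frqs = [500, 800]          # Spectrometer field strengths in MHz.
nu_cpmgs = [50.0, 100.0]   # CPMG frequencies in Hz.
times = [0.0, 0.02]        # Relaxation delays in seconds.
full = {key: 1000.0 for key in product(spins, frqs, nu_cpmgs, times)}


def drop_point(data, spin, frq, nu_cpmg, time):
    """Remove a single point from one exponential curve."""
    return {k: v for k, v in data.items() if k != (spin, frq, nu_cpmg, time)}


def drop_curve(data, spin, frq, nu_cpmg):
    """Remove an entire exponential curve (one dispersion point)."""
    return {k: v for k, v in data.items() if k[:3] != (spin, frq, nu_cpmg)}


def drop_field(data, spin, frq):
    """Remove all data from one field strength for one spin."""
    return {k: v for k, v in data.items() if k[:2] != (spin, frq)}


missing_point = drop_point(full, ":71", 800, 100.0, 0.02)
missing_curve = drop_curve(full, ":71", 800, 100.0)
missing_field = drop_field(full, ":71", 800)
print(len(full), len(missing_point), len(missing_curve), len(missing_field))
```

Each helper returns a new dictionary, so the complete data set stays
available for the reference test.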

So I would suggest that you pick one of the dispersion models you are
interested in and try to implement that.  I am working on the Luz and
Meiboom, 1963 model, but all of the other models are safe to work on.
Just say which you are interested in so that we don't both change the
same code.  The system test data would come first.  The formula can be
taken, a set of parameters for 2-3 spins chosen, and a simple script
written to generate the R2eff data, importantly at multiple magnetic
field strengths.  That data can then be converted into a generic peak
list for different time periods on a basic 2-parameter exponential
curve.  See the 'File formats' section of the
spectrum.read_intensities user function docstring, for example by
typing help(spectrum.read_intensities) in the prompt UI.  In the same
script the creation of input files for other programs could be added,
possibly at a later stage, and the data quickly run through CPMGFit,
for example, for a sanity check.
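
A data generation script along these lines might look like the
following sketch.  It assumes the usual fast-exchange two-site form of
the Luz and Meiboom model, R2eff = R20 + phi_ex/kex * (1 -
4*nu_cpmg/kex * tanh(kex/(4*nu_cpmg))), with phi_ex = pA*pB*dw^2; this
expression should be checked against the original paper before it is
used for real test data.  All parameter values, frequencies and
function names are hypothetical.

```python
import math


def r2eff_lm63(r20, phi_ex, kex, nu_cpmg):
    """Fast-exchange two-site R2eff (1/s) at the CPMG frequency nu_cpmg (Hz)."""
    x = kex / (4.0 * nu_cpmg)
    return r20 + phi_ex / kex * (1.0 - math.tanh(x) / x)


def phi_ex(pa, dw_ppm, frq_mhz):
    """pA * pB * dw^2, with dw converted from ppm to rad/s.

    frq_mhz is the Larmor frequency of the observed nucleus in MHz.
    """
    dw = 2.0 * math.pi * dw_ppm * frq_mhz
    return pa * (1.0 - pa) * dw ** 2


# Made-up parameters for two spins (R20 is kept field independent for simplicity).
spins = {
    ":1": {"r20": 10.0, "pa": 0.95, "dw_ppm": 1.0, "kex": 2000.0},
    ":2": {"r20": 12.0, "pa": 0.90, "dw_ppm": 0.5, "kex": 3000.0},
}
frqs_mhz = [50.0, 80.0]  # Hypothetical frequencies for two magnetic field strengths.
nu_cpmgs = [50.0, 100.0, 200.0, 400.0, 800.0, 1000.0]

r2eff = {}
for spin_id, p in spins.items():
    for frq in frqs_mhz:
        phi = phi_ex(p["pa"], p["dw_ppm"], frq)
        for nu in nu_cpmgs:
            r2eff[(spin_id, frq, nu)] = r2eff_lm63(p["r20"], phi, p["kex"], nu)

for key in sorted(r2eff):
    print("%-4s %6.1f %7.1f %10.5f" % (key + (r2eff[key],)))
```

Because phi_ex scales with the square of the frequency, the two field
strengths give different dispersion curves for the same spin, which is
what allows the parameters to be separated in the fit.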

If you do test the other programs, you may encounter a severe bug in
one of their models.  No software is bug free.  In such a case, we
should communicate with the authors in private and they can decide
what to do.  You can see that I did this with Art Palmer's Modelfree
program at
http://biochemistry.hs.columbia.edu/labs/palmer/software/modelfree.html.
Versions 4.16 and 4.20 consist of patches that I sent to Art to fix
compilation issues and other bugs (I pointed out the grid search
problem due to the singular matrix failure of the Levenberg-Marquardt
algorithm and Art made that change himself).

Once some data has been created and files attached to the patch
tracker (https://gna.org/patch/?group=relax), then the relax script
can be written and added to
test_suite/system_tests/scripts/relax_disp/.  The best way would
probably be for one of the current scripts to be copied (by me to
start with) in the repository and then you make small changes to it
and send the patches created with:

$ svn diff > patch

Then the script execution and data and parameter checking code can be
added to test_suite/system_tests/relax_disp.py - again you can look at
the other methods in that file and create a new one by copying how an
old method operates.  In that system test you would check that the
original parameters have been found.
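
The parameter check could follow the pattern sketched below.  This is
a standalone unittest example, not the code in
test_suite/system_tests/relax_disp.py: the class name, the spin IDs,
the parameter values and the fitted_parameters() stand-in (which
replaces running the relax script and reading back the optimised
values) are all hypothetical.

```python
import unittest


class RelaxDispSketch(unittest.TestCase):
    """Check that optimisation recovers the parameters used to create the data."""

    # The parameters that the synthetic data were generated from.
    expected = {
        ":1": {"r20": 10.0, "kex": 2000.0},
        ":2": {"r20": 12.0, "kex": 3000.0},
    }

    def fitted_parameters(self):
        """Stand-in for executing the script and collecting the fitted values."""
        return {
            ":1": {"r20": 10.00001, "kex": 2000.00002},
            ":2": {"r20": 11.99999, "kex": 2999.99998},
        }

    def test_parameters_recovered(self):
        found = self.fitted_parameters()
        for spin_id, params in self.expected.items():
            for name, value in params.items():
                # Compare to 3 decimal places, naming the spin and parameter on failure.
                self.assertAlmostEqual(
                    found[spin_id][name], value, places=3,
                    msg="%s of spin %s" % (name, spin_id))
```

The tolerance would need to be chosen to match the precision that the
optimisation can realistically reach.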

At this stage, the test should run fine up to the grid_search user
function, and then fail (or possibly at the relax_disp.select_model
user function call in the script depending on whether you use the
auto-analysis code in auto_analyses.relax_disp or not).  This is the
point where the model can be implemented.  Then you would take the
following steps:

- Add a description of the new model with the equation and reference
to the user_functions.relax_disp module.

- Add the model and its parameters to the _select_model() method of
the specific_analyses/relax_disp/__init__.py file.

- Add any new parameter definitions to the top of the
specific_analyses/relax_disp/__init__.py file in the __init__() method
as needed.  If new parameters are needed, then there are various
places in the specific_analyses.relax_disp package where support will
be needed, mainly in the specific_analyses.relax_disp.parameters
module.

- Create a new module in the lib.dispersion package for the model
function.  This module will eventually hold the model function, the
gradient (each partial derivative with respect to each parameter would
be in a different function), and the Hessian (the matrix of second
partial derivatives).  Having the gradient and Hessian will allow for
the more powerful optimisation algorithms to be used.

- Add a new method to target_functions/relax_disp.py which uses the
new code in lib.dispersion to calculate R2eff values, combines these
with the chi2 function, and returns the chi-squared value (see the
current func_LM63() method for how to do this).

- Finally, see if the system test passes.  If not, then it is time to debug.
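
The relationship between the model function and the target function in
the steps above can be sketched as follows.  This is a simplified
standalone illustration for a single spin at a single field strength,
not the actual code of lib.dispersion or
target_functions/relax_disp.py.  The class and function names and the
parameter ordering are hypothetical, and the model is assumed to be
the fast-exchange two-site form R2eff = R20 + phi_ex/kex * (1 -
4*nu_cpmg/kex * tanh(kex/(4*nu_cpmg))).

```python
import math


def r2eff_model(r20, phi_ex, kex, nu_cpmg):
    """Model function: back-calculate R2eff (1/s) at one CPMG frequency (Hz)."""
    x = kex / (4.0 * nu_cpmg)
    return r20 + phi_ex / kex * (1.0 - math.tanh(x) / x)


def chi2(values, back_calc, errors):
    """Chi-squared statistic: sum of squared, error-weighted residuals."""
    return sum(((v - b) / e) ** 2 for v, b, e in zip(values, back_calc, errors))


class TargetSketch:
    """Hold the measured data and expose func() for an optimiser to minimise."""

    def __init__(self, nu_cpmgs, values, errors):
        self.nu_cpmgs = nu_cpmgs
        self.values = values
        self.errors = errors

    def func(self, params):
        """Return the chi-squared value for the vector [R20, phi_ex, kex]."""
        r20, phi_ex, kex = params
        back_calc = [r2eff_model(r20, phi_ex, kex, nu) for nu in self.nu_cpmgs]
        return chi2(self.values, back_calc, self.errors)


# Noise-free synthetic data created from known (made-up) parameters.
true_params = [10.0, 5000.0, 2000.0]
nu_cpmgs = [50.0, 100.0, 200.0, 400.0, 800.0]
values = [r2eff_model(true_params[0], true_params[1], true_params[2], nu)
          for nu in nu_cpmgs]
target = TargetSketch(nu_cpmgs, values, errors=[0.1] * len(nu_cpmgs))

print(target.func(true_params))              # Zero at the true parameters.
print(target.func([10.0, 5000.0, 1000.0]))   # Larger away from them.
```

Keeping the model function free of any data handling means that the
gradient and Hessian can later be added alongside it in the same
module without touching the target function class.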

During these steps, the unit test part of the test suite can be used
to make sure that individual functions and methods behave correctly.
This is useful as users will always find a way to break your code.
Once the system test passes, then you will know that the
implementation is complete and fully functional.
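
As an illustration of the unit test level, the sketch below checks a
chi-squared function in isolation.  The chi2() function here is a
local stand-in written for the example rather than the relax
implementation, and the test names are hypothetical.

```python
import unittest


def chi2(values, back_calc, errors):
    """Stand-in chi-squared function: sum of squared, error-weighted residuals."""
    return sum(((v - b) / e) ** 2 for v, b, e in zip(values, back_calc, errors))


class TestChi2(unittest.TestCase):
    def test_perfect_fit_is_zero(self):
        # Identical measured and back-calculated values give exactly zero.
        self.assertEqual(chi2([1.0, 2.0], [1.0, 2.0], [0.1, 0.1]), 0.0)

    def test_one_sigma_offsets(self):
        # Each of the two points is off by exactly one error, so chi2 is 2.
        self.assertAlmostEqual(chi2([1.0, 2.0], [1.5, 1.5], [0.5, 0.5]), 2.0)

    def test_zero_error_is_rejected(self):
        # A zero error cannot be used as a weight.
        with self.assertRaises(ZeroDivisionError):
            chi2([1.0], [2.0], [0.0])
```

Tests of this size run in milliseconds, so they can be repeated after
every small change to a function.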


If your interest is in the numerical integration of the
Bloch-McConnell equations, then the procedure might be slightly
different.  We would have to discuss this in more detail, with paper
references and the necessary equations.  But I think that all of this
can be handled in a module of the lib.dispersion package, and the rest
of the above detailed procedure would be the same.  I hope this post
wasn't too long for you!

Regards,

Edward




On 6 May 2013 21:14, Troels Emtekær Linnet <tlinnet@xxxxxxxxx> wrote:

Hi Edward.

When you have completed your planned changes to the
disp branch, could you send me a note?

And maybe a script file showing how to launch the code?

Then I could try to figure out where I should add new code.

Best
Troels


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel
