Re: Relaxation dispersion clustering calculation time -- September 11, 2014

Dear Edward and Troels,

Thank you for the additional info. So it seems that although cpmg_fit has the 
choice to use different R20’s, current literature is still limited to the 
R20A = R20B assumption. I actually have a copy of Korzhnev’s paper on my 
computer and will certainly take a closer look. I think my inexperience in the 
analysis is also a factor, and your information has been a huge help.

We already got 3.3.0 running, but it is still using an older version of numpy 
in our cluster. I know about canopy (in fact, I have it installed on my 
personal Mac), but last time I tried to install it on my personal account in 
the cluster computer, something went wrong and a lot of python-dependent 
stuff wouldn’t run. Since the system admin already gave his word that he will 
do his best to update our python system, I’ll just trust him… for now *grin*.

Cheers,

Chung-ke

PS: 3.3.0 does feel zippier than the older version, even using an old numpy 
(1.6.2?). The speed up is really impressive. Kudos to a job well done!


On Sep 11, 2014, at 5:36 PM, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Chung-ke,

I actually now remember that I saw the R20A != R20B analysis presented
at a conference somewhere, though again I can't remember by whom.  I'm
pretty sure it was real data, very likely at 3 magnetic fields, and
possibly including multiple-quantum data as well, i.e. the MMQ models
in relax (http://wiki.nmr-relax.com/Category:MMQ_CPMG_data).  I would
guess it was someone from the Kay, Palmer or Wright groups.  You will
probably not find the R20A = R20B assumption written in most papers,
as people just use the software blindly and don't realise that there
is a difference.  Most software has the R20A = R20B assumption
hardcoded so you have no choice.  The more advanced software from
Dmitry Korzhnev (cpmg_fit) allows you to fit these separately though.
You will however find the text about the assumption in pretty much all
of Dmitry's papers, for example in http://dx.doi.org/10.1021/ja054550e
:

   "The adjustable parameters for the "global" two-state model (F <->
U) include nc‚nr‚nf intrinsic (transverse relaxation) R2 rates
(assumed to be the same in F and U states), ..."

This is also well described in Art Palmer's 2001 Methods in Enzymology
review (http://dx.doi.org/10.1016/S0076-6879(01)39315-1).

Regards,

Edward


P. S.  Troels' instructions for setting up your own Python and relax
installation are a great way to quickly have relax available,
especially if you wish to use a new version or the repository version
to obtain a quick bug fix.


On 10 September 2014 19:42, Chung-ke Chang <chungke@xxxxxxxxxxxxxxxxxx> 
wrote:

Dear Edward,

Thank you for the thorough explanation. Yes, I now see why having the 
“full” models would be useful. I will try to track down the references you 
mentioned - I hope they are indexed in PubMed, I really have little idea 
on how to search for “pure” chemistry papers - and take a look at the 
scenarios where using the full models would be appropriate. I guess that I 
also need to re-read some of the literature on how to apply relaxation 
dispersion analysis to biological systems. The R20A = R20B assumption must 
be buried somewhere in the materials and methods section….

Cheers,

Chung-ke

On Sep 10, 2014, at 10:08 PM, Edward d'Auvergne <edward@xxxxxxxxxxxxx> 
wrote:

Hi Chung-ke,

The aim of relax is to support absolutely every NMR dynamics theory in
existence!  For the relaxation dispersion analysis section of relax,
this means supporting all published models for the dispersion data,
and all parametric restrictions of these models.  Many of the
dispersion models have been derived with the assumption that R20A and
R20B are different; the Carver and Richards model is a good example of
this (http://wiki.nmr-relax.com/CR72_full).  These are the '* full'
models in relax.  However in the literature the parametric restriction
R20A = R20B (= R20) is almost always used.  For the analytic models
this can significantly simplify the equations, whereas for the numeric
models the equations do not change.  In both cases, two dimensions of
the optimisation space collapse into one and the optimisation
problem massively simplifies.  That is why in relax we also provide
the collapsed models (those with the ' full' part of the label
removed).
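
To make the collapse concrete, here is a minimal Python/numpy sketch of the Carver and Richards equation with separate R20A and R20B, together with the restricted form obtained by passing the same value for both.  It is written from my reading of the equations on the CR72_full wiki page and is not the relax code itself.  The function names r2eff_cr72_full and r2eff_cr72 are hypothetical, and the units (dw in rad/s, kex in 1/s, nu_CPMG in Hz) are assumptions made for the illustration.

```python
import numpy as np


def r2eff_cr72_full(r20a, r20b, pA, dw, kex, cpmg_frqs):
    """Carver-Richards type R2eff with separate R20A and R20B.

    Assumed units: r20a, r20b and kex in 1/s, dw in rad/s, cpmg_frqs in Hz.
    Hypothetical helper for illustration, not the relax implementation.
    """
    cpmg_frqs = np.asarray(cpmg_frqs, dtype=float)
    pB = 1.0 - pA
    k_BA = pA * kex
    k_AB = pB * kex
    dw2 = dw**2

    # Terms shared by Psi and zeta.
    delta = r20a - r20b - k_BA + k_AB
    Psi = delta**2 - dw2 + 4.0 * k_AB * k_BA
    zeta = 2.0 * dw * delta
    root = np.sqrt(Psi**2 + zeta**2)

    # D+ and D-.
    D_part = (0.5 * Psi + dw2) / root
    D_pos = 0.5 + D_part
    D_neg = -0.5 + D_part

    # eta+ and eta-.
    eta_scale = 2.0**(-1.5)
    eta_pos = eta_scale * np.sqrt(Psi + root) / cpmg_frqs
    eta_neg = eta_scale * np.sqrt(-Psi + root) / cpmg_frqs

    fact = D_pos * np.cosh(eta_pos) - D_neg * np.cos(eta_neg)
    return 0.5 * (r20a + r20b + kex) - cpmg_frqs * np.arccosh(fact)


def r2eff_cr72(r20, pA, dw, kex, cpmg_frqs):
    """Restricted model: R20A = R20B = R20, so one parameter fewer."""
    return r2eff_cr72_full(r20, r20, pA, dw, kex, cpmg_frqs)


if __name__ == "__main__":
    # Made-up parameter values, only to show the call.
    nu_cpmg = [50.0, 100.0, 200.0, 500.0, 1000.0]
    print(r2eff_cr72(10.0, 0.9, 1500.0, 1000.0, nu_cpmg))
```

For a single field the full function takes five model parameters and the restricted one takes four, so for every spin and every field the optimiser has one dimension fewer to search.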

It is true most literature data is not suitable for the '* full'
models.  That is why they are not turned on by default in the GUI or
listed in the sample scripts.  From memory though, there are cases
whereby the measured data is of high enough quality and collected on
enough magnets that the R20A != R20B assumption can be made.  I cannot
remember the reference(s), but it shouldn't be too hard to find.
Anyway, the full R20A != R20B models are provided in relax for a
number of reasons:

- The rare cases whereby the data is good enough.
- Academic studies.
- Future developments could significantly improve the quality of
measured dispersion data so that the R20A != R20B assumption can be
regularly made.
- Chemists have a different perspective on life compared to
biologists.  Small organic molecules make the R20A vs. R20B
distinction much, much easier.

I hope it is now clearer why there are these models in relax.

Regards,

Edward




On 10 September 2014 15:27, Chung-ke Chang <chungke@xxxxxxxxxxxxxxxxxx> 
wrote:

Dear Edward and Troels,

Thank you all for the help! We are currently testing the new version of 
relax (yes, we are using the “normal” release), and making sure it plays 
along nicely with other software - we have a plethora of different 
python versions, which the system manager is doing his best to avoid 
interfering with each other. I am curious about one thing though: If the 
‘CR72 full’ model has not been used in any published studies, then is 
there any reason to include it when trying to fit “real-world” data? It 
seems that Troels is implying that “real-world” data is too noisy to 
obtain meaningful fitting parameters from the model. Or am I 
misunderstanding something?

Cheers,

Chung-ke

On Sep 9, 2014, at 8:56 PM, Edward d'Auvergne <edward@xxxxxxxxxxxxx> 
wrote:

Hi Chung-ke,

The only way to find out about new relax releases is the
relax-announce mailing list
(http://news.gmane.org/gmane.science.nmr.relax.announce).  Some relax
users were signed up for the freecode announcements
(http://freecode.com/projects/nmr-relax), but freecode has
unfortunately shut down (http://freecode.com/about).

For the version you are currently using, note that this is the
repository version of relax installed by the superuser.  You should
make sure you use the normal releases, as the repository version can
sometimes be in a broken or buggy state as development occurs.  You
can also have a copy in your home directory by typing:

$ svn co http://svn.gna.org/svn/relax/trunk ./relax-trunk
$ cd relax-trunk
$ scons

If you already have a repository version on your system, these
commands should just work.  But you should only use the repository
version if you would like a bug fix and cannot wait until the next
relax release.

Regards,

Edward



On 9 September 2014 10:37, Chung-ke Chang <chungke@xxxxxxxxxxxxxxxxxx> 
wrote:

Dear Troels and Edward,

Thank you for the pointers. I was not aware that a new version was out last
week, so I’ve asked the IT people to install it on our cluster. Below is the
output from ‘relax -i’:

[chungke@nmrc10 onc_dAUGA_MES_310K]$ relax -i



                               relax repository checkout r24533
                              svn://svn.gna.org/svn/relax/trunk

                           Molecular dynamics by NMR data analysis

                          Copyright (C) 2001-2006 Edward d'Auvergne
                      Copyright (C) 2006-2014 the relax development team

This is free software which you are welcome to modify and redistribute under
the conditions of the GNU General Public License (GPL).  This program,
including all modules, is licensed under the GPL and comes with absolutely no
warranty.  For details type 'GPL' within the relax prompt.

Assistance in using the relax prompt and scripting interface can be accessed
by typing 'help' within the prompt.

Processor fabric:  Uni-processor.


Hardware information:
 Machine:                 x86_64
 Processor:               x86_64
 Processor name:          Intel(R) Xeon(R) CPU           E5430  @ 2.66GHz
 Endianness:              little
 Total RAM size:          7983 Mb
 Total swap size:         8189 Mb

Operating system information:
 System:                  Linux
 Release:                 2.6.18-164.el5
 Version:                 #1 SMP Thu Sep 3 03:28:30 EDT 2009
 Distribution:            redhat 5.3 Final
 Full platform string:    Linux-2.6.18-164.el5-x86_64-with-redhat-5.3-Final

Python information:
 Architecture:            64bit ELF
 Python version:          2.5.1
 Python build:            r251:54863, Jul 23 2008 17:35:20
 Python compiler:         GCC Intel(R) C++ gcc 4.1 mode
 Libc version:            glibc 2.3
 Python executable:       /program/nmr/bin/python
 Python module path:      ['/program/nmr/relax',
'/program/nmr/lib/python2.5/site-packages/setuptools-0.6c9-py2.5.egg',
'/program/nmr/lib/python25.zip', '/program/nmr/lib/python2.5',
'/program/nmr/lib/python2.5/plat-linux2',
'/program/nmr/lib/python2.5/lib-tk',
'/program/nmr/lib/python2.5/lib-dynload',
'/program/nmr/lib/python2.5/site-packages',
'/program/nmr/lib/python2.5/site-packages/Scientific/linux2']

Python packages and modules (most are optional):

Name               Installed    Version             Path
minfx              True         1.0.8               /program/nmr/lib/python2.5/site-packages/minfx
bmrblib            True         1.0.3               /program/nmr/lib/python2.5/site-packages/bmrblib
numpy              True         1.6.2               /program/nmr/lib/python2.5/site-packages/numpy
scipy              False
wxPython           False
matplotlib         True         0.98.3              /program/nmr/lib/python2.5/site-packages/matplotlib
mpi4py             True         1.3.1               /program/nmr/lib/python2.5/mpi4py
epydoc             False
optparse           True         1.5.3               /program/nmr/lib/python2.5/optparse.pyc
readline           True                             /program/nmr/lib/python2.5/lib-dynload/readline.so
profile            True                             /program/nmr/lib/python2.5/profile.pyc
bz2                True                             /program/nmr/lib/python2.5/lib-dynload/bz2.so
gzip               True                             /program/nmr/lib/python2.5/gzip.pyc
io                 False
xml                True         0.8.4 (internal)    /program/nmr/lib/python2.5/xml/__init__.pyc
xml.dom.minidom    True                             /program/nmr/lib/python2.5/xml/dom/minidom.pyc

relax information:
 Version:                 repository checkout r24533 svn://svn.gna.org/svn/relax/trunk
 Processor fabric:        Uni-processor.

relax C modules:

Module                        Compiled    File type                                                                   Path
target_functions.relax_fit    True        ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped    /program/nmr/relax/target_functions/relax_fit.so

As for the data itself, I am using data obtained on two fields and use both
from the start.

Upon closer look at the R20 parameters, I think both of you are right: the
R20a and R20b numbers are really funky. I shall follow your suggestions and
run the calculations with the CR72 and B14 models instead.

Cheers,

Chung-ke

On Sep 9, 2014, at 4:25 PM, Troels Emtekær Linnet 
<tlinnet@xxxxxxxxxxxxx>
wrote:

Hi Chung-ke.

Can you post the information about which version of relax you are using?

In a terminal you can run:
relax -i

and paste the output here.
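
If relax is not yet on your path, the Python and numpy part of that report can be approximated with a small standalone script.  This is only a sketch: describe_environment is a hypothetical helper, not part of relax, and it assumes it is run with the same Python executable that relax would use.

```python
import platform
import sys


def describe_environment():
    """Collect the Python version, executable and numpy version (if any)."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    try:
        import numpy
    except ImportError:
        # numpy is not installed for this interpreter.
        info["numpy"] = None
    else:
        info["numpy"] = numpy.__version__
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("%-12s %s" % (key, value))
```

The output of 'relax -i' is still preferable, as it also lists the relax version and the compiled C modules.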

Then there is the question of whether you used data from one or two
spectrometer fields.

Fitting to a single field can give problems.
This is described here:

"""Faithful estimation of dynamics parameters from CPMG relaxation
dispersion measurements."""
Kovrigin, Evgenii L; Kempf, James G; Grey, Michael J; Loria, J Patrick
Journal of magnetic resonance, 2006, Vol 180, p 93-104.
http://www.ncbi.nlm.nih.gov/pubmed/16458551
DOI: 10.1016/j.jmr.2006.01.010

Figures 9 and 10 show these "rotten bananas".

Clustering the data overcomes this problem to some extent, since you
now start to add more data compared to the number of fitting
parameters.
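
As a rough illustration of that bookkeeping, the sketch below counts fitted parameters and data points for a cluster.  It assumes the usual parameterisation as I understand it: one R20 (or one R20A and one R20B for the full models) per spin and per field, one dw per spin, and pA and kex shared by the whole cluster.  The function names and the 12 dispersion points per curve are made up for the example.

```python
def count_cluster_parameters(n_spins, n_fields, full=False):
    """Number of fitted parameters for a clustered two-state model.

    Assumes R20 per spin and per field (two of them if full=True),
    one dw per spin, and pA and kex shared by the cluster.
    """
    r20_per_spin = (2 if full else 1) * n_fields
    per_spin = r20_per_spin + 1   # R20 rates plus dw
    return n_spins * per_spin + 2  # plus the shared pA and kex


def count_data_points(n_spins, n_fields, n_points):
    """Number of R2eff values, assuming n_points per dispersion curve."""
    return n_spins * n_fields * n_points


if __name__ == "__main__":
    # 13 residues at two fields, with a hypothetical 12 points per curve.
    for full in (False, True):
        n_par = count_cluster_parameters(13, 2, full=full)
        n_dat = count_data_points(13, 2, 12)
        print("full=%s: %d parameters, %d data points" % (full, n_par, n_dat))
```

A single spin fitted alone at one field has 4 parameters (5 for the full model) for one curve, whereas in a cluster the two exchange parameters are paid for only once.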

The problem, though, is that if you start from "single fitted" data
and go to "clustering of data", an average of each global parameter
will be taken over the single fitted data.

In a previous version of relax (a version or two ago), we changed from
taking the average to taking the median of the parameters.
This was to prevent an outlier from dominating the average if one of
the single fitted spins has been fitted "crazy".
You don't want to start with a global kex at 10000.
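
A small Python sketch shows why the median is the safer starting value.  The kex values are made up for the example, and seed_value is a hypothetical helper, not a relax function.

```python
from statistics import mean, median


def seed_value(values, method="median"):
    """Starting value for a global parameter from single-spin fits."""
    if method == "median":
        return median(values)
    if method == "mean":
        return mean(values)
    raise ValueError("method must be 'median' or 'mean'")


if __name__ == "__main__":
    # Made-up single-spin kex values (1/s), the last one fitted "crazy".
    kex_single = [1800.0, 2100.0, 1950.0, 2200.0, 10000.0]
    print("mean:  ", seed_value(kex_single, "mean"))
    print("median:", seed_value(kex_single, "median"))
```

The single outlier pulls the mean far away from the other four values, while the median stays among them.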

I have discussed the 'CR72 full' model with my supervisor.
He has actually never seen it used in any paper.
The assumption R20A=R20B is always used.

If you only have one field, I would not even try this model.
If you still would like to try it, please consider using the B14 full
model as well, to compare.
http://wiki.nmr-relax.com/B14_full

Abstract: "Faithful estimation of dynamics parameters from CPMG
relaxation dispersion measurements."
This work examines the robustness of fitting of parameters describing
conformational exchange (k(ex), p(a/b), and Deltaomega) processes from
CPMG relaxation dispersion data. We have analyzed the equations
describing conformational exchange processes for the intrinsic
inter-dependence of their parameters that leads to the existence of
multiple equivalent solutions, which equally satisfy the experimental
data. We have used Monte-Carlo simulations and fitting to the
synthetic data sets as well as the direct 3-D mapping of the parameter
space of k(ex), p(a/b), and Deltaomega to quantitatively assess the
degree of the parameter inter-dependence. The demonstrated high
correlation between parameters can preclude accurate dynamics
parameter estimation from NMR spin-relaxation data obtained at a
single static magnetic field. The strong parameter inter-dependence
can readily be overcome through acquisition of spin-relaxation data at
more than one static magnetic field thereby allowing accurate
assessment of conformational exchange properties.


Troels Emtekær Linnet
PhD student
Copenhagen University
SBiNLab, 3-0-41

2014-09-09 9:48 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi Chung-ke,

Welcome to the relax mailing lists!  Thanks to the hard work of one of
the relax developers - Troels Linnet - this long calculation time
should now be much, much shorter.  Have a look at the following
release announcement:

http://wiki.nmr-relax.com/Relax_3.3.0

For the 'CR72 full' model (http://wiki.nmr-relax.com/CR72_full), the
clustering example here gives a ~22x speed up so your calculation time
would then drop from ~20,000 min to ~1000 min.  If you would like to
receive announcements about new relax versions, please subscribe to
the relax-announce mailing list
(https://mail.gna.org/listinfo/relax-announce/).  This list only
receives ~10 emails per year.  See
http://news.gmane.org/gmane.science.nmr.relax.announce.
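
The run time estimate above is simple arithmetic, sketched here in Python.  It assumes the ~22x factor from the release notes' clustering example carries over to your data, which is not guaranteed; estimated_minutes is a hypothetical helper.

```python
def estimated_minutes(old_minutes, speed_up):
    """Expected run time if the whole calculation speeds up uniformly."""
    return old_minutes / speed_up


if __name__ == "__main__":
    # Run times quoted in this thread (13 and 11 residue clusters).
    for label, minutes in (("13 residues", 20000.0), ("11 residues", 13000.0)):
        print("%s: ~%.0f min" % (label, estimated_minutes(minutes, 22.0)))
```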

I have a few questions about how you performed the analysis.  Did you
use a non-clustered set of results to seed the clustered analysis?  In
the dispersion auto-analysis protocol exposed via the GUI, the results
from the non-clustered analysis will be taken as the starting point
for optimisation of the clustered analysis, as described in Morin et
al., 2014 (http://dx.doi.org/10.1093/bioinformatics/btu166).  If you
wish, and are capable with scripting, you can also create your own
analysis protocol via a relax script and not use the auto-analysis.
The relax software is very flexible and you can create quite complex
analysis protocols - the auto-analyses are just large relax scripts.

Also, did you look at the results from the non-clustered analysis to
see if the kinetics of all 13 residues are similar?  Or if the
dispersion curves look reasonable?  Some data might be of low quality
and causing difficulties with the optimisation.  You should also note
that most dispersion data is not good enough to differentiate R20A
from R20B.  Do the final results (non-clustered and clustered) look
reasonable for these two parameters?  It could be that differentiating
R20A from R20B in your system is difficult and causing optimisation to
take much longer than normal.  Do you see the same optimisation times
with the clustered CR72 model where R20A=R20B=R20
(http://wiki.nmr-relax.com/CR72)?  Also, have a look at the log file
from the analysis and see if the total number of minimisation
iterations is much longer for the 'CR72 full' model compared to the
CR72 model.  This will tell you if the optimisation problem is much
more complicated for the 'full' model.

Regards,

Edward


On 9 September 2014 09:19, Chung-ke Chang <chungke@xxxxxxxxxxxxxxxxxx>
wrote:

Dear all,

This is my first post here, and I have a question regarding the time it
takes for a relaxation dispersion clustering process to finish. I have one
clustering calculation that has been running for ~ 20,000 min on a single
Xeon 2.66 GHz core. The cluster consists of 13 residues being fit to the
‘CR72 full’ model. I wonder if the long time it is taking is normal? Would
it be possible that relax has been stuck in an infinite loop of some sort,
without showing up in the log file? Any input would be greatly appreciated.
By the way, using a cluster of only 11 residues out of the 13 did finish in
~13,000 min.

Chung-ke Chang
Biomacromolecular NMR Lab
Institute of Biomedical Science
Academia Sinica, Taiwan
_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-users mailing list
relax-users@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-users

