Re: How is the R2eff data collected and processed for clustered analysis?



Posted by Edward d'Auvergne on June 04, 2014 - 14:52:
Hi Troels,

Please see below:


After the changes to the lib/dispersion/model.py files, I see a massive
speed-up of the computations.

During 2 days, I performed over 600 global fittings for a 68 residue
protein, where all residues were clustered.  I just did it with 1 CPU.

This is really really impressive.

I did, though, also alter how the grid search was performed, pre-setting
some of the values to known values referred to in a paper.
So I can't really say what has cut the time down.

It looks like the dispersion analysis is faster, but probably by a
factor of between 1 and 2 for the different dispersion models.  So the
huge speed up will likely be due to the cuts in the computationally
expensive grid search.  That is why there are all those tricks in the
auto-analysis - copying parameters from other optimised models, either
nested or analytic to numeric, and averaging values for the cluster
analysis - which were published as part of the paper on this analysis
(http://dx.doi.org/10.1093/bioinformatics/btu166).
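
As a rough illustration of that type of pre-setting (this is only a
sketch using the current relax user function names - the model, the
parameter names and the numbers below are placeholders, not
recommendations):

# Select a dispersion model for the current data pipe.
relax_disp.select_model(model='CR72')

# Pre-set parameters to previously published values, so the subsequent
# grid search needs far fewer increments.
value.set(val=0.95, param='pA')
value.set(val=1500.0, param='kex')

# A much coarser grid search than the default, followed by minimisation.
minimise.grid_search(inc=11)
minimise.execute(min_algor='simplex', constraints=True)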


But looking at the calculations as they run, the minimisation is quite fast.

So, how does relax do the collecting of data for global fitting?

You can see this in the docstring for the dispersion target function
class.  For example, for the R2eff/R1rho values:

"""
        @keyword values:            The R2eff/R1rho values.  The
dimensions are {Ei, Si, Mi, Oi, Di}.
        @type values:               rank-4 list of numpy rank-1 float arrays
"""

So you can see that the second dimension is the spins, i.e. this will
range over all spins in the cluster.  The target function class is
initialised once per cluster (or per free spin), as defined by the
model_loop() API method.
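
To make the indexing concrete, here is a toy sketch of that layout (this
is not the relax code itself, and all of the sizes are made up):

import numpy as np

# One experiment type, 3 clustered spins, 2 spectrometer fields, 1 offset
# and 5 dispersion points, indexed as {Ei, Si, Mi, Oi, Di}.
n_exp, n_spins, n_frq, n_offsets, n_points = 1, 3, 2, 1, 5

values = [[[[np.zeros(n_points)          # one rank-1 float array over Di
             for oi in range(n_offsets)]
            for mi in range(n_frq)]
           for si in range(n_spins)]     # Si runs over every spin of the cluster
          for ei in range(n_exp)]

print(values[0][1][0][0].shape)    # (5,) - the values for spin index 1, field index 0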


Does it collect all the R2eff values for the clustered spins, and send them
to the target function together with the array of parameters to vary?

Yes, but again see the docstring for the fine details.  Looking at the
code for assembling the R2eff/R1rho data structures will also show
this.


Or does it calculate per spin, and share the common parameters?

The chi-squared is calculated per spin with the common parameters and
then summed over all spins of the cluster, all within one target
function call.  That will be the loop over si that you see inside each
target function.
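
Schematically, the structure inside one target function call looks
something like this (a simplified sketch, not the actual relax code -
calc_r2eff() here is just a hypothetical stand-in for the dispersion
model back-calculation, with one value and error array per spin):

import numpy as np

def chi2(back_calc, values, errors):
    # Standard chi-squared between back-calculated and measured values.
    return np.sum(((values - back_calc) / errors)**2)

def func_cluster(params, values, errors, calc_r2eff):
    chi2_sum = 0.0
    for si in range(len(values)):              # the loop over si of the cluster
        back_calc = calc_r2eff(params, si)     # common parameters, per-spin curve
        chi2_sum += chi2(back_calc, values[si], errors[si])
    return chi2_sum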


My current bottleneck actually seems to be the saving of the state file
between each iteration of the global analysis.

You should have a look at what is being saved inside each file.  This
should only take a few seconds, or tens of seconds if you have Monte
Carlo simulations.  Which global analysis are you talking about?  Your
own custom analysis or the auto-analysis?  The auto-analysis saves
results files rather than state files (result files are simply one
data pipe whereas state files are everything in the data store,
including non-pipe structures).  If it is a custom analysis, then
maybe you are not resetting the relax data store between iterations,
and hence your files will become bigger and bigger with each
iteration - i.e. the number of data pipes is increasing each time.  Or
maybe you should use the results.write user function rather than
state.save.
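
For example, something along these lines in a custom script (the file
and pipe names here are just placeholders):

# At the start of each iteration, wipe the data store so the number of
# data pipes (and hence the size of any saved file) does not grow.
reset()
pipe.create(pipe_name='global_fit_%i' % i, pipe_type='relax_disp')

# ... set up and run the global fit for this iteration ...

# At the end, write only the current data pipe to disk,
results.write(file='global_fit_%i' % i, dir='results', force=True)

# rather than dumping the entire data store:
#state.save(state='global_fit_%i' % i, force=True)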

Regards,

Edward


