Re: How is the R2eff data collected and processed for clustered analysis?



Posted by Troels Emtekær Linnet on June 04, 2014 - 16:46:
Hi Edward.

Ah, yes.
I overwrite the state file for each new global fitting with the new pipe,
so the file keeps growing quite a lot.
I will change that.
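Something like this is what I have in mind (a sketch in relax script
syntax; the file and pipe names are placeholders, and I am assuming the
results.write, pipe.delete and state.save user functions):
-------------
# Write the finished pipe out to its own results file, then delete the
# pipe so the state file does not keep accumulating old pipes.
results.write(file='results_fit_001', force=True)
pipe.delete(pipe_name='global_fit_001')

# The saved state now only holds the current pipe(s).
state.save('state', force=True)
-------------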

I just checked my scripts.
In both cases, I would do one grid search for the first run, and the
recurring analyses would then copy the parameters from the first pipe.

And the speed-up is measured between these analyses.

Hm.
I have to take that variable, the grid search, out of the comparison!

I am trying to devise a profiling script which I can put in the base
folder of older versions of relax, for example relax 3.1.6, which I
also have.

It looks like this:
-------------
# Python module imports.
from numpy import array, float64, pi, zeros
import cProfile

# relax module imports.
from lib.dispersion.cr72 import r2eff_CR72

# Default parameter values.
r20a = 2.0
r20b = 4.0
pA = 0.95
dw = 2.0
kex = 1000.0

# The CPMG relaxation period (s) and the numbers of CPMG cycles.
relax_times = 0.04
ncyc_list = [2, 4, 8, 10, 20, 40, 500]

# Data structures for a single spin (nu_CPMG = ncyc / relax_times).
s_ncyc = array(ncyc_list)
s_num_points = len(s_ncyc)
s_cpmg_frqs = s_ncyc / relax_times
s_R2eff = zeros(s_num_points, float64)

# Data structures for a cluster, simulated by repeating the dispersion
# points 100 times.
g_ncyc = array(ncyc_list*100)
g_num_points = len(g_ncyc)
g_cpmg_frqs = g_ncyc / relax_times
g_R2eff = zeros(g_num_points, float64)

# The spin Larmor frequency (Hz).
sfrq = 200. * 1E6

# Calculate pB.
pB = 1.0 - pA

# Exchange rates.
k_BA = pA * kex
k_AB = pB * kex

# The spin Larmor frequency in rad/s.
frqs = sfrq * 2 * pi

# Convert dw from ppm to rad/s.
dw_frq = dw * frqs / 1.e6


def single():
    """Profile the CR72 model for a single spin."""
    for i in xrange(10000):
        r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
                   cpmg_frqs=s_cpmg_frqs, back_calc=s_R2eff,
                   num_points=s_num_points)

cProfile.run('single()')


def cluster():
    """Profile the CR72 model for a cluster of 100 spins."""
    for i in xrange(10000):
        r2eff_CR72(r20a=r20a, r20b=r20b, pA=pA, dw=dw_frq, kex=kex,
                   cpmg_frqs=g_cpmg_frqs, back_calc=g_R2eff,
                   num_points=g_num_points)

cProfile.run('cluster()')
------------------------
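As a cross-check of the wall-clock times without the profiler overhead,
something like this could be appended to the script (a sketch using
only the standard timeit module):
-------------
# Optional wall-clock cross-check, free of profiler overhead.
import timeit
print timeit.timeit('single()', setup='from __main__ import single', number=1)
print timeit.timeit('cluster()', setup='from __main__ import cluster', number=1)
-------------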

For 3.1.6
[tlinnet@tomat relax-3.1.6]$ python profile_lib_dispersion_cr72.py
         20003 function calls in 0.793 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.793    0.793 <string>:1(<module>)
    10000    0.778    0.000    0.783    0.000 cr72.py:98(r2eff_CR72)
        1    0.010    0.010    0.793    0.793 profile_lib_dispersion_cr72.py:69(single)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.005    0.000    0.005    0.000 {range}


         20003 function calls in 61.901 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   61.901   61.901 <string>:1(<module>)
    10000   61.853    0.006   61.887    0.006 cr72.py:98(r2eff_CR72)
        1    0.013    0.013   61.901   61.901 profile_lib_dispersion_cr72.py:75(cluster)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.035    0.000    0.035    0.000 {range}


For trunk

[tlinnet@tomat relax_trunk]$ python profile_lib_dispersion_cr72.py
         80003 function calls in 0.514 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.514    0.514 <string>:1(<module>)
    10000    0.390    0.000    0.503    0.000 cr72.py:100(r2eff_CR72)
    10000    0.008    0.000    0.040    0.000 fromnumeric.py:1314(sum)
    10000    0.007    0.000    0.037    0.000 fromnumeric.py:1708(amax)
    10000    0.006    0.000    0.037    0.000 fromnumeric.py:1769(amin)
        1    0.011    0.011    0.514    0.514 profile_lib_dispersion_cr72.py:69(single)
    10000    0.007    0.000    0.007    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.030    0.000    0.030    0.000 {method 'max' of 'numpy.ndarray' objects}
    10000    0.030    0.000    0.030    0.000 {method 'min' of 'numpy.ndarray' objects}
    10000    0.025    0.000    0.025    0.000 {method 'sum' of 'numpy.ndarray' objects}


         80003 function calls in 1.209 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.209    1.209 <string>:1(<module>)
    10000    1.042    0.000    1.196    0.000 cr72.py:100(r2eff_CR72)
    10000    0.009    0.000    0.049    0.000 fromnumeric.py:1314(sum)
    10000    0.007    0.000    0.052    0.000 fromnumeric.py:1708(amax)
    10000    0.007    0.000    0.052    0.000 fromnumeric.py:1769(amin)
        1    0.014    0.014    1.209    1.209 profile_lib_dispersion_cr72.py:75(cluster)
    10000    0.007    0.000    0.007    0.000 {isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    10000    0.045    0.000    0.045    0.000 {method 'max' of 'numpy.ndarray' objects}
    10000    0.045    0.000    0.045    0.000 {method 'min' of 'numpy.ndarray' objects}
    10000    0.033    0.000    0.033    0.000 {method 'sum' of 'numpy.ndarray' objects}
---------------

For 10000 iterations (times in seconds):

3.1.6
Single: 0.778
100 cluster: 61.853

trunk
Single: 0.390 (~2x faster)
100 cluster: 1.042 (~59x faster)

------

For 1000000 iterations (times in seconds):
3.1.6
Single: 83.365
100 cluster:  ???? Still running....

trunk
Single: 40.825 (~2x faster)
100 cluster: 106.339

Am I doing something wrong here?

That is such a massive speed-up for the clustered analysis that I
simply can't believe it!

Best
Troels

2014-06-04 15:04 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Hi,

Such a huge speed up cannot be from the changes of the 'disp_speed'
branch alone.  I would expect from that branch a maximum drop from 30
min to 15 min.  Therefore it must be your grid search changes.  When
changing, simplifying, or eliminating the grid search, you have to be
very careful about the introduced bias.  This bias is unavoidable.  It
needs to be mentioned in the methods of any paper.  The key is to be
happy that the bias you have introduced will not negatively impact
your results: for example, that the grid search replacement is close
enough to the true solution that the optimisation will still be able
to reach the global minimum.  You also have
to convince the people reading your paper that the introduced bias is
reasonable.
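To make this concrete, replacing the grid search might look something
like the following in a relax script (a sketch only; the parameter
values are placeholders, and I am assuming the value.set and minimise
user functions of the relax 3.x line):
-------------
# Pre-set the parameters to literature values instead of grid searching
# (the numbers here are placeholders).
value.set(val=0.95, param='pA')
value.set(val=1000.0, param='kex')
value.set(val=2.0, param='dw')

# Then optimise directly, skipping the grid search step.
minimise('simplex', constraints=True)
-------------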

As for a script to show the speed changes, you could have a look at,
for example, the
test_suite/shared_data/dispersion/Hansen/relax_results/relax_disp.py
file.  This performs a full analysis with a large range of dispersion
models on the truncated data set from Flemming Hansen.  Or at
test_suite/shared_data/dispersion/Hansen/relax_disp.py, which uses all
of Flemming's data.  These could be run before and after the merger of
the 'disp_speed' branch, maybe with different models and the profile
flag turned on.  You could then create a text file in the
test_suite/shared_data/dispersion/Hansen/relax_results/ directory
called something like 'relax_timings' to permanently record the speed
ups.  This file can be used in the future for documenting any other
speed ups as well.

Regards,

Edward




On 4 June 2014 14:37, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
Looking at my old data, I can see from the files written out between
each global fit that an analysis previously took around 30 min.

They now take 2-6 min.

I almost can't believe that speed up!

Could we devise a devel-script, which we could use to simulate the
change?

Best
Troels



2014-06-04 14:24 GMT+02:00 Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx>:

Hi Edward.

After the changes to the lib/dispersion/*.py model files, I see a
massive speed-up of the computations.

Over 2 days, I performed more than 600 global fits for a 68-residue
protein where all residues were clustered.  I did this with just 1 CPU.

This is really really impressive.

I did, though, also alter how the grid search was performed,
pre-setting some of the parameters to known values referred to in a
paper, so I can't really say exactly what has cut the time down.

But watching the calculations run, the minimisation is quite fast.

So, how does relax collect the data for global fitting?

Does it collect all the R2eff values for the clustered spins and send
them to the target function together with the array of parameters to
vary?

Or does it calculate per spin, and share the common parameters?
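If it is the first of these, I imagine the target function looking
something like this (a minimal sketch of the idea, not relax's actual
implementation; the parameter packing and data structures are my
assumptions):
-------------
# Sketch of a clustered target function: pA and kex are shared across
# the whole cluster, dw is per spin, and the chi-squared value is
# summed over all spins.
from numpy import float64, sum, zeros
from lib.dispersion.cr72 import r2eff_CR72

def chi2_cluster(params, spins, cpmg_frqs, num_points):
    # Unpack the parameter vector: [pA, kex, dw_0, dw_1, ...].
    pA, kex = params[0], params[1]
    dw_per_spin = params[2:]

    # Sum the chi-squared contributions from each spin in the cluster.
    chi2 = 0.0
    back_calc = zeros(num_points, float64)
    for i, spin in enumerate(spins):
        r2eff_CR72(r20a=spin['r20a'], r20b=spin['r20b'], pA=pA,
                   dw=dw_per_spin[i], kex=kex, cpmg_frqs=cpmg_frqs,
                   back_calc=back_calc, num_points=num_points)
        chi2 += sum(((spin['r2eff'] - back_calc) / spin['errors'])**2)
    return chi2
-------------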

My current bottleneck actually seems to be the saving of the state
file between each iteration of the global analysis.

Best
Troels

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel


