Re: Speed up suggestion for task #7807.



Posted by Troels Emtekær Linnet on June 10, 2014 - 21:31:
The problem is that if you multiply the data with a numpy array of
shape [300], broadcasting applies along the last axis.

Reshaping that into a usable form is the problem.
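To illustrate (my own minimal sketch, not from the relax code, using the [1, 100, 3, 1, 22] shape from the example further down): multiplying the full data grid with a flat vector of 300 values fails, because numpy aligns shapes from the last axis backwards.

```python
import numpy as np

# Hypothetical shapes: the full data grid and a flat vector of
# 300 R20 values (100 spins x 3 frequencies).
data = np.ones((1, 100, 3, 1, 22))
r20 = np.arange(300.0)  # shape (300,)

# Broadcasting compares trailing axes: (22,) vs (300,) do not match.
try:
    data * r20
except ValueError as err:
    print("broadcast error:", err)
```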

I spent many, many hours getting to this:
# Reshape dw to per experiment and number of spins; or for R20A, to
# per experiment, spin and frequency.
[100] -> [1, 100]    :    [300] -> [1, 100, 3]

# Expand axes
[1, 100] -> [1, 100, 1, 1, 1]    :    [1, 100, 3] -> [1, 100, 3, 1, 1]

# Then tile to the full dimensions
[1, 100, 1, 1, 1] -> [1, 100, 3, 1, 22]    :    [1, 100, 3, 1, 1] -> [1, 100, 3, 1, 22]

This is a very logical build-up of the shapes.
An add or multiply broadcasts along the last axis.

The only thing I could do was to create new axes with newaxis,
and then tile up!

--------- Example code to play with.
import numpy as np

s = [1, 100, 3, 1, 22]

print("dw")
dw = np.arange(100)
print(dw.shape)
dw1 = dw.reshape((s[0], s[1]))
print(dw1.shape)
dw2 = dw1[:, :, None, None, None]
print(dw2.shape)
dw3 = np.tile(dw2, (1, 1, s[2], s[3], s[4]))
print(dw3.shape)

print("r20")
r = np.arange(300)
print(r.shape)
r1 = r.reshape((s[0], s[1], s[2]))
print(r1.shape)
r2 = r1[:, :, :, None, None]
print(r2.shape)
r3 = np.tile(r2, (1, 1, 1, s[3], s[4]))
print(r3.shape)
-------------------


With this, I think I am done!

I would really like to wrap this up and continue with my own stuff.
So, what needs to be done before you can accept this?

I will:
- expand to the other analytic CPMG models;
- include R1rho.
I cannot do this for the numerical models, due to the looping.

Should I add unit tests for each?


------------ No time saved on one spin, but a 21X speed-up on clustering!
3 frq,
1000 iterations


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

1 spin
        1    0.000    0.000    0.351    0.351 <string>:1(<module>)
        1    0.001    0.001    0.351    0.351 profiling_cr72.py:427(single)
     1000    0.001    0.000    0.341    0.000 profiling_cr72.py:413(calc)
     1000    0.009    0.000    0.340    0.000 relax_disp.py:989(func_CR72_full)
     1000    0.024    0.000    0.327    0.000 relax_disp.py:523(calc_CR72_chi2)
     1003    0.088    0.000    0.237    0.000 cr72.py:101(r2eff_CR72)
     2003    0.035    0.000    0.122    0.000 numeric.py:2056(allclose)
    10046    0.051    0.000    0.051    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     3000    0.029    0.000    0.047    0.000 shape_base.py:761(tile)
     4015    0.005    0.000    0.035    0.000 fromnumeric.py:1762(any)

100 spin
        1    0.000    0.000    1.507    1.507 <string>:1(<module>)
        1    0.001    0.001    1.507    1.507 profiling_cr72.py:449(cluster)
     1000    0.001    0.000    1.433    0.001 profiling_cr72.py:413(calc)
     1000    0.010    0.000    1.432    0.001 relax_disp.py:989(func_CR72_full)
     1000    0.052    0.000    1.413    0.001 relax_disp.py:523(calc_CR72_chi2)
     1300    0.917    0.001    1.193    0.001 cr72.py:101(r2eff_CR72)
     2300    0.101    0.000    0.224    0.000 numeric.py:2056(allclose)
     3000    0.034    0.000    0.157    0.000 shape_base.py:761(tile)
     4000    0.108    0.000    0.108    0.000 {method 'repeat' of 'numpy.ndarray' objects}
    11828    0.085    0.000    0.085    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.073    0.073 profiling_cr72.py:106(__init__)
        1    0.013    0.013    0.058    0.058 profiling_cr72.py:173(return_r2eff_arrays)
     1000    0.030    0.000    0.046    0.000 chi2.py:72(chi2_rankN)
     4609    0.006    0.000    0.045    0.000 fromnumeric.py:1762(any)
     2300    0.004    0.000    0.035    0.000 fromnumeric.py:1621(sum)


------------------TRUNK
1 spin
        1    0.000    0.000    0.351    0.351 <string>:1(<module>)
        1    0.001    0.001    0.351    0.351 profiling_cr72.py:427(single)
     1000    0.001    0.000    0.343    0.000 profiling_cr72.py:413(calc)
     1000    0.008    0.000    0.342    0.000 relax_disp.py:911(func_CR72_full)
     1000    0.032    0.000    0.330    0.000 relax_disp.py:456(calc_CR72_chi2)
     3003    0.177    0.000    0.246    0.000 cr72.py:100(r2eff_CR72)
    12036    0.061    0.000    0.061    0.000 {method 'reduce' of 'numpy.ufunc' objects}
     3000    0.025    0.000    0.049    0.000 chi2.py:32(chi2)
     6003    0.008    0.000    0.049    0.000 fromnumeric.py:1621(sum)


100 spin
        1    0.000    0.000   32.180   32.180 <string>:1(<module>)
        1    0.001    0.001   32.180   32.180 profiling_cr72.py:449(cluster)
     1000    0.001    0.000   32.132    0.032 profiling_cr72.py:413(calc)
     1000    0.009    0.000   32.131    0.032 relax_disp.py:911(func_CR72_full)
     1000    3.121    0.003   32.112    0.032 relax_disp.py:456(calc_CR72_chi2)
   300300   17.356    0.000   23.903    0.000 cr72.py:100(r2eff_CR72)
  1200927    5.845    0.000    5.845    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   300000    2.503    0.000    4.842    0.000 chi2.py:32(chi2)
   600300    0.762    0.000    4.705    0.000 fromnumeric.py:1621(sum)
   600300    0.608    0.000    3.460    0.000 _methods.py:23(_sum)
   300300    0.287    0.000    2.191    0.000 fromnumeric.py:2048(amax)
   300300    0.281    0.000    1.990    0.000 fromnumeric.py:2132(amin)
   300300    0.325    0.000    1.904    0.000 _methods.py:15(_amax)
   300300    0.296    0.000    1.709    0.000 _methods.py:19(_amin)




2014-06-10 20:57 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Hi,

I'll have a look tomorrow but, as you've probably seen, some of the
fine details such as indices to be used need to be sorted out when
implementing this.

Regards,

Edward


On 10 June 2014 20:49, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> wrote:
Whatever I do, I cannot get this to work.

Can you show an example?

2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:
Here is an example of avoiding automatic numpy data structure creation
and then garbage collection:

"""
from numpy import add, ones, zeros

a = zeros((5, 4))
a[1] = 1
a[:,1] = 2

b = ones((5, 4))

add(a, b, a)
print(a)
"""

The result is:

[[ 1.  3.  1.  1.]
 [ 2.  3.  2.  2.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]]

The out argument for numpy.add() is used here to operate in-place,
similar to the Python "+=" operation on numpy arrays.  But unlike a
plain "a = a + b", it avoids creating a temporary array for the
result.  This will save a lot of time in the dispersion code.
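A minimal sketch (my own, not from the thread) showing that the out argument writes into the existing buffer rather than allocating a new result array:

```python
import numpy as np

a = np.zeros((5, 4))
b = np.ones((5, 4))

# A plain addition allocates a fresh array for the result.
c = a + b
print(c is a)  # False: c is a new object

# With out=a, the result is written into a's existing buffer.
result = np.add(a, b, out=a)
print(result is a)  # True: the very same object, no new allocation
```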

Regards,

Edward


On 10 June 2014 15:56, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Troels,

Here is one suggestion, of many that I have, for significantly
improving the speed of the analytic dispersion models in your
'disp_spin_speed' branch.  The speed ups you have currently achieved
for spin clusters are huge and very impressive.  But now that you have
the infrastructure in place, you can advance this much more!

The suggestion has to do with the R20, R20A, and R20B numpy data
structures.  The way they are currently handled is relatively
inefficient, in that they are created de novo for each function call.
This means that memory allocation and Python garbage collection
happen for every single function call - something which should be
avoided at almost all costs.

A better way to do this would be to have a self.R20_struct,
self.R20A_struct, and self.R20B_struct created in __init__(), and then
to pack in the values from the parameter vector into these structures.
You could create a special structure in __init__() for this.  It would
have the dimensions [r20_index][ei][si][mi][oi], where the first
dimension corresponds to the different R20 parameters.  And for each
r20_index element, you would have ones at the [ei][si][mi][oi]
positions where you would like R20 to be, and zeros elsewhere.  The
key is that this is created at the target function start up, and not
for each function call.

This would be combined with the very powerful 'out' argument set to
self.R20_struct with the numpy.add() and numpy.multiply() functions to
prevent all memory allocations and garbage collection.  Masks could be
used, but I think that would be much slower than having special
numpy structures with ones where R20 should be and zeros elsewhere.
For just creating these structures, looping over a single r20_index
loop and multiplying by the special [r20_index][ei][si][mi][oi]
one/zero structure and using numpy.add() and numpy.multiply() with out
arguments would be much, much faster than masks or the current
R20_axis logic.  It will also simplify the code.
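The scheme above could be sketched roughly like this (my own illustration, not the actual relax code: the names R20_mask, R20_struct, pack_r20 and the tiny [ei][si][mi][oi] shape are all assumptions):

```python
import numpy as np

# Assumed tiny dimensions for illustration: 2 R20 parameters and an
# [ei][si][mi][oi] grid of shape (1, 2, 1, 3).
n_r20 = 2
shape = (1, 2, 1, 3)

class TargetFunction:
    def __init__(self):
        # Created once at target function start-up: one one/zero
        # structure per R20 parameter, plus reusable output buffers.
        self.R20_mask = np.zeros((n_r20,) + shape)
        self.R20_mask[0, :, 0] = 1.0  # positions belonging to spin 0
        self.R20_mask[1, :, 1] = 1.0  # positions belonging to spin 1
        self.R20_struct = np.empty(shape)
        self._scratch = np.empty(shape)

    def pack_r20(self, params):
        # Called for every function call: no new arrays are allocated,
        # as numpy.multiply() and numpy.add() write into the buffers
        # via their out arguments.
        self.R20_struct.fill(0.0)
        for r20_index in range(n_r20):
            np.multiply(self.R20_mask[r20_index], params[r20_index],
                        out=self._scratch)
            np.add(self.R20_struct, self._scratch, out=self.R20_struct)
        return self.R20_struct

tf = TargetFunction()
print(tf.pack_r20([5.0, 7.0]))
```

The r20_index loop only runs over the handful of R20 parameters, so the per-call cost is a few in-place ufunc calls on pre-allocated buffers.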

Regards,

Edward

_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel


