Re: Speed up suggestion for task #7807. -- June 10, 2014

Note the simplicity of the target function, with one loop over the
spins and one numpy.multiply() and one numpy.add() call for each loop,
both using reusable storage.  For R20 the loop would be over
R20_index, which will be a bit more complicated to set up, but the
data structure build up idea will be the same.  This will give you the
speed while also removing almost all overhead costs.

Regards,

Edward

On 10 June 2014 21:31, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Troels,

No need for an example.  Here is the code to add to your
infrastructure which will make the analytic dispersion models insanely
fast:


"""
from numpy import add, array, float64, multiply, ones, zeros

# Init mimic.
#############

# Values from Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.
NE = 1
NS = 2
NM = 2
NO = 1
ND = 8
dw = array([ 1.847792726895652,  0.193719379085542])
frqs = [-382.188861036982701, -318.479128911056137]
shape = (NE, NS, NM, NO, ND)

# Final structure for lib.dispersion.
dw_struct = zeros(shape, float64)

# Temporary storage to avoid memory allocations and garbage collection.
dw_temp = zeros((NS,) + shape, float64)

# The structure for multiplication with dw to piecewise build up the
full dw structure.
dw_mask = zeros((NS,) + shape, float64)
for si in range(NS):
    for mi in range(NM):
        dw_mask[si, :, si, mi] = frqs[mi]
print(dw_mask)

# Values to be found (again taken directly from
Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster - as a
printout of dw_frq_a).
dw_final = array([[[[[-706.205797724669765, -706.205797724669765,
                      -706.205797724669765, -706.205797724669765,
                      -706.205797724669765, -706.205797724669765,
                      -706.205797724669765, -706.205797724669765]],

                    [[-588.483418069912318, -588.483418069912318,
                      -588.483418069912318, -588.483418069912318,
                      -588.483418069912318, -588.483418069912318,
                      -588.483418069912318, -588.483418069912318]]],


                   [[[ -74.03738885349469 ,  -74.03738885349469 ,
                       -74.03738885349469 ,  -74.03738885349469 ,
                       -74.03738885349469 ,  -74.03738885349469 ,
                       -74.03738885349469 ,  -74.03738885349469 ]],

                    [[ -61.69557910435401 ,  -61.69557910435401 ,
                       -61.69557910435401 ,  -61.69557910435401 ,
                       -61.69557910435401 ,  -61.69557910435401 ,
                       -61.69557910435401 ,  -61.69557910435401 ]]]]])


# Target function.
##################

# Loop over the dw elements (one per spin).
for si in range(NS):
    # First multiply the spin specific dw with the spin specific
frequency mask, using temporary storage.
    multiply(dw[si], dw_mask[si], dw_temp[si])

    # The add to the total.
    add(dw_struct, dw_temp[si], dw_struct)

# Show that the structure is reproduced perfectly.
print(dw_struct - dw_final)
"""

As mentioned in the comments, the structures come from the
Relax_disp.test_cpmg_synthetic_ns3d_to_cr72_noise_cluster.  I just
added a check of "if len(dw) > 1: asdfasd" to kill the test, and added
printouts to obtain dw, frq_a, dw_frq_a, etc.  This is exactly the
implementation I described.  Although there might be an even faster
way, this will eliminate all numpy array creation and deletion via
Python garbage collection in the target functions (when used for R20
as well).

Regards,

Edward

On 10 June 2014 21:09, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

If you have a really complicated example of your current 'dw_frq_a'
data structure for multiple spins and multiple fields, that could help
to construct an example.

Cheers,

Edward



On 10 June 2014 20:57, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi,

I'll have a look tomorrow but, as you've probably seen, some of the
fine details such as indices to be used need to be sorted out when
implementing this.

Regards,

Edward


On 10 June 2014 20:49, Troels Emtekær Linnet <tlinnet@xxxxxxxxxxxxx> 
wrote:

What ever I do, I cannot get this to work?

Can you show an example ?

2014-06-10 16:29 GMT+02:00 Edward d'Auvergne <edward@xxxxxxxxxxxxx>:

Here is an example of avoiding automatic numpy data structure creation
and then garbage collection:

"""
from numpy import add, ones, zeros

a = zeros((5, 4))
a[1] = 1
a[:,1] = 2

b = ones((5, 4))

add(a, b, a)
print(a)
"""

The result is:

[[ 1.  3.  1.  1.]
 [ 2.  3.  2.  2.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]
 [ 1.  3.  1.  1.]]

The out argument for numpy.add() is used here to operate in a similar
way to the Python "+=" operation.  But it avoids the temporary numpy
data structures that the Python "+=" operation will create.  This will
save a lot of time in the dispersion code.

Regards,

Edward


On 10 June 2014 15:56, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Troels,

Here is one suggestion, of many that I have, for significantly
improving the speed of the analytic dispersion models in your
'disp_spin_speed' branch.  The speed ups you have currently achieved
for spin clusters are huge and very impressive.  But now that you have
the infrastructure in place, you can advance this much more!

The suggestion has to do with the R20, R20A, and R20B numpy data
structures.  They way they are currently handled is relatively
inefficient, in that they are created de novo for each function call.
This means that memory allocation and Python garbage collection
happens for every single function call - something which should be
avoided at almost all costs.

A better way to do this would be to have a self.R20_struct,
self.R20A_struct, and self.R20B_struct created in __init__(), and then
to pack in the values from the parameter vector into these structures.
You could create a special structure in __init__() for this.  It would
have the dimensions [r20_index][ei][si][mi][oi], where the first
dimension corresponds to the different R20 parameters.  And for each
r20_index element, you would have ones at the [ei][si][mi][oi]
positions where you would like R20 to be, and zeros elsewhere.  The
key is that this is created at the target function start up, and not
for each function call.

This would be combined with the very powerful 'out' argument set to
self.R20_struct with the numpy.add() and numpy.multiply() functions to
prevent all memory allocations and garbage collection.  Masks could be
used, but I think that that would be much slower than having special
numpy structures with ones where R20 should be and zeros elsewhere.
For just creating these structures, looping over a single r20_index
loop and multiplying by the special [r20_index][ei][si][mi][oi]
one/zero structure and using numpy.add() and numpy.multiply() with out
arguments would be much, much faster than masks or the current
R20_axis logic.  It will also simplify the code.

Regards,

Edward


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel

Re: Speed up suggestion for task #7807.

Header

Content

Related Messages