Re: Speed up suggestion for task #7807.


Posted by Edward d'Auvergne on June 10, 2014 - 16:16:
Hi Troels,

My second suggestion is related to the first
(http://thread.gmane.org/gmane.science.nmr.relax.devel/6135).  But it
is for the dw data structure.  Again you can shift most of the work to
the __init__() method.  The changes, sketched in code below the list,
would be:

- In __init__(), create the self.dw_struct structure with dimensions
[ei][si][mi][oi][di].

- In __init__(), create the self.dw_mask structure with dimensions
[dw_index][ei][si][mi][oi][di].  Here dw_index can run over the spins.
Although this duplicates the si index, it is there for speed in the
target function.  Instead of using ones and zeros, as in the R20
suggestion, the ones could be replaced by the self.frqs[ei][si][mi]
values!

- Also in __init__(), create the self.dw_temp structure with
dimensions [dw_index][ei][si][mi][oi][di].  This provides permanent
storage for the results of the numpy.multiply() operations.

- In the target function, loop over dw_index and use
"numpy.multiply(dw[dw_index], self.dw_mask[dw_index],
self.dw_temp[dw_index])" followed by "numpy.add(self.dw_struct,
self.dw_temp[dw_index], self.dw_struct)" to build up self.dw_struct to
pass into lib.dispersion.
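
To make this concrete, here is a minimal sketch of the idea.  The
class name, the pack_dw() method, and the dimension sizes are purely
illustrative; the real shapes and the self.frqs values come from the
experiment set-up in the target function class:

import numpy as np

class DispTargetSketch:
    """Illustrative sketch of the dw pre-allocation idea."""

    def __init__(self, n_spins=2, n_ei=1, n_si=2, n_mi=1, n_oi=1, n_di=10):
        shape = (n_ei, n_si, n_mi, n_oi, n_di)

        # Permanent storage, allocated once and later passed into
        # lib.dispersion.
        self.dw_struct = np.zeros(shape)

        # One "mask" per dw_index (here, per spin).  Instead of plain ones,
        # the non-zero entries would hold the self.frqs[ei][si][mi] values,
        # so the frequency scaling is absorbed into the mask at set-up time.
        self.dw_mask = np.zeros((n_spins,) + shape)

        # Permanent scratch space for the numpy.multiply() results.
        self.dw_temp = np.zeros((n_spins,) + shape)

        self.n_spins = n_spins

    def pack_dw(self, dw):
        """Pack the per-spin dw values from the parameter vector into
        self.dw_struct without allocating any new numpy arrays."""

        # Reset the accumulator in place.
        self.dw_struct[:] = 0.0

        for dw_index in range(self.n_spins):
            # Scale this spin's mask by its dw value, into the scratch array.
            np.multiply(dw[dw_index], self.dw_mask[dw_index],
                        self.dw_temp[dw_index])

            # Accumulate into the structure handed to lib.dispersion.
            np.add(self.dw_struct, self.dw_temp[dw_index], self.dw_struct)

        return self.dw_struct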

This, combined with the almost identical R20 suggestion, will really
give you an insane speed-up.  You have implemented the necessary
infrastructure, so these speed-ups are now relatively easy.

Regards,

Edward




On 10 June 2014 15:56, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
Hi Troels,

Here is one suggestion, of many that I have, for significantly
improving the speed of the analytic dispersion models in your
'disp_spin_speed' branch.  The speed-ups you have currently achieved
for spin clusters are huge and very impressive.  But now that you have
the infrastructure in place, you can advance this much more!

The suggestion has to do with the R20, R20A, and R20B numpy data
structures.  The way they are currently handled is relatively
inefficient, in that they are created de novo for each function call.
This means that memory allocation and Python garbage collection
happen for every single function call - something which should be
avoided at almost all costs.

A better way would be to create self.R20_struct, self.R20A_struct,
and self.R20B_struct in __init__(), and then to pack the values from
the parameter vector into these structures.  You could create a
special structure in __init__() for this.  It would have the
dimensions [r20_index][ei][si][mi][oi], where the first dimension
corresponds to the different R20 parameters.  For each r20_index
element, you would have ones at the [ei][si][mi][oi] positions where
you would like R20 to be, and zeros elsewhere.  The key is that this
is created at target function start-up, and not for each function
call.
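
For example, assuming just two R20 parameters, each covering one spin,
and made-up dimension sizes (the variable names here are hypothetical,
not the actual target function code), the set-up could look something
like:

import numpy as np

# Illustrative dimension sizes only.
n_r20, n_ei, n_si, n_mi, n_oi = 2, 1, 2, 1, 1

# Permanent R20 structure, created once at target function set-up.
R20_struct = np.zeros((n_ei, n_si, n_mi, n_oi))

# The special one/zero structure:  for each r20_index, ones at the
# [ei][si][mi][oi] positions covered by that parameter, zeros elsewhere.
R20_ones = np.zeros((n_r20, n_ei, n_si, n_mi, n_oi))
R20_ones[0, :, 0] = 1.0    # First R20 parameter -> spin si=0 (assumed mapping).
R20_ones[1, :, 1] = 1.0    # Second R20 parameter -> spin si=1 (assumed mapping).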

This would be combined with the very powerful 'out' argument of the
numpy.add() and numpy.multiply() functions, set to self.R20_struct, to
prevent all memory allocation and garbage collection.  Masks could be
used, but I think that would be much slower than having special numpy
structures with ones where R20 should be and zeros elsewhere.  To pack
these structures, a single loop over r20_index, multiplying by the
special [r20_index][ei][si][mi][oi] one/zero structure and using
numpy.add() and numpy.multiply() with out arguments, would be much,
much faster than masks or the current R20_axis logic.  It will also
simplify the code.
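
In the target function, the packing could then look something like the
following sketch (R20_temp is an assumed scratch array, also
pre-allocated in __init__(), so the loop allocates nothing):

import numpy as np

def pack_r20(r20_values, R20_ones, R20_struct, R20_temp):
    """Pack the R20 values from the parameter vector into R20_struct
    using only in-place numpy operations via the 'out' argument.  All
    arrays are assumed to be pre-allocated once at set-up."""

    # Reset the accumulator in place.
    R20_struct[:] = 0.0

    for r20_index in range(len(r20_values)):
        # Scale the one/zero structure by this R20 value, in place.
        np.multiply(r20_values[r20_index], R20_ones[r20_index],
                    R20_temp[r20_index])

        # Accumulate into the permanent structure, again in place.
        np.add(R20_struct, R20_temp[r20_index], R20_struct)

    return R20_struct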

Regards,

Edward


