Re: r24294 - in /branches/disp_spin_speed: lib/dispersion/ns_r1rho_2site.py target_functions/relax_disp.py -- June 25, 2014

Well, looking at the Scipy source code of ./scipy/linalg/matfuncs.py,
expm2() and expm3() are implemented in Python code, interfacing to
other scipy functions.  And expm() is an alias for the function of the
same name in ./scipy/sparse/linalg/matfuncs.py, which is also
implemented in Python code.  So there is no BLAS/LAPACK magic
happening here, except maybe in the functions that these expm*()
functions call.

Regards,

Edward



On 25 June 2014 17:54, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Troels,

If you are interested in eliminating this 86% bottleneck in the
numeric R1rho dispersion models in the
lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x()
function, it might be worth looking at the scipy.linalg.expm(),
scipy.linalg.expm2(), and scipy.linalg.expm3() functions
(http://docs.scipy.org/doc/scipy/reference/tutorial/linalg.html#exponential-and-logarithm-functions),
or the BLAS/LAPACK functions these interface to.  Note that I
eliminated the use of these functions in the dispersion analysis a
while ago (http://article.gmane.org/gmane.science.nmr.relax.scm/18869/).
And for good reason, there was a nasty Scipy bug in the Pade
approximation of scipy.linalg.expm().  This and the fatal bugs in all
optimisation algorithms in Scipy (the catalyst for me to create minfx,
originally as part of relax, https://gna.org/projects/minfx/), is the
reason why I try to avoid Scipy as much as possible - I just cannot
trust it any more.  Scipy is a collection of poorly integrated tools,
often with the original authors completely disappearing, and it is
full of bugs.  Anyway, maybe one of these functions would be of use or
would be a good inspiration.  Oh, the scipy.linalg.expm() function in
the original code comes from the fitting_main_kex.py script from
Mathilde Lescanne, Paul Schanda, and Dominique Marion.  Note that if
the Taylor series approximation of scipy.linalg.expm3() works well for
the numeric dispersion models, which is untested, that might be by far
the fastest solution.

Regards,

Edward


On 25 June 2014 08:20, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Wow, you managed to work out how to remove the slow Python looping
from the numeric models!  For the record, this is the speed up for
this 'NS R1rho 2-site' model using Python 2.7.6 & numpy 1.8.0 on a
64-bit Linux system:

"""
$ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2
/data/relax/branches/disp_spin_speed
[snip]

New relax version:  relax repository checkout r24308
svn+ssh://bugman@xxxxxxxxxxx/svn/relax/branches/disp_spin_speed
Old relax version:  relax repository checkout r24274
svn+ssh://bugman@xxxxxxxxxxx/svn/relax/branches/disp_spin_speed

Execution iteration 1

$ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2
     1000    0.016    0.000    3.561    0.004
relax_disp.py:1582(func_ns_r1rho_2site)
       10    0.013    0.001    3.237    0.324
relax_disp.py:1582(func_ns_r1rho_2site)
$ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed
     1000    0.017    0.000    6.245    0.006
relax_disp.py:1552(func_ns_r1rho_2site)
       10    0.013    0.001    5.899    0.590
relax_disp.py:1552(func_ns_r1rho_2site)
[snip]

100 single spins analysis:
NS R1rho 2-site:            62.450+/-0.000 ->  35.610+/-0.000,   1.754x 
faster.

Cluster of 100 spins analysis:
NS R1rho 2-site:            58.990+/-0.000 ->  32.370+/-0.000,   1.822x 
faster.
"""


This is quite significant for these models.  With all the changes
since relax 3.2.2:

"""
$ ./disp_profile_all.py /data/relax/branches/disp_spin_speed2
/data/relax/tags/3.2.2/
[snip]

New relax version:  relax repository checkout r24308
svn+ssh://bugman@xxxxxxxxxxx/svn/relax/branches/disp_spin_speed
Old relax version:  relax 3.2.2

Execution iteration 1

$ python profiling_ns_r1rho_2site.py /data/relax/branches/disp_spin_speed2
     1000    0.016    0.000    3.576    0.004
relax_disp.py:1582(func_ns_r1rho_2site)
       10    0.013    0.001    3.247    0.325
relax_disp.py:1582(func_ns_r1rho_2site)
$ python profiling_ns_r1rho_2site.py /data/relax/tags/3.2.2/
     1000    0.245    0.000   24.323    0.024
relax_disp.py:1681(func_ns_r1rho_2site)
       10    0.266    0.027   24.439    2.444
relax_disp.py:1681(func_ns_r1rho_2site)
[snip]

100 single spins analysis:
NS R1rho 2-site:           243.230+/-0.000 ->  35.760+/-0.000,   6.802x 
faster.

Cluster of 100 spins analysis:
NS R1rho 2-site:           244.390+/-0.000 ->  32.470+/-0.000,   7.527x 
faster.
"""


Considering how slow these models are compared to the analytic models,
this is a huge improvement!  According to your
profiling_ns_r1rho_2site.py script, the slow parts are:

"""
('chi2 cluster:', 14153134802.76194)
Wed Jun 25 07:59:18 2014    /tmp/tmp17lqUn

         211077 function calls in 3.510 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.510    3.510 <string>:1(<module>)
        1    0.002    0.002    3.510    3.510
profiling_ns_r1rho_2site.py:553(cluster)
       10    0.000    0.000    3.254    0.325
profiling_ns_r1rho_2site.py:515(calc)
       10    0.015    0.002    3.254    0.325
relax_disp.py:1582(func_ns_r1rho_2site)
       10    0.015    0.002    3.236    0.324
ns_r1rho_2site.py:190(ns_r1rho_2site)
       10    0.104    0.010    3.041    0.304
matrix_exponential.py:33(matrix_exponential_rank_NE_NS_NM_NO_ND_x_x)
       10    1.803    0.180    1.827    0.183 linalg.py:982(eig)
       10    0.564    0.056    0.601    0.060 linalg.py:455(inv)
       40    0.501    0.013    0.501    0.013 
{numpy.core.multiarray.einsum}
"""


The single spin numbers are very similar.  Looking at the total times,
the numpy.einsum and numpy.inv functions are taking 1/7 of the time
each, and numpy.eig 51% of the time - this is a total of 80% of the
time.  The 
lib.dispersion.matrix_exponential.matrix_exponential_rank_NE_NS_NM_NO_ND_x_x()
function now takes up 86% of the cumulative time - so this is the new
bottleneck.  The only way to solve this would be using clever linear
algebra tricks to simplify or to change the logic in
matrix_exponential_rank_NE_NS_NM_NO_ND_x_x().  There are multiple
techniques for finding the matrix exponential, it could be that the
eigenvalue method is one of the slowest.  This looks to be interesting
for an overview of recent research in the mathematics field on this
subject: http://www.maths.manchester.ac.uk/~higham/talks/exp09.pdf.
Great work though, you have eliminated the worst bottleneck for the
numeric models!

Cheers,

Edward



On 25 June 2014 02:14,  <tlinnet@xxxxxxxxxxxxx> wrote:

Author: tlinnet
Date: Wed Jun 25 02:14:36 2014
New Revision: 24294

URL: http://svn.gna.org/viewcvs/relax?rev=24294&view=rev
Log:
Speeded up the code of NS r1rho 2site.

This was essential done to numpy einsum, and doing the dot operations in 
multiple dimensions.
It was though necessary to realize, that to do the proper dot product 
operations, the outer two
axis if M0 should be swapped, by rolling the outer axis one back.

Task #7807 (https://gna.org/task/index.php?7807): Speed-up of dispersion 
models for Clustered analysis.

Modified:
    branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py
    branches/disp_spin_speed/target_functions/relax_disp.py

Modified: branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py
URL: 
http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py?rev=24294&r1=24293&r2=24294&view=diff
==============================================================================
--- branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py   (original)
+++ branches/disp_spin_speed/lib/dispersion/ns_r1rho_2site.py   Wed Jun 
25 02:14:36 2014
@@ -50,10 +50,10 @@
 """

 # Python module imports.
-from numpy import array, cos, dot, einsum, float64, log, multiply, 
rollaxis, sin, sum
+from numpy import array, einsum, float64, isfinite, log, min, multiply, 
rollaxis, sin, sum
+from numpy.ma import fix_invalid, masked_less

 # relax module imports.
-from lib.float import isNaN
 from lib.dispersion.matrix_exponential import 
matrix_exponential_rank_NE_NS_NM_NO_ND_x_x

 # Repetitive calculations (to speed up calculations).
@@ -187,14 +187,16 @@
     return matrix


-def ns_r1rho_2site(M0=None, r1rho_prime=None, omega=None, offset=None, 
r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, 
relax_time=None, inv_relax_time=None, back_calc=None, num_points=None):
+def ns_r1rho_2site(M0=None, M0_T=None, r1rho_prime=None, omega=None, 
offset=None, r1=0.0, pA=None, dw=None, kex=None, spin_lock_fields=None, 
relax_time=None, inv_relax_time=None, back_calc=None, num_points=None):
     """The 2-site numerical solution to the Bloch-McConnell equation for 
R1rho data.

     This function calculates and stores the R1rho values.


     @keyword M0:                This is a vector that contains the 
initial magnetizations corresponding to the A and B state transverse 
magnetizations.
-    @type M0:                   numpy float64, rank-1, 7D array
+    @type M0:                   numpy float array of rank 
[NE][NS][NM][NO][ND][6][1]
+    @keyword M0_T:              This is a vector that contains the 
initial magnetizations corresponding to the A and B state transverse 
magnetizations, where the outer two axis has been swapped for efficient 
dot operations.
+    @type M0_T:                 numpy float array of rank 
[NE][NS][NM][NO][ND][1][6]
     @keyword r1rho_prime:       The R1rho_prime parameter value (R1rho 
with no exchange).
     @type r1rho_prime:          numpy float array of rank 
[NE][NS][NM][NO][ND]
     @keyword omega:             The chemical shift for the spin in rad/s.
@@ -238,43 +240,22 @@
     # Magnetization evolution.
     Rexpo_M0_mat = einsum('...ij,...jk', Rexpo_mat, M0)

-    # Transpose M0, to prepare for dot operation. Roll the last axis one 
back.
-    M0_T = rollaxis(M0, 6, 5)
-
-    if NS == 1:
-        # Loop over the spectrometer frequencies.
-        for mi in range(NM):
-            # Loop over offsets:
-            for oi in range(NO):
-                # Extract number of points.
-                num_points_i = num_points[0, 0, mi, oi]
-
-                # Loop over the time points, back calculating the R2eff 
values.
-                for j in range(num_points_i):
-
-                    # If the number of spins are 1, do the fastest 
method by dot product.  Else do it by einstein summation.
-                    # Set the spin index for one spin.
-                    si = 0
-                    # Extract the preformed matrix that rotate the 
magnetization previous to spin-lock into the weff frame.
-                    M0_i= M0_T[0, si, mi, oi, j]
-
-                    # Extract from the pre-formed Magnetization 
evolution matrix.
-                    Rexpo_M0_mat_i = Rexpo_M0_mat[0, si, mi, oi, j]
-
-                    # Magnetization evolution.
-                    MA = dot(M0_i, Rexpo_M0_mat_i)[0, 0]
-
-                    # The next lines calculate the R1rho using a 
two-point approximation, i.e. assuming that the decay is mono-exponential.
-                    if MA <= 0.0 or isNaN(MA):
-                        back_calc[0, si, mi, oi, j] = 1e99
-                    else:
-                        back_calc[0, si, mi, oi, j]= -inv_relax_time[0, 
si, mi, oi, j] * log(MA)
-
-    else:
-        # Magnetization evolution, which include all dimensions.
-        MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, 
:, 0, 0]
-
-        # Do back calculation.
-        back_calc[:] = -inv_relax_time * log(MA_mat)
-
-
+    # Magnetization evolution, which include all dimensions.
+    MA_mat = einsum('...ij,...jk', M0_T, Rexpo_M0_mat)[:, :, :, :, :, 0, 
0]
+
+    # Insert safe checks.
+    if min(MA_mat) < 0.0:
+        mask_min_MA_mat = masked_less(MA_mat, 0.0)
+        # Fill with high values.
+        MA_mat[mask_min_MA_mat.mask] = 1e100
+
+    # Do back calculation.
+    back_calc[:] = -inv_relax_time * log(MA_mat)
+
+    # Catch errors, taking a sum over array is the fastest way to check 
for
+    # +/- inf (infinity) and nan (not a number).
+    if not isfinite(sum(back_calc)):
+        # Replaces nan, inf, etc. with fill value.
+        fix_invalid(back_calc, copy=False, fill_value=1e100)
+
+

Modified: branches/disp_spin_speed/target_functions/relax_disp.py
URL: 
http://svn.gna.org/viewcvs/relax/branches/disp_spin_speed/target_functions/relax_disp.py?rev=24294&r1=24293&r2=24294&view=diff
==============================================================================
--- branches/disp_spin_speed/target_functions/relax_disp.py     (original)
+++ branches/disp_spin_speed/target_functions/relax_disp.py     Wed Jun 
25 02:14:36 2014
@@ -26,8 +26,7 @@

 # Python module imports.
 from copy import deepcopy
-from math import pi
-from numpy import add, array, arctan2, asarray, cos, complex64, dot, 
float64, int16, max, multiply, ones, sin, sqrt, sum, tile, zeros
+from numpy import arctan2, cos, dot, float64, int16, multiply, ones, 
rollaxis, pi, sin, sum, zeros
 from numpy.ma import masked_equal

 # relax module imports.
@@ -415,6 +414,9 @@
             M0_2[2, 0] = 1
             M0_cos = multiply.outer( cos(theta_mat), M0_2 )
             self.M0 = M0_sin + M0_cos
+            # Transpose M0, to prepare for dot operation. Roll the last 
axis one back, corresponds to a transpose for the outer two axis.
+            self.M0_T = rollaxis(self.M0, 6, 5)
+
         if model in [MODEL_NS_R1RHO_3SITE, MODEL_NS_R1RHO_3SITE_LINEAR]:
             self.M0 = zeros(9, float64)
             # Offset of spin-lock from A.
@@ -1598,7 +1600,7 @@
         self.r20_struct[:] = multiply.outer( 
r1rho_prime.reshape(self.NE, self.NS, self.NM), self.no_nd_ones )

         # Back calculate the R2eff values.
-        ns_r1rho_2site(M0=self.M0, r1rho_prime=self.r20_struct, 
omega=self.chemical_shifts, offset=self.offset, r1=self.r1, pA=pA, 
dw=self.dw_struct, kex=kex, spin_lock_fields=self.spin_lock_omega1, 
relax_time=self.relax_times, inv_relax_time=self.inv_relax_times, 
back_calc=self.back_calc, num_points=self.num_disp_points)
+        ns_r1rho_2site(M0=self.M0, M0_T=self.M0_T, 
r1rho_prime=self.r20_struct, omega=self.chemical_shifts, 
offset=self.offset, r1=self.r1, pA=pA, dw=self.dw_struct, kex=kex, 
spin_lock_fields=self.spin_lock_omega1, relax_time=self.relax_times, 
inv_relax_time=self.inv_relax_times, back_calc=self.back_calc, 
num_points=self.num_disp_points)

         # Clean the data for all values, which is left over at the end 
of arrays.
         self.back_calc = self.back_calc*self.disp_struct


_______________________________________________
relax (http://www.nmr-relax.com)

This is the relax-commits mailing list
relax-commits@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-commits

Re: r24294 - in /branches/disp_spin_speed: lib/dispersion/ns_r1rho_2site.py target_functions/relax_disp.py

Header

Content

Related Messages