Hi Troels, For reference for the relax users reading this, the abbreviations Troels used for the relaxation dispersion models can be decoded using the relax wiki page http://wiki.nmr-relax.com/Category:Relaxation_dispersion.
Just a little heads-up. Within a week or two, we should be able to release a new version of relax. This update has focused on speed, recasting the data to higher dimensional numpy arrays, and moving the multiple dot operations out of the for loops. So we have quite much better speed now. :-)
Troels, you just gave away the surprise of the relax 3.2.3 or 3.2.4 release! Note that relax was not particularly slow before these changes, but that you have taken far greater advantage of the BLAS/LAPACK libraries behind the numpy functions to make the dispersion models in relax super fast, especially when spins are clustered. It's a pity we don't have the means to compare the optimisation target function speed between relax and other software to show how much faster relax now is, as the relax advantage of having a grid search to find the initial non-biased position for optimisation (a very important technique for optimisation that a number of other dispersion software do not provide) will give many relax users the incorrect impression that relax is slower than the other software. Though if relax is used on a cluster with OpenMPI, this is a non-issue.
But we still have a bottleneck with numerical models, where doing the matrix exponential via eigenvalue decomposition is slow! Do any of you any experience with a faster method for doing matrix exponential? These initial results shows that if you are going to use the R1rho 2site model, you can expect: -R1rho 100 single spins analysis: DPL94: 23.525+/-0.409 -> 1.138+/-0.035, 20.676x faster. TP02: 20.191+/-0.375 -> 1.238+/-0.020, 16.308x faster. TAP03: 31.993+/-0.235 -> 1.956+/-0.025, 16.353x faster. MP05: 24.892+/-0.354 -> 1.431+/-0.014, 17.395x faster. NS R1rho 2-site: 245.961+/-2.637 -> 36.308+/-0.458, 6.774x faster. Cluster of 100 spins analysis: DPL94: 23.872+/-0.505 -> 0.145+/-0.002, 164.180x faster. TP02: 20.445+/-0.411 -> 0.172+/-0.004, 118.662x faster. TAP03: 32.212+/-0.234 -> 0.294+/-0.002, 109.714x faster. MP05: 25.013+/-0.362 -> 0.188+/-0.003, 132.834x faster. NS R1rho 2-site: 246.024+/-3.724 -> 33.119+/-0.320, 7.428x faster. -CPMG 100 single spins analysis: CR72: 2.615+/-0.018 -> 1.614+/-0.024, 1.621x faster. CR72 full: 2.678+/-0.020 -> 1.839+/-0.165, 1.456x faster. IT99: 1.837+/-0.030 -> 0.881+/-0.010, 2.086x faster. TSMFK01: 1.665+/-0.049 -> 0.742+/-0.007, 2.243x faster. B14: 5.851+/-0.133 -> 3.978+/-0.049, 1.471x faster. B14 full: 5.789+/-0.102 -> 4.065+/-0.059, 1.424x faster. NS CPMG 2-site expanded: 8.225+/-0.196 -> 4.140+/-0.062, 1.987x faster. NS CPMG 2-site 3D: 240.027+/-3.182 -> 45.056+/-0.584, 5.327x faster. NS CPMG 2-site 3D full: 240.910+/-4.882 -> 45.230+/-0.540, 5.326x faster. NS CPMG 2-site star: 186.480+/-2.299 -> 36.400+/-0.397, 5.123x faster. NS CPMG 2-site star full: 187.111+/-2.791 -> 36.745+/-0.689, 5.092x faster. Cluster of 100 spins analysis: CR72: 2.610+/-0.035 -> 0.118+/-0.001, 22.138x faster. CR72 full: 2.674+/-0.021 -> 0.122+/-0.001, 21.882x faster. IT99: 0.018+/-0.000 -> 0.009+/-0.000, 2.044x faster. IT99: Cluster of only 1 spin analysis, since v. 3.2.2 had a bug with clustering analysis.: TSMFK01: 1.691+/-0.091 -> 0.039+/-0.000, 43.369x faster. B14: 5.891+/-0.127 -> 0.523+/-0.004, 11.264x faster. B14 full: 5.818+/-0.127 -> 0.515+/-0.021, 11.295x faster. NS CPMG 2-site expanded: 8.167+/-0.159 -> 0.702+/-0.008, 11.638x faster. NS CPMG 2-site 3D: 238.717+/-3.025 -> 41.380+/-0.950, 5.769x faster. NS CPMG 2-site 3D full: 507.411+/-803.089 -> 41.852+/-1.317, 12.124x faster. NS CPMG 2-site star: 187.004+/-1.935 -> 30.823+/-0.371, 6.067x faster. NS CPMG 2-site star full: 187.852+/-2.698 -> 30.882+/-0.671, 6.083x faster.
There are a number of ways of computing the matrix exponential. Avoiding eigenvalue decomposition is the essential key. The Pade approximation is probably the best, followed by one of the Taylor series expansion approximations. As I mentioned in a different post to the relax-devel mailing list, this was used earlier in relax via Scipy. But I removed this and wrote my own algorithm using eigenvalue decomposition as the Scipy expm() function used in relax was buggy and could not correctly handle the complex part of the matrix (for the 'NS CPMG * star' models, http://wiki.nmr-relax.com/NS_CPMG_2-site_star). Expokit might also be useful http://www.maths.uq.edu.au/expokit/. But if you can come up with a multi-rank version of such an algorithm, as well as the matrix power algorithm, taking advantage of BLAS/LAPACK (or writing it in C or FORTRAN and using OpenMP https://en.wikipedia.org/wiki/OpenMP with a Python wrapper interface), then this will be a huge contribution. Eliminating this bottleneck that makes the numeric dispersion models 10 to 500 times slower than the analytic models, as can be seen in the table of speed ups, would be impressive. Note that there is another way to solve this problem which is far faster, and that is what Nikolai Skrynnikov did with the 'NS CPMG 2-site expanded' model in relax (http://www.nmr-relax.com/api/3.2/lib.dispersion.ns_cpmg_2site_expanded-module.html and http://wiki.nmr-relax.com/NS_CPMG_2-site_expanded). This is a numeric model, but it looks rather analytic and it requires no slow matrix exponential or matrix power operations. From the speeds in seconds that you gave, it can be seen that this model is only slightly slower than Andy's analytic B14 model (http://wiki.nmr-relax.com/B14). The reason why this model is now pretty much the same speed as Andy's model are your linear algebra optimisations. And using Nikolai's approach, it is possible to achieve this for all the numeric models in relax. The idea is to take each numeric model and brute-force expand it using Maple (https://en.wikipedia.org/wiki/Maple_%28software%29) or equivalent. Other symbolic algebra software such as the free Maxima (or the GUI wxMaxima) could be used, or Mathematica, though it looks like Maple is the most powerful for outputting this. You can see how this was done in the lib.dispersion.ns_cpmg_2site_expanded module documentation, as the original Maple script is reproduced there together with an explanation of the steps involved (http://www.nmr-relax.com/api/3.2/lib.dispersion.ns_cpmg_2site_expanded-module.html). All numeric models, whereby the evolution matrix is slightly different to account for different effects, has a completely different rank, or is complex, can be expanded like this. Then the incredibly slow matrix exponential and matrix power operations are avoided and all of your linear algebra tricks for speeding up the models by taking advantage of BLAS/LAPACK+OpenMP via numpy can be used. Using this approach, you should be able to speed up all numeric models to the equivalent of Nikolai's 'NS CPMG 2-site expanded' model. This would be the ultimate speed up for the numeric models, and would make relax by far the fastest dispersion software out there. You could then include this point as part of your paper, and have all the equations in a supplement so it can be cited. Regards, Edward