I really think that, for my case, a 25% speed-up is a big deal! I have so much data to crunch that a 25% speed-up would be absolutely perfect.
Note that with the CR72 model, there are still a number of optimisations which can be performed which will probably give a lot more speed than this 25%. In addition, there is a good chance that the math domain error checking will mean that this 25% speed up might not happen. You may only get 1.05 ;) That really depends on how the checks are constructed. Anyway, you can give it a go if you wish. But try to optimise CR72 first.
I would only optimise this for CR72, and TSMFK01, since these are the ones I need now. And the change of code is only 3-5 lines?
Well, the math domain checking which sets "back_calc[i] = 1e100" or "back_calc[i] = r20_kex" in the CR72 model would have to be rethought. The same goes for TSMFK01. Note though that this changes the logic of the target_functions.relax_disp to lib.dispersion API interface. This is a big deal! The interface must all be one or the other: either all lib.dispersion modules return back-calculated R2eff arrays, or they all accept R2eff arrays and pack them with data. There can be no compromise with an important API interface such as this. A mixture cannot be accepted. That being said, if you manage the change in one model, then you won't find it difficult to change it in the others. Note that there is also another optimisation target, which is the 'missing' data structure. A redesign of that might give you a 50% or more speed-up as well.
And I was thinking of one more thing. CR72 always goes over a loop.

-----------
# Loop over the time points, back calculating the R2eff values.
for i in range(num_points):
    # The full eta+/- values.
    etapos = etapos_part / cpmg_frqs[i]
    etaneg = etaneg_part / cpmg_frqs[i]

    # Catch large values of etapos going into the cosh function.
    if etapos > 100:
        back_calc[i] = 1e100
        continue

    # The arccosh argument - catch invalid values.
    fact = Dpos * cosh(etapos) - Dneg * cos(etaneg)
    if fact < 1.0:
        back_calc[i] = r20_kex
        continue

    # The full formula.
    back_calc[i] = r20_kex - cpmg_frqs[i] * arccosh(fact)
------------

I would rather do:

etapos = etapos_part / cpmg_frqs

And then check for NaN values. If any of these are there, just return the whole array filled with 1e100, instead of setting single values. That would replace a loop with a check.
This is a great idea, and it'd be awesome if it worked. You should make such optimisations to the CR72 model first. Note though that there will be no NaN values: the cpmg_frqs array should be a reasonable set of values and never contain 0.0 (that should be caught and prevented by the specific_analyses.relax_disp code). You have to check for values > 100 to avoid large values going into numpy.cosh() ("if numpy.max(etapos) > 100:"). You could then also calculate 'fact' outside of the loop. But then check for values < 1.0 ("if numpy.min(fact) < 1.0"), as arccosh is only defined for values >= 1.0. Then calculate "values = r20_kex - cpmg_frqs * arccosh(fact)", again out of the loop. I think you'll be impressed by the speed-ups this gives. Then finally have the code:

# Loop over the time points, packing the R2eff values into the array.
for i in range(num_points):
    back_calc[i] = values[i]

Timings for such a change, possibly from a system test, would be great to have.

Once CR72 is fully optimised, you should consider creating a branch for changing the target_functions.relax_disp to lib.dispersion API. And if you manage to make such optimisations so that all models end in these three lines, then it would be very quick to change the entire API.

Regards,

Edward
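For illustration, the whole-array version of the CR72 loop discussed in this thread might look roughly like the sketch below. This is not the relax code: the function name cr72_tail_vectorised and its argument list are hypothetical, cpmg_frqs is assumed to be a numpy float array with no zeros, and back_calc is assumed to be an indexable array of the same length. What gets packed into the array when a check fails is also an assumption: here the whole array is filled with 1e100 or r20_kex, which differs from the per-point behaviour of the original loop.

```python
import numpy as np


def cr72_tail_vectorised(etapos_part, etaneg_part, Dpos, Dneg, r20_kex,
                         cpmg_frqs, back_calc):
    """Hypothetical sketch of the final CR72 step without a per-point loop."""
    num_points = len(cpmg_frqs)

    # The full eta+ values for all time points at once.
    etapos = etapos_part / cpmg_frqs

    if np.max(etapos) > 100:
        # Assumption: one large etapos value flags the whole curve as bad.
        values = np.full(num_points, 1e100)
    else:
        etaneg = etaneg_part / cpmg_frqs

        # The arccosh argument, again for all points at once.
        fact = Dpos * np.cosh(etapos) - Dneg * np.cos(etaneg)

        if np.min(fact) < 1.0:
            # Assumption: arccosh is undefined below 1.0, so fall back to
            # r20_kex for the whole curve.
            values = np.full(num_points, r20_kex)
        else:
            # The full formula.
            values = r20_kex - cpmg_frqs * np.arccosh(fact)

    # Loop over the time points, packing the R2eff values into the array.
    for i in range(num_points):
        back_calc[i] = values[i]

    return back_calc
```

Keeping the final packing loop separate from the maths means the function body above it only ever produces a 'values' array, which is the form that would make a later switch of the API to returned arrays a small change.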