Hi Troels, Please see below:
After the changes to the lib/dispersion/model.py files, I see a massive speed-up of the computations. Over 2 days, I performed more than 600 global fittings for a 68 residue protein, where all residues were clustered. I did it with just 1 CPU. This is really impressive. I did, though, also alter how the grid search was performed, pre-setting some of the values from known values referred to in a paper. So I can't really say what has cut the time down.
It looks like the dispersion analysis is faster, but probably only by a factor of between 1 and 2 for the different dispersion models. So the huge speed-up is likely due to the cuts in the computationally expensive grid search. That is why there are all those tricks in the auto-analysis - copying parameters from other optimised models, either nested or analytic to numeric, and averaging values for the cluster analysis - which were published as part of the paper on this analysis (http://dx.doi.org/10.1093/bioinformatics/btu166).
But looking at the calculations running, the minimisation is quite fast. So, how does relax collect the data for global fitting?
You can see this in the docstring for the dispersion target function class. For example, for the R2eff/R1rho values:

    @keyword values:  The R2eff/R1rho values.  The dimensions are
                      {Ei, Si, Mi, Oi, Di}.
    @type values:     rank-4 list of numpy rank-1 float arrays

So you can see that the second dimension (Si) is the spins, i.e. this will range over all spins in the cluster. The target function class is initialised once per cluster (or per free spin), as defined by the model_loop() API method.
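To make the {Ei, Si, Mi, Oi, Di} layout concrete, here is a minimal sketch (not relax code, and with invented sizes) of such a rank-4 nested list whose leaves are rank-1 numpy arrays over the dispersion points:

```python
import numpy as np

# Hypothetical sizes for the illustration: experiments (Ei), spins in
# the cluster (Si), magnetic fields (Mi), offsets (Oi), and dispersion
# points (Di).
n_exp, n_spins, n_fields, n_offsets, n_disp = 1, 3, 2, 1, 10

# A rank-4 nested list; each leaf is a rank-1 numpy float array of
# length n_disp, matching the docstring's description.
values = [[[[np.zeros(n_disp)
             for _ in range(n_offsets)]
            for _ in range(n_fields)]
           for _ in range(n_spins)]
          for _ in range(n_exp)]

# One R2eff curve for spin si=1, experiment ei=0, field mi=0, offset oi=0:
curve = values[0][1][0][0]
print(curve.shape)  # (10,)
```

The second index ranges over the spins, so one target function instance sees the data for the whole cluster at once.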
Does it collect all the R2eff values for the clustered spins, and send them to the target function together with the array of parameters to vary?
Yes, but again see the docstring for the fine details. Looking at the code for assembling the R2eff/R1rho data structures will also show this.
Or does it calculate per spin, and share the common parameters?
The chi-squared is calculated per spin with the common parameters and then summed over all spins of the cluster, all within one target function call. That will be the loop over si that you see inside each target function.
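This summation pattern can be sketched as follows. Note this is a simplified illustration of the idea, not relax's actual target function code - the real back-calculation folds the common cluster parameters into each spin's curve first:

```python
import numpy as np

def chi2_cluster(back_calc, values, errors):
    """Sum the chi-squared over all spins of a cluster in one call.

    back_calc, values, errors are lists over the spins (si), each
    element a rank-1 numpy array over the dispersion points.
    """
    chi2 = 0.0
    # The loop over si: each spin contributes its own chi-squared,
    # computed from curves back-calculated with the shared parameters.
    for si in range(len(values)):
        chi2 += np.sum(((values[si] - back_calc[si]) / errors[si])**2)
    return chi2

# Example with two spins of 3 dispersion points each (invented numbers):
vals = [np.array([10.0, 12.0, 11.0]), np.array([9.0, 8.5, 9.2])]
calc = [np.array([10.1, 11.8, 11.0]), np.array([9.0, 8.6, 9.0])]
errs = [np.array([0.1, 0.1, 0.1]), np.array([0.1, 0.1, 0.1])]
print(chi2_cluster(calc, vals, errs))
```

The optimiser therefore sees a single chi-squared value per function call, covering the whole cluster.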
My current bottleneck actually seems to be the saving of the state file between each iteration of the global analysis.
You should have a look at what is being saved inside each file. This should only take a few seconds, or tens of seconds if you have Monte Carlo simulations. Which global analysis are you talking about? Your own custom analysis or the auto-analysis? The auto-analysis saves results files rather than state files (a results file is simply one data pipe, whereas a state file is everything in the data store, including non-pipe structures). If it is a custom analysis, then maybe you are not resetting the relax data store between iterations, and hence your files will grow bigger and bigger with each iteration - i.e. the number of data pipes increases each time. Or maybe you should use the results.write user function rather than state.save. Regards, Edward
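To see why an un-reset data store makes the state files grow, here is a toy simulation (NOT relax code - just a dict standing in for the data store, with "state" dumping every pipe and "results" dumping only the current one):

```python
import json
import os
import tempfile

# Toy stand-in for the relax data store: a dict of data pipes.
store = {}

def save_state(path):
    # A "state" file contains the entire store.
    with open(path, 'w') as f:
        json.dump(store, f)

def save_results(path, pipe):
    # A "results" file contains only one data pipe.
    with open(path, 'w') as f:
        json.dump({pipe: store[pipe]}, f)

tmp = tempfile.mkdtemp()
state_sizes, results_sizes = [], []
for i in range(3):
    # Without a reset, each iteration adds a new pipe; nothing is removed.
    pipe = 'iteration_%i' % i
    store[pipe] = {'R2eff': list(range(500))}

    state_path = os.path.join(tmp, 'state_%i.json' % i)
    save_state(state_path)
    state_sizes.append(os.path.getsize(state_path))

    results_path = os.path.join(tmp, 'results_%i.json' % i)
    save_results(results_path, pipe)
    results_sizes.append(os.path.getsize(results_path))

print(state_sizes)    # grows with every iteration
print(results_sizes)  # stays roughly constant
```

In the toy, clearing the dict each iteration (the analogue of resetting the relax data store) would keep the state files flat as well; either fix removes the ever-growing save cost.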