Re: how to parallelise model_free minimise


Posted by Edward d'Auvergne on March 26, 2007 - 07:18:
On 3/24/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
Dear Ed and all,
    I have started looking at how to parallelise the calls to
model_free.minimise, as discussed in our previous message, but am having
some problems with the function...

The first is that it is huge and seems to have a large amount of special
casing and checking built in.

Yep, it is quite complex. However, this code complexity simplifies the execution of model-free minimisations and grid searches.


The second is to work out how many different modes it can be called
in. From what I can see, I need to look for param_set and only
parallelise if self.param_set is one of 'mf' or 'local_tm' ('diff' and
'all' being either too trivial or too hard to optimise respectively ;-)

There is no need, as the param_set is handled by the code which produces the main loop of that model-free 'minimise()' function. The part of this function to parallelise is the main loop over the minimisation instances. You can find the loop on line 2118 of 'specific_fns/model_free.py' in the 'multi_processor' branch (or search for the comment "# Loop over the minimisation instances.").
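
To make the target concrete, here is a minimal sketch of that loop's structure and of how a worker pool could take it over. This is illustrative Python only and not the relax code: assemble_args() and run_one() are hypothetical stand-ins for the real per-instance set-up and the Mf/generic_minimise() calls.

    from multiprocessing import Pool

    def assemble_args(instance):
        # Hypothetical stand-in for the per-instance set-up code that
        # builds the Mf and generic_minimise() arguments.
        return {"instance": instance}

    def run_one(instance):
        # Stand-in for one iteration of the minimise() main loop: set up
        # the arguments, then optimise (a dummy result here, where the
        # real code would call generic_minimise()).
        args = assemble_args(instance)
        return (args["instance"], "optimised")

    def minimise_all(instances, processes=None):
        # The serial loop being discussed is simply:
        #     return [run_one(i) for i in instances]
        # Because the iterations are independent, a pool map (or an MPI
        # scatter/gather) can replace it directly.
        with Pool(processes) as pool:
            return pool.map(run_one, instances)

    if __name__ == "__main__":
        print(minimise_all(range(6)))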


Then there are the parameters passed.

First there are Mf and generic_minimise, which seem to take a huge
number of parameters:

    self.mf = Mf(init_params=self.param_vector, param_set=self.param_set,
                 diff_type=diff_type, diff_params=diff_params,
                 scaling_matrix=self.scaling_matrix, num_res=num_res,
                 equations=equations, param_types=param_types,
                 param_values=param_values, relax_data=relax_data,
                 errors=relax_error, bond_length=r, csa=csa,
                 num_frq=num_frq, frq=frq, num_ri=num_ri,
                 remap_table=remap_table, noe_r1_table=noe_r1_table,
                 ri_labels=ri_labels, gx=self.relax.data.gx,
                 gh=self.relax.data.gh, g_ratio=self.relax.data.g_ratio,
                 h_bar=self.relax.data.h_bar, mu0=self.relax.data.mu0,
                 num_params=num_params, vectors=xh_unit_vectors)

and for generic_minimise:

    results = generic_minimise(func=self.mf.func, dfunc=self.mf.dfunc,
                               d2func=self.mf.d2func, args=(),
                               x0=self.param_vector, min_algor=min_algor,
                               min_options=min_options, func_tol=func_tol,
                               grad_tol=grad_tol, maxiter=max_iterations,
                               full_output=1, print_flag=print_flag,
                               A=A, b=b)

If it is the main loop over the minimisation instances which is parallelised for MPI, etc., then this code won't need modification.

The questions here are:

1. What are all these parameters? Either I am misreading things or not
understanding, because I couldn't find definitions for everything.

All of the arguments for the instantiation of the Mf class are set up at the start of the main minimisation loop, i.e. between lines 2119 and 2324. This is most of the code of the 'minimise()' function.
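
One consequence for MPI-style parallelisation (a hedged aside, not relax code): because the arguments are rebuilt inside the loop body, a worker node only needs that per-instance bundle, which can be serialised and shipped, for example:

    import pickle

    def ship_instance(args_dict):
        # Serialise one instance's argument bundle for transfer to a
        # worker node. The keys would mirror the keyword names of the
        # quoted Mf() call (relax_data, errors, csa, ...); this helper
        # is illustrative only.
        return pickle.dumps(args_dict)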

2. What is going to change between runs, or even over runs of the relax
program?

For each iteration of the main loop, these arguments and parameters will change.


Clearly some things don't change at all, and it could even be asked why,
for example, h_bar is a parameter to Mf (there may be something deep I am
missing here?).

h_bar in the new 1.3 code need not be sent in. I have created the module 'physical_constants' from which it can be imported. Every argument to the Mf instantiation, except for h_bar and mu0, will be different for each minimisation 'instance'. For example, if the param_set is 'all', then the 'relax_data' argument will be the relaxation data of all selected spin systems. If param_set is 'mf', then 'relax_data' will be the relaxation data of a single spin.
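
A minimal sketch of what that import could look like (the module name comes from the message above; treating h_bar and mu0 as attributes of it is an assumption):

    # Instead of passing the constants in as arguments to Mf():
    from physical_constants import h_bar, mu0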


Some other things may only change at specific points in the program. For
example, the vectors of the molecules should only change when the vectors
function of structure or pdb is called.
Other things are per residue, but which of them?

All but h_bar and mu0.

And other things differ by type of model and minimisation...

Yep, see above.

As a note, typical input parameters for a local tm calculation are:
-#-initialise Mf residue - 3 LEU
-#--------------
-#-
-#-init_params [ 1000.]
-#-param_set local_tm
-#-diff_type sphere
-#-diff_params None
-#-scaling_matrix [ [  1.00000000e-12]]
-#-num_res 1
-#-equations ['mf_orig']
-#-param_types [['local_tm']]
-#-param_values None
-#-relax_data [array([  0.8293,  12.85  ,   0.9528,  12.57  ,   0.0983])]
-#-errors [array([ 0.023 ,  0.13  ,  0.0253,  0.171 ,  0.0278])]
-#-bond_length [1.0200000000000001e-10]
-#-csa [-0.00017199999999999998]
-#-num_frq [2]
-#-frq [[750800000.0, 599.71900000000005]]
-#-num_ri [5]
-#-remap_table [[0, 0, 1, 1, 1]]
-#-noe_r1_table [[None, None, None, None, 2]]
-#-ri_labels [['R1', 'R2', 'R1', 'R2', 'NOE']]
-#-gx -27126000.0
-#-gh 267522212.0
-#-g_ratio -9.862206444
-#-h_bar 1.05457159642e-34
-#-mu0 1.25663706144e-06
-#-num_params [1]
-#-vectors [None]
-#-
-#-generic minimisation residue - 3 LEU
-#-----------------------------
-#-
-#-constraints 1
-#-func <bound method Mf.func_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-dfunc <bound method Mf.dfunc_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-d2func <bound method Mf.d2func_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-args ()
-#-x0 [ 1000.]
-#-min_algor Method of Multipliers
-#-min_options ('newton',)
-#-func_tol 1e-25
-#-grad_tol None
-#-maxiter 10000000
-#-full_output 1
-#-print_flag 1
-#-constrained
-#-A [[ 1.]
-#-   [-1.]]
-#-b [      0. -200000.]



As an aside, when the redesign of the spin loops and minimise/model
loops cuts in, it would be a good idea (from the parallel point of view)
to have the spin loop running faster than the minimise/model loop.

That's guaranteed. The speed of the spin loop (the spin_loop() function of the 'generic_fns.selection' module of the 1.3 line) will be limited by the internal Python looping speed. The main loop of the minimise() function is limited by the call to generic_minimise() which should be many orders of magnitude slower.


So you could split by residue for parallelising, but send off all the
required model minimisations for each model at the same time, which would
give implicit load balancing and coarser gains on homogeneous parallel
computers:

The minimise() main loop does all of this. Splitting by residue only makes sense for the 'mf' and 'local_tm' param_set values. All residues are involved in the diffusion tensor optimisation 'diff' and the complete optimisation 'all'.


i.e. for six residues and 3 nodes:

node 1 calculates
res 1 [m1 m2 m3 ...]
res 2 [m1 m2 m3 ...]

node 2 calculates
res 3 [m1 m2 m3 ...]
res 4 [m1 m2 m3 ...]

node 3 calculates
res 5 [m1 m2 m3 ...]
res 6 [m1 m2 m3 ...]

This obviously places some limitations on the design of the minimisation function, as it might need to have set-up and tear-down regions that cope with this batched data...

The minimise() main loop is the finest grain parallelisation you can get without writing a specific parallelised optimisation algorithm.
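
As a concrete illustration of the batching in the six-residue example above (a sketch assuming a simple block split, not relax code):

    def split_residues(residues, num_nodes):
        # Block-split the residues across the nodes so that all model
        # minimisations for a residue stay on the same node.
        chunk = -(-len(residues) // num_nodes)  # ceiling division
        return [residues[i*chunk:(i + 1)*chunk] for i in range(num_nodes)]

    # Six residues on three nodes, as in the example above:
    # split_residues([1, 2, 3, 4, 5, 6], 3)  ->  [[1, 2], [3, 4], [5, 6]]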

Cheers,

Edward


