Re: how to parallelise model_free minimise


Posted by Edward d'Auvergne on March 26, 2007 - 07:18:
On 3/24/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
Dear Ed and all,
    I have started looking at how to parallelise the calls to
model_free.minimise, as discussed in our previous message, but am having
some problems with the function...

The first is that it is huge and seems to have a large amount of special
casing and checking built in.

Yep, it is quite complex. However, this code complexity simplifies the execution of model-free minimisations and grid searches.


The second is to work out how many different modes it can be called
in. From what I can see, I need to look for param_set and only
parallelise if self.param_set is one of 'mf' or 'local_tm' ('diff' and
'all' being either too trivial or too hard to optimise respectively ;-)

There is no need, as the param_set is handled by the code which produces the main loop of that model-free 'minimise()' function. The part of this function to parallelise is the main loop over the minimisation instances. You can find the loop on line 2118 of 'specific_fns/model_free.py' in the 'multi_processor' branch (or search for the comment "# Loop over the minimisation instances.").
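
To make the target concrete, here is a minimal sketch of that loop's structure and of how a worker pool could take it over. This is illustrative Python only and not the relax code: assemble_args() and run_one() are hypothetical stand-ins for the real per-instance set-up and the Mf/generic_minimise() calls.

    from multiprocessing import Pool

    def assemble_args(instance):
        # Hypothetical stand-in for the per-instance set-up code that
        # builds the Mf and generic_minimise() arguments.
        return {"instance": instance}

    def run_one(instance):
        # Stand-in for one iteration of the minimise() main loop: set up
        # the arguments, then optimise (a dummy result here, where the
        # real code would call generic_minimise()).
        args = assemble_args(instance)
        return (args["instance"], "optimised")

    def minimise_all(instances, processes=None):
        # The serial loop being discussed is simply:
        #     return [run_one(i) for i in instances]
        # Because the iterations are independent, a pool map (or an MPI
        # scatter/gather) can replace it directly.
        with Pool(processes) as pool:
            return pool.map(run_one, instances)

    if __name__ == "__main__":
        print(minimise_all(range(6)))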


Then there are the parameters passed.

First there are Mf and generic_minimise, which seem to take a huge
number of parameters:

    self.mf = Mf(init_params=self.param_vector, param_set=self.param_set,
                 diff_type=diff_type, diff_params=diff_params,
                 scaling_matrix=self.scaling_matrix, num_res=num_res,
                 equations=equations, param_types=param_types,
                 param_values=param_values, relax_data=relax_data,
                 errors=relax_error, bond_length=r, csa=csa,
                 num_frq=num_frq, frq=frq, num_ri=num_ri,
                 remap_table=remap_table, noe_r1_table=noe_r1_table,
                 ri_labels=ri_labels, gx=self.relax.data.gx,
                 gh=self.relax.data.gh, g_ratio=self.relax.data.g_ratio,
                 h_bar=self.relax.data.h_bar, mu0=self.relax.data.mu0,
                 num_params=num_params, vectors=xh_unit_vectors)

and for generic_minimise:

    results = generic_minimise(func=self.mf.func, dfunc=self.mf.dfunc,
                               d2func=self.mf.d2func, args=(),
                               x0=self.param_vector, min_algor=min_algor,
                               min_options=min_options, func_tol=func_tol,
                               grad_tol=grad_tol, maxiter=max_iterations,
                               full_output=1, print_flag=print_flag,
                               A=A, b=b)

If it is the main loop over the minimisation instances which is parallelised for MPI, etc., then this code won't need modification.

The questions here are:

1. What are all these parameters? Either I am misreading things or not
understanding, because I couldn't find definitions for everything.

All of the arguments for the instantiation of the Mf class are set up at the start of the main minimisation loop, i.e. between lines 2119 and 2324. This is most of the code of the 'minimise()' function.
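
One consequence for MPI-style parallelisation (a hedged aside, not relax code): because the arguments are rebuilt inside the loop body, a worker node only needs that per-instance bundle, which can be serialised and shipped, for example:

    import pickle

    def ship_instance(args_dict):
        # Serialise one instance's argument bundle for transfer to a
        # worker node. The keys would mirror the keyword names of the
        # quoted Mf() call (relax_data, errors, csa, ...); this helper
        # is illustrative only.
        return pickle.dumps(args_dict)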

2. What is going to change between runs, or even over runs of the relax
program?

For each iteration of the main loop, these arguments and parameters will change.


Clearly some things don't change at all, and it could even be asked why,
for example, h_bar is a parameter to Mf (there may be something deep I am
missing here?).

h_bar in the new 1.3 code need not be sent in. I have created the module 'physical_constants' from which it can be imported. Every argument to the Mf instantiation, except for h_bar and mu0, will be different for each minimisation 'instance'. For example, if the param_set is 'all', then the 'relax_data' argument will be the relaxation data of all selected spin systems. If param_set is 'mf', then 'relax_data' will be the relaxation data of a single spin.
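
A minimal sketch of what that import could look like (the module name comes from the message above; treating h_bar and mu0 as attributes of it is an assumption):

    # Instead of passing the constants in as arguments to Mf():
    from physical_constants import h_bar, mu0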


Some other things may only change at specific points in the program. For
example, the vectors of the molecules should only change when the vectors
function of structure or pdb is called.
Other things are per residue, but which of them?

All but h_bar and mu0.

And other things differ by type of model and minimisation...

Yep, see above.

As a note, typical input parameters for a local tm calculation are:
-#-initialise Mf residue - 3 LEU
-#--------------
-#-
-#-init_params [ 1000.]
-#-param_set local_tm
-#-diff_type sphere
-#-diff_params None
-#-scaling_matrix [ [  1.00000000e-12]]
-#-num_res 1
-#-equations ['mf_orig']
-#-param_types [['local_tm']]
-#-param_values None
-#-relax_data [array([  0.8293,  12.85  ,   0.9528,  12.57  ,   0.0983])]
-#-errors [array([ 0.023 ,  0.13  ,  0.0253,  0.171 ,  0.0278])]
-#-bond_length [1.0200000000000001e-10]
-#-csa [-0.00017199999999999998]
-#-num_frq [2]
-#-frq [[750800000.0, 599.71900000000005]]
-#-num_ri [5]
-#-remap_table [[0, 0, 1, 1, 1]]
-#-noe_r1_table [[None, None, None, None, 2]]
-#-ri_labels [['R1', 'R2', 'R1', 'R2', 'NOE']]
-#-gx -27126000.0
-#-gh 267522212.0
-#-g_ratio -9.862206444
-#-h_bar 1.05457159642e-34
-#-mu0 1.25663706144e-06
-#-num_params [1]
-#-vectors [None]
-#-
-#-generic minimisation residue - 3 LEU
-#-----------------------------
-#-
-#-constraints 1
-#-func <bound method Mf.func_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-dfunc <bound method Mf.dfunc_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-d2func <bound method Mf.d2func_local_tm of <maths_fns.mf.Mf instance at 0x4079fc0c>>
-#-args ()
-#-x0 [ 1000.]
-#-min_algor Method of Multipliers
-#-min_options ('newton',)
-#-func_tol 1e-25
-#-grad_tol None
-#-maxiter 10000000
-#-full_output 1
-#-print_flag 1
-#-constrained
-#-A [[ 1.]
-#-   [-1.]]
-#-b [      0. -200000.]



As an aside, when the redesign of the spin loops and minimise/model
loops cuts in, it would be a good idea (from the parallel point of view)
to have the spin loop running faster than the minimise/model loop.

That's guaranteed. The speed of the spin loop (the spin_loop() function of the 'generic_fns.selection' module of the 1.3 line) will be limited by the internal Python looping speed. The main loop of the minimise() function is limited by the call to generic_minimise() which should be many orders of magnitude slower.


So you could split by residue for parallelising, but send off all the
required model minimisations for each model at the same time, which would
give implicit load balancing and coarser gains on homogeneous parallel
computers:

The minimise() main loop does all of this. Splitting by residue only makes sense for the 'mf' and 'local_tm' param_set values. All residues are involved in the diffusion tensor optimisation 'diff' and the complete optimisation 'all'.


i.e. for six residues and 3 nodes:

node 1 calculates
res 1 [m1 m2 m3 ...]
res 2 [m1 m2 m3 ...]

node 2 calculates
res 3 [m1 m2 m3 ...]
res 4 [m1 m2 m3 ...]

node 3 calculates
res 5 [m1 m2 m3 ...]
res 6 [m1 m2 m3 ...]

This obviously places some limitations on the design of the minimisation function, as it might need to have set-up and tear-down regions that cope with this batched data...

The minimise() main loop is the finest grain parallelisation you can get without writing a specific parallelised optimisation algorithm.
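
As a concrete illustration of the batching in the six-residue example above (a sketch assuming a simple block split, not relax code):

    def split_residues(residues, num_nodes):
        # Block-split the residues across the nodes so that all model
        # minimisations for a residue stay on the same node.
        chunk = -(-len(residues) // num_nodes)  # ceiling division
        return [residues[i*chunk:(i + 1)*chunk] for i in range(num_nodes)]

    # Six residues on three nodes, as in the example above:
    # split_residues([1, 2, 3, 4, 5, 6], 3)  ->  [[1, 2], [3, 4], [5, 6]]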

Cheers,

Edward


