Fwd: how to parallelise model_free minimise



Posted by Gary Thompson on March 27, 2007 - 09:23:
On 3/26/07, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
On 3/27/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
> Edward d'Auvergne wrote:
> > On 3/24/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:

> >> 2. what is going to change between runs or even over runs of the relax
> >> program.
> >
> > For each iteration of the main loop, these arguments and parameters
> > will change.
> >
> Not necessarily? Certainly things such as remap_table, ri_labels, etc. do
> not seem to change between passes through the loop.

These actually change if you have a data set missing for a single spin
system because of peak overlap, etc.  Most of the time you don't see
this though.



Ah, now I see! The answer here is to preprocess the data in the master
process's minimise loop, look for shared instances, and send only the shared
instances over the wire, along with some tokens defining what to replace with
what. Then we put it all back together at the end, i.e. we unshare things by
replacing the tokens with the actual instances (this is a form of the
flyweight pattern in one way, but not another).
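As a rough sketch of the idea (the share/unshare names below are hypothetical,
not part of relax): the master deduplicates repeated argument values by
identity, sends each shared value once together with a token table, and the
slave rebuilds the original argument dictionaries from the tokens.

def share(jobs):
    # Replace repeated argument values with integer tokens; return the
    # tokenised jobs plus the token table to send once over the wire.
    table = {}      # token -> shared value
    reverse = {}    # id(value) -> token
    shared_jobs = []
    for job in jobs:
        packed = {}
        for name, value in job.items():
            token = reverse.get(id(value))
            if token is None:
                token = len(table)
                table[token] = value
                reverse[id(value)] = token
            packed[name] = token
        shared_jobs.append(packed)
    return shared_jobs, table

def unshare(shared_jobs, table):
    # On the slave: replace each token with the instance it stands for.
    restored = []
    for job in shared_jobs:
        original = {}
        for name, token in job.items():
            original[name] = table[token]
        restored.append(original)
    return restored

On the master this would run once per pass of the minimise loop, and the token
table only needs to be resent when one of the shared values actually changes
(e.g. when a spin system is missing a data set).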

> >> As an aside, when the redesign of the spin_loops and minimise/model
> >> loops cuts in, it would be a good idea (from the parallel point of view)
> >> to have the spin loop running faster than the minimise/model loop.
> >
> Sorry I wasn't quite clear here, it's not computational speed I am
> talking about but the speed of the 'loop counter'.

Sorry, I don't quite understand what the speed of a 'loop counter' is.



loop counter: the number that is incremented each time you make a new pass through the loop.

> e.g. it would be nice to have
>
> for residue in all_residues:
>     for model in models:
>         do_stuff(residue, model)
>
>
> as opposed to
>
> for model in models:  # currently at the user level
>     for residue in all_residues:
>         do_stuff(residue, model)
>
> now that might need something of the form
>
>         # Set the run names (also the names of preset model-free models).
>         if local_tm:
>             self.runs = ['tm0', 'tm1', 'tm2', 'tm3', 'tm4', 'tm5',
>                          'tm6', 'tm7', 'tm8', 'tm9']
>         else:
>             self.runs = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7',
>                          'm8', 'm9']
>
> run.create_composite('super')
> for name in self.runs:
>     run.create(name, 'mf')
>     composite_add('super', name)
> minimise('newton', run='super')
>
>
> which would minimise all runs in parallel...
>
> and I understand from Chris that we are planning to do
>
>
>         # Set the run names (also the names of preset model-free models).
>         if local_tm:
>             self.runs = ['tm0', 'tm1', 'tm2', 'tm3', 'tm4', 'tm5',
>                          'tm6', 'tm7', 'tm8', 'tm9']
>         else:
>             self.runs = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7',
>                          'm8', 'm9']
>
>
>         minimise('newton', runs=self.runs)
>
>
> which would also work
>
>
> now comes the tricky bit
>
>
> all the minimisations etc. would now become functions that set up
> minimisations and, say, submit them to a queue with a suitable object to
> allow the results to be sorted out later.
>
> then at the end of minimise('newton', runs=self.runs) you would collect
> all the results from all the calculations and complete the calculation, so
> we have something like
>
> for residue in residues:
>     for run in runs:
>         calculation_instance = setup_calculation(residue, run)
>         queue.submit(calculation_instance)
> while queue.not_complete():
>     result = queue.get_result()
>     result.record(self.relax.data)
>
> This will allow the maximum number of calculations to be conducted in
> parallel and will intrinsically load balance as well as we can get.

There are a number of very important issues with this approach.  The
most important is that the loop over the data pipes corresponding to
the model-free models (the 'runs') is deliberately not part of the
relax codebase.  In Chris' implementation of the 'runs' argument
(which will need to be renamed) the loop will be at the highest level
of the code so that from the generic_fns.minimise code onwards nothing
changes.  This high level loop would probably be a very difficult
target for MPI as the whole relax data storage object will need to be
sent between nodes.  This multi-megabyte transfer per node, per
calculation is not ideal.

No, you wouldn't have to put the whole thing over the wire, as long as
you add the calculations to be done to a queue at the low level and then
request that the calculations be completed at the end of the high level
function.  In the end the user and the program see no difference; it's
a bit like how an optimising compiler works, I guess...
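Something like the following sketch is what I have in mind (all the names
here are illustrative, not the relax API): the low level submits calculations
to a queue instead of running them, and the high level function flushes the
queue once everything has been submitted and puts the results back.

class CalculationQueue:
    def __init__(self):
        self.pending = []

    def submit(self, calc):
        # Called where minimise() would normally run the optimisation directly.
        self.pending.append(calc)

    def run_all(self, processor_pool):
        # Called once at the end of the high level function: farm out all
        # the queued calculations, then store the results in the data store.
        results = processor_pool.map(run_calculation, self.pending)
        for calc, result in zip(self.pending, results):
            calc.store(result)
        self.pending = []

def run_calculation(calc):
    # Hypothetical: run the model-free optimisation for one residue/model
    # pair on a slave and return the minimisation results.
    return calc.minimise()

The high level minimise code would then just call queue.run_all() once the
loop over the runs has finished submitting everything.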


Secondly, and very importantly, relax doesn't loop over residues in
the model-free minimise() function.  relax loops over minimisation
instances.  For the 'mf' and 'local_tm' parameter sets, this is a loop
over the spin systems (i.e. molecules first, residues second, and spin
systems last).  For the 'diff' and 'all' parameter sets the number of
minimisation instances is one and hence the loop runs once and then
that's it.  Looping over these followed by looping over the data pipes
(ex-runs) is insane!  That is essentially first looping over the
finest grained level followed by the coarsest.
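As a schematic of the two cases (illustrative names only, not the actual relax
functions):

def minimisation_instances(param_set, spins):
    # Which items the minimise() main loop iterates over for each
    # parameter set (schematic only).
    if param_set in ('mf', 'local_tm'):
        # One minimisation instance per spin system.
        return spins
    # 'diff' and 'all': a single minimisation instance covering everything.
    return [spins]

# e.g. minimisation_instances('mf', spins) gives one instance per spin,
# while minimisation_instances('all', spins) gives a single instance.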

I do not quite follow where the insanity comes from ;-)

It is not a problem...  What is required is to pass as few chunks of
data as possible over the wire, with the largest size and the best
balance of computations...  Essentially I want to (effectively, not
literally) build a list of residues, divide the residues out roughly by
processor, find all the models required for each residue, set up the
whole set of calculations, chunk the whole list by the number of
processors times, say, 3, put all these calculations on a queue, then
collect the results and put the results where they need to be.
Basically I am saying that in many cases minimisation instances and
runs are disjoint sets and so can be calculated at the same time, e.g.
the result of residue 3, run tm0 does not affect the result of residue
3, run tm1, etc.
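A sketch of what I mean (again, illustrative names only, not relax code):
build every (residue, model) calculation, split the list into roughly
n_processors * 3 chunks for load balancing, queue the chunks, then gather the
results back into the data store.

def chunk(items, n_chunks):
    # Split a list into n_chunks slices of roughly equal size.
    size = max(1, (len(items) + n_chunks - 1) // n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_minimise(residues, models, queue, n_processors):
    # Build the full list of independent calculations.
    calcs = [(residue, model) for residue in residues for model in models]
    # Oversubscribe the processors by a factor of 3 for load balancing.
    for piece in chunk(calcs, n_processors * 3):
        queue.submit(piece)
    # Collect the results and put them back where they belong.
    while not queue.complete():
        result = queue.get_result()
        result.record()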

If you target the main loop of the minimise() code, I can guarantee you'll get the best usage out of a cluster.  Without specifically mentioning this main loop, this is the target we have been talking about throughout this thread.  An added benefit is that the minimise() code base hardly needs to be changed.

Indeed, and the same would be true for the slightly more sophisticated
scheme I have just posited.  Almost all the changes would be at the
minimise level, apart from asking the queue to do all the calculations
;-)

regards
gary

Regards,

Edward

_______________________________________________
relax (http://nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel



