Re: how to parallelise model_free minimise -- March 27, 2007

On 3/27/07, gary thompson <garyt@xxxxxxxxxxxxxxx> wrote:

On 3/26/07, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:
> On 3/27/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:


[snip]

> > e.g. it would be nice to have
> >
> > for residue in  all residues:
> >     for model in models:
> >              do_stuff-(tm)
> >
> >
> > as opposed to
> >
> > for model in models: #currently at the user level
> >     for residue in  all residues:
> >              do_stuff-(tm)
> >
> > now that might need something of the form
> >
> >         # Set the run names (also the names of preset model-free models).
> >         if local_tm:
> >             self.runs = ['tm0', 'tm1', 'tm2', 'tm3', 'tm4', 'tm5',
> > 'tm6', 'tm7', 'tm8', 'tm9']
> >         else:
> >             self.runs = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7',
> > 'm8', 'm9']
> >
> > run.create_composite('super')
> > for name in self.runs:
> >     run.create(name, 'mf')
> >     composite_add('super', name)
> >
> > # Minimise once, after every run has been added to the composite.
> > minimise('newton', run='super')
> >
> >
> > which would minimise all runs in parallel...
> >
> > and I understand from chris that we are planning to do
> >
> >
> >        # Set the run names (also the names of preset model-free models).
> >         if local_tm:
> >             self.runs = ['tm0', 'tm1', 'tm2', 'tm3', 'tm4', 'tm5',
> > 'tm6', 'tm7', 'tm8', 'tm9']
> >         else:
> >             self.runs = ['m0', 'm1', 'm2', 'm3', 'm4', 'm5', 'm6', 'm7',
> > 'm8', 'm9']
> >
> >
> >      minimise('newton', runs=self.runs)
> >
> >
> > which would also work
> >
> >
> > now comes the tricky bit
> >
> >
> > all the minimisations etc would now become functions to set up
> > minimisations and, say, submit them to a queue with a suitable object to
> > allow the results to be sorted out later.
> >
> > then at the end of minimise('newton', runs=self.runs) you would collect
> > in all the results from all calculations and complete the calculation so
> > we have something like
> >
> > for residue in residues:
> >     for run in runs:
> >         calculation_instance = setup_calculation(residue, run)
> >         queue.submit(calculation_instance)
> > while queue.not_complete():
> >     result = queue.get_result()
> >     result.record(self.relax.data)
> >
> > This will allow the maximum number of calculations to be conducted in
> > parallel and will intrinsically load balance as well as we can get
>
> There are a number of very important issues with this approach.  The
> most important is that the loop over the data pipes corresponding to
> the model-free models (the 'runs') is deliberately not part of the
> relax codebase.  In Chris' implementation of the 'runs' argument
> (which will need to be renamed) the loop will be at the highest level
> of the code so that for the generic_fns.minimise code onwards nothing
> changes.  This high level loop would probably be a very difficult
> target for MPI as the whole relax data storage object will need to be
> sent between nodes.  This multi-megabyte transfer per node, per
> calculation is not ideal.
>
No, you wouldn't have to put the whole thing over the wire, as long as
you add the calculations to a queue at the low level and then request
that the calculations be completed at the end of the high level
function.  In the end the user and the program see no difference; it's
a bit like how an optimising compiler works, I guess....
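
A minimal sketch of the deferred-queue idea described above, in Python. Everything here is hypothetical and is not relax code: `CalculationQueue`, `toy_minimise` and `minimise_all` are illustrative names, a standard library thread pool stands in for the MPI back end, and the "optimisation" is a placeholder. It only shows that the low-level code can submit independent calculations while the high-level function collects the results before returning, so the caller sees an ordinary blocking call.

```python
from concurrent.futures import ThreadPoolExecutor


class CalculationQueue:
    """Hypothetical queue: submit work at the low level, collect it later."""

    def __init__(self, max_workers=4):
        # A thread pool stands in for the cluster or MPI back end (assumption).
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = []

    def submit(self, key, func, *args):
        # Start the calculation and remember which key its result belongs to.
        self._pending.append((key, self._pool.submit(func, *args)))

    def collect(self):
        # Block until every submitted calculation has finished.
        results = {key: future.result() for key, future in self._pending}
        self._pending.clear()
        return results

    def close(self):
        # Release the worker threads.
        self._pool.shutdown()


def toy_minimise(spin, model):
    # Placeholder for a real optimisation; returns a fake result.
    return len(spin) + len(model)


def minimise_all(spins, models):
    # High-level function: queue every (spin, model) pair, then collect.
    queue = CalculationQueue()
    try:
        for model in models:
            for spin in spins:
                queue.submit((spin, model), toy_minimise, spin, model)
        return queue.collect()
    finally:
        queue.close()
```

The results come back keyed by `(spin, model)`, so they can be recorded in the right place regardless of the order in which the calculations finish.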


I'm not talking about your suggested implementation but Chris'
implementation (the runs argument) which we have already decided upon.
Your suggestion affects this decision (as well as the whole relax UI,
I'll get to this later).

> Secondly, and very importantly, relax doesn't loop over residues in
> the model-free minimise() function.  relax loops over minimisation
> instances.  For the 'mf' and 'local_tm' parameter sets, this is a loop
> over the spin systems (i.e. molecules first, residues second, and spin
> systems last).  For the 'diff' and 'all' parameter sets the number of
> minimisation instances is one and hence the loop runs once and then
> that's it.  Looping over these followed by looping over the data pipes
> (ex-runs) is insane!  That is essentially first looping over the
> finest grained level followed by the coarsest.

I do not quite follow where the insanity comes from ;-)

It is not a problem...  What is required is to pass as few chunks of
data as possible over the wire, each as large as possible and with the
best balance of computation.  Essentially I want to (effectively, not
literally) build a list of residues, divide the residues out roughly by
processor, find all the models required for each residue and set up the
whole set of calculations.  The whole list is then chunked by the number
of processors times, say, 3, all of these calculations are put on a
queue, and the results are collected and put where they need to be.
Basically I am saying that in many cases minimisation instances and
runs are disjoint sets and so can be calculated at the same time, e.g.
the result of residue 3 run tm0 does not affect the result of residue 3
run tm1, etc.
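
The chunking described above could be sketched roughly as follows. This is an illustration under stated assumptions, not relax code: `build_chunks` is a hypothetical helper, the round-robin split is one arbitrary way of balancing the chunks, and the factor of 3 chunks per processor is simply the figure mentioned in the text.

```python
from itertools import product


def build_chunks(residues, models, n_processors, chunks_per_processor=3):
    """Split every (residue, model) calculation into roughly equal chunks."""
    # Each (residue, model) pair is treated as an independent calculation.
    calculations = list(product(residues, models))
    # Never create more chunks than there are calculations.
    n_chunks = min(len(calculations), n_processors * chunks_per_processor)
    # Round-robin assignment (an assumption) keeps chunk sizes within one
    # calculation of each other.
    return [calculations[i::n_chunks] for i in range(n_chunks)]
```

Each chunk would then be submitted to the queue as a single unit of work, so the number of messages scales with the number of processors rather than with the number of calculations.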


The insanity is from the fact that the suggestion of the looping over
residues first and then looping over the data pipes breaks the most
fundamental premise of the relax UI (user interface) design - the data
pipes and how the user interacts with them.  I cannot stress how bad
this is!

It's important to note the difference in terminology we are using.
Rather than 'residue' I will use 'spin system'.  In the suggestion you
are only looking at the 'mf' and 'local_tm' parameter sets and hence
instead of 'looping over residues' I will use 'main loop' of the
model-free minimise() function (and 'minimisation instances' instead
of 'individual residue').  Importantly, this terminology also covers
the 'all' and 'diff' parameter sets (or mathematical models).
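
To make the terminology concrete, here is a small Python sketch of how the number of minimisation instances depends on the parameter set. It is only an illustration of the description in this thread: `minimisation_instances` is a hypothetical function, not part of relax, and spin systems are represented by plain strings.

```python
def minimisation_instances(param_set, spin_systems):
    """Return the groups of spin systems that are optimised together."""
    if param_set in ('mf', 'local_tm'):
        # One minimisation instance per spin system.
        return [[spin] for spin in spin_systems]
    if param_set in ('diff', 'all'):
        # A single instance covering every spin system.
        return [list(spin_systems)]
    raise ValueError("unknown parameter set: %r" % (param_set,))
```

With three spin systems, 'mf' gives three instances and the main loop runs three times, whereas 'all' gives one instance and the loop runs once.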

There is no computational benefit from looping over minimisation
instances first and then the data pipes (when compared to looping over
the data pipes followed by looping over the minimisation instances).
There may be slight inter-node communication benefits for MPI by not
having to send the same spin system specific data repeatedly for each
model-free model.  Actually I'll take that back because the user is
free to have non-identical data for the same spin system in different
data pipes.  For most models I cannot see communication as a factor
which will affect the speed of the cluster; the minimisation itself is
the limiting factor.  Only for the very simple model-free models, in
which the calculations are fast, will the communication between nodes
be the limiting factor.  But these are so fast that the calculations
and communications will be over before you know it.  Compared to the
complex double motion model-free models, optimising MPI for these
simple models is a waste of time.

All of these problems go away if the parallelisation of the
minimisation of model-free models does not occur at this point.  I see
no benefit in the parallelisation of the optimisation of the
model-free models in the 'mf' and 'local_tm' parameter sets.  This is
because if the main loop over the minimisation instances is targeted,
you will almost never have idle nodes during optimisation.
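
A rough Python sketch of this alternative, in which only the main loop over minimisation instances is parallelised and the loop over data pipes stays at the highest level. All names (`optimise_instance`, `minimise_pipe`, `run_all`) are hypothetical, a thread pool stands in for the cluster nodes, and the optimisation is a placeholder; none of this is relax code.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def optimise_instance(pipe, instance):
    # Placeholder for optimising one minimisation instance in one data pipe.
    return (pipe, instance, "optimised")


def minimise_pipe(pipe, instances, pool):
    # Parallelise only the main loop over minimisation instances;
    # pool.map returns the results in the order of the inputs.
    return list(pool.map(partial(optimise_instance, pipe), instances))


def run_all(pipes, instances, max_workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The loop over data pipes stays at the top, as in a user script.
        for pipe in pipes:
            results[pipe] = minimise_pipe(pipe, instances, pool)
    return results
```

As long as there are more minimisation instances than workers, every worker has something to do for most of each pass through the main loop.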

As for sending a number of spin systems to a single node to execute a
couple of optimisations, how would this be of benefit?  The inter-node
communication overhead will still be the same.  The optimisations will
occur in series and hence I can't see a computational advantage.

I really see fundamental problems with the integration of the design
of this proposal and the design of relax.  If the looping over data
pipes at the lowest level is removed, then I am happy with the design.
There are many cases where there is only one data pipe optimisation
where this loop is not used.  I do however question the computational
or communicational benefits of sending data for a number of spin
systems simultaneously to a single node.  I am going to put my foot
down and say that I will not accept this reversal of the core relax UI
design into the main line.

Regards,

Edward
