Re: multi processing



Posted by Edward d'Auvergne on May 05, 2006 - 06:57:
On 5/5/06, Andrew Perry <ajperry@xxxxxxxxxxxxxx> wrote:

>> SSH tunnels is probably not the best option for your system. Do you
>> know anything about MPI?
>
> I have read about MPI but have not implemented anything __YET__ ;-).
> Also I have compiled some MPI based programs. It seems to be a bit of
> a pig and I don't think the low hanging fruit necessarily require that
> degree of fine grained distribution...

If this is any help, I've done what I think is some fairly exhaustive
searching for python+mpi implementations recently. Note that I've never
_actually_ used any of them for a project yet.


Thanks, the info should help.

Scientific Python has an MPI interface, which is handy since it is already a
relax dependency. The drawback is that its documentation seems very geared
toward those who already understand MPI reasonably well. The other drawback
is that it seems only to be able to pass Numpy arrays and strings between
nodes, which would mean some relax data structures would probably need to be
'repackaged' for sending via MPI.
http://starship.python.net/~hinsen/ScientificPython/ScientificPythonManual/Scientific_27.html


If the code is implemented at the analysis specific level, for example
the minimise() function in the file 'specific_fns/model_free.py', then
almost all of the data structures are already converted to Numeric
arrays.
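To illustrate the idea, here is a rough sketch of that kind of
repackaging.  The attribute names are made up for the example and are
not relax's real data structures:

    import Numeric

    def package_residue(data):
        # Pull out only the numbers a single minimise() call needs and
        # convert them to Numeric arrays for sending between nodes.
        relax_data = Numeric.array(data.relax_data, Numeric.Float)
        errors = Numeric.array(data.errors, Numeric.Float)
        params = Numeric.array(data.params, Numeric.Float)
        return relax_data, errors, params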

Another one is:
 MYMPI - http://peloton.sdsc.edu/~tkaiser/mympi/ (and
http://grid-devel.sdsc.edu/gridsphere/gridsphere?cid=mympi
) - syntax intended to match the C MPI API closely, and much like
Scientific.MPI it only has direct support for some basic data types, not
arbitrary python objects.

Most other implementations (below) support transmission of any python object
that can be pickled, and so may take less code to implement in relax.
However, sending the whole data object when only select parts of it are
required for the calculation could be more inefficient than you would like,
so 'repackaging' and sending just what is needed may be better anyway. I
wonder which is worse in this case: the network overhead of sending a
large-ish python object, or the extra load on the 'master' node as it
repackages it into smaller Numpy arrays? I guess it all depends on whether
things are carved up 'batchwise' or at a more fine-grained (inner
loop/function) level.


What data needs to be sent depends on what level the threading will be
implemented at. If each call to minimise() in
'specific_fns/model_free.py' is threaded, then only the data which is
packaged within that function will need to be sent. The node can then
return solely the minimisation results (parameter vector, iteration
count, function count, gradient count, Hessian count, and warnings).

My threading code is a little higher up in the chain, within the
minimise() function of the generic code (generic_fns/minimise.py)
which calls the specific model-free minimise() function. This code
currently only works for Monte Carlo simulations.


The repackaging overhead by the master node should be tiny compared to
the calculation time.  The cost of sending data could become quite
high if the threading is fine grained enough.  What really needs to be
determined is what will be threaded.  Will individual model-free
minimisation instances be threaded?  If the diffusion tensor is fixed
then individual residue minimisations will be threaded.  If the
diffusion tensor parameters are optimised, either with or without the
model-free parameters, then there is one single instance.  If the
local tm parameter is included, then again individual residues are
optimised.  Using this fine grained approach, communication to and from
the nodes will likely be expensive.
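As an illustration of this fine grained approach, here is a minimal
master/worker sketch using pypar style send() and receive() calls (one
of the implementations listed in this message).  The data and the
minimise_residue() function are placeholders and not relax's actual
API:

    import pypar

    def minimise_residue(data):
        # Placeholder for a single residue minimisation.
        return sum(data)

    rank = pypar.rank()
    size = pypar.size()

    if rank == 0:
        # Master:  send one residue's data to each worker node, then
        # collect the minimisation results (one residue per worker
        # here purely to keep the example short).
        residues = [[0.1 * i] * 6 for i in range(size - 1)]
        for worker in range(1, size):
            pypar.send(residues[worker - 1], worker)
        results = [pypar.receive(worker) for worker in range(1, size)]
    else:
        # Worker:  minimise the residue sent by the master and send
        # the result back.
        data = pypar.receive(0)
        pypar.send(minimise_residue(data), 0)

    pypar.finalize()

Every send() and receive() here is a communication cost, which is why
the amount of data per message matters so much at this level.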

The second thing which could be threaded is the runs themselves.  For
example, models m1 to m9 are normally optimised using a Python loop.
These runs could be threaded so that, assuming individual residue
minimisations are also threaded, model m2 calculations could start
while instances of model m1 are still being calculated on the nodes.
This could cause significant speed ups if the protein has more residues
than the cluster has nodes.  Otherwise each run could be sent to a
different node (although the amount of data sent would be much larger).

Finally, Monte Carlo simulations are the highest level and most obvious
target.  This is the part of model-free analysis which takes the
longest.
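
A simple way to split the simulations across the nodes would be index
striding, sketched below.  The run_simulation() function is just a
placeholder, not part of relax:

    import random
    import pypar

    def run_simulation(i):
        # Placeholder for a single Monte Carlo simulation.
        random.seed(i)
        return random.random()

    rank = pypar.rank()
    size = pypar.size()
    n_sims = 500

    # Each node takes every size-th simulation, starting from its own
    # rank, and runs them independently.
    my_results = [run_simulation(i) for i in range(rank, n_sims, size)]

    pypar.finalize()

Because the simulations are independent, almost no communication is
needed apart from collecting the results at the end.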

MMPI - http://www.penzilla.net/mmpi/ - looks to be actively developed, good
documentation with examples, including sending of python objects via
pickling.

 pyPar -
http://datamining.anu.edu.au/~ole/work/software/pypar/ -
sends arbitrary python objects, only two GPL licensed files so it would be
very easy to package directly with relax rather than make users chase
dependencies.


We could import the dependency with a 'try:' statement so that MPI is only a
dependency for those wishing to use multiple machines.  It looks like Pypar
is dependent on a C MPI library anyway.
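For example, the import could be wrapped along these lines (pypar is
just one of the candidates listed above):

    # Minimal sketch of the optional dependency idea.
    try:
        import pypar
        MPI_ENABLED = True
    except ImportError:
        pypar = None
        MPI_ENABLED = False

Code which needs MPI could then test MPI_ENABLED and fall back to the
normal single machine path.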

There are also two which are parallel python interpreters that require
recompilation, and seem to work a bit differently (I'm still getting my
head around exactly how these are meant to be used).

 http://www.cimec.org.ar/python/ - a parallel interpreter as well as
some MPI bindings for python. I tested the interpreter with relax and
LAM/MPI; it seemed to spawn off lots of processes and run.

 pyMPI - http://pympi.sourceforge.net/index.html - a
parallel python interpreter, decent docs at (
http://heanet.dl.sourceforge.net/sourceforge/pympi/pyMPI.pdf
), seems mature despite the out of date website.

There is also:

MPY - http://mpy.sourceforge.net/index.html (seems
abandoned since 2004)

Hope this helps ...

Andrew


Would you know which of these implementations are the most mature or the most used? Stability would be better than fancy features.

Thanks,

Edward


