Re: r3237 - in /branches/multi_processor: multi/mpi4py_processor.py relax



Posted by Gary S. Thompson on March 20, 2007 - 00:22:
Edward d'Auvergne wrote:

Hi,

Just a quick point: it would be good to either start a new thread for
these types of questions or change the subject field (unless you use
the Gmail web interface like I am at the moment).  People may miss
important discussions with such scary subject lines!


On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:

garyt@xxxxxxxxxxxxxxx wrote:

Dear Ed
This is a good enough point to tell you how to run things and what to
install for the MPI version I am testing with.

I have installed mpi4py and LAM.


Are both, together with mpi4py, essential for MPI operation?

Yes. LAM is an MPI implementation, and mpi4py is the interface code between the C world of MPI and the Python world.


As MPI
is solely for those who are very serious and have access to clusters,

Not true! MPI can run over ssh as well. For example, LAM has an ssh backend, and this is what I am using for testing on my computer!


the user should be able to handle installing these dependencies (or at
least be able to get someone to install them).


LAM or MPICH is often available in a Linux distribution's software repository, e.g. http://rpmfind.net/linux/rpm2html/search.php?query=lam-devel lists RPMs for Mandriva and a variety of other platforms. I chose LAM to play with because it is the best performing library on our cluster, but I have also used MPICH. mpi4py has instructions on building and setting up a variety of configurations.

LAM should be available in your Linux distribution ;-)


I'm using Mandriva so I would assume that it is within its 'contrib'
repositories.  You are talking about http://www.lam-mpi.org/ aren't
you?  As OpenMPI (http://www.open-mpi.org/) seems to be the future of
that project, wouldn't this be the better option?  It's more likely to
be supported in future Linux distros.

See above. It shouldn't matter which MPI distribution you choose; they all have almost identical APIs, and mpi4py deals with that.


mpi4py came from http://www.python.org/pypi/mpi4py (there is an mpi4py
website but it is out of date; mpi4py is, however, under recent development).


Do you think mpi4py 0.4 or below will be stable enough? Are there alternatives?

I would use mpi4py 0.4.0rc4 from the cheese shop. This is what I have developed with and it seems to work well:
<http://cheeseshop.python.org/pypi/mpi4py/0.4.0rc4>




Follow the instructions to install mpi4py.

Create the file 'test_multi1.py':

-------------------8<-----------------------
import multi
cmd = multi.mpi4py_processor.Get_name_command()
self.relax.processor.run_command(cmd)
-------------------8<-----------------------

Then type 'lamboot', and to run the test type:

mpirun -np 6 python relax --multi mpi4py test_multi1.py

to get:

                                  relax repository checkout

Protein dynamics by NMR relaxation data analysis

                             Copyright (C) 2001-2006 Edward d'Auvergne

This is free software which you are welcome to modify and redistribute
under the conditions of the
GNU General Public License (GPL).  This program, including all modules,
is licensed under the GPL
and comes with absolutely no warranty.  For details type 'GPL'.
Assistance in using this program
can be accessed by typing 'help'.

script = 'test_multi1.py'
----------------------------------------------------------------------------------------------------


import sys


import multi
cmd = multi.mpi4py_processor.Get_name_command()
self.relax.processor.run_command(cmd)
----------------------------------------------------------------------------------------------------


1 fbsdpcu156-9377
2 fbsdpcu156-9378
3 fbsdpcu156-9379
4 fbsdpcu156-9380
5 fbsdpcu156-9381


I'll have to play with this tomorrow.


hope this is useful!


I'm starting to get a better idea of how this will be implemented!


Now a question: what is the best way to get an eternally running relax
interpreter that I can just fire commands at (for the slaves)?


The prompt based interface (as well as the script interface) is only
one way of invoking relax. An important question is how should we
present relax to the user when using MPI. Should the parent process
present a functional interpreter or should operation be solely
dictated by a script?


I already have a prompt running on the master. My idea is that the relax user should see no difference (apart from performance) when using the parallel version.

Or should a completely different mechanism of
operation be devised for the interface of the parent.  For the grid
computing code the parent UI is either the prompt or the script while
the slaves use the interface started by 'relax --thread'.  The slaves
use none of the user functions and only really invoke the number
crunching code.

Now this is what I couldn't follow. How much of the relax environment is essential for a slave/thread? And if we don't want to do it all by complete pickles of the relax data structures (how big is a complete relax data structure for a typical set of runs?), where should I start?



For the MPI slaves (I'm assuming these are separate processes with different PIDs running on different nodes) we should avoid the standard UI interfaces as these are superfluous. In this case a simple MPI interface should probably be devised - it accepts the MPI commands and returns data back to the parent. Is this through stdin, stdout, and stderr in MPI? My knowledge of MPI is very limited.



1. via pickled objects which will integrate the results back into the master copy, or
2. as text strings which are printed with a process number in front (so that complete parallel log files can be grepped out of the main output).
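Option 2 is easy to sketch. Assuming a helper along these lines (the function name and the '<rank> <line>' format are illustrative, not relax's actual API):

```python
def prefix_lines(rank, text):
    """Prefix every line of a slave's output with its process number,
    so per-process logs can be grepped out of the combined output."""
    return "\n".join("%d %s" % (rank, line) for line in text.splitlines())

print(prefix_lines(3, "minimising...\ndone"))
# 3 minimising...
# 3 done
```

A `grep '^3 '` on the combined log would then recover slave 3's output.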


I don't currently assume that the compute node has available, usable, or shared disk space; everything comes back to the master.

The place to look at in mpi4py_processor is lines 59-71, which send a command from the master and either print or execute the resulting object (checks for exceptions, repeated feedback, and command completion are still to come):

    for i in range(1, MPI.size):
        MPI.COMM_WORLD.Send(buf=command, dest=i)
    for i in range(1, MPI.size):
        elem = MPI.COMM_WORLD.Recv(source=i)
        if isinstance(elem, str):
            # FIXME: can't cope with multiple lines.
            print i, elem
        else:
            elem.run(relax_instance, relax_instance.processor)

and lines 92-94, where all the commands are received and executed on the slaves:

    while not self.do_quit:
        command = MPI.COMM_WORLD.Recv(source=0)
        command.run(self.relax_instance, self.relax_instance.processor)
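The command objects themselves can stay tiny. A minimal sketch of something like Get_name_command, simulating the pickle round-trip that MPI performs under the hood (Dummy_processor and the return_object call are assumptions for illustration, not the real relax classes):

```python
import pickle
import platform

class Get_name_command:
    """Toy command: the master pickles and sends it; the slave calls
    run() with its local relax instance and processor."""
    def run(self, relax_instance, processor):
        # Report this node's hostname back to the master.
        processor.return_object(platform.node())

class Dummy_processor:
    """Stand-in for the slave-side processor; just collects results."""
    def __init__(self):
        self.sent = []
    def return_object(self, obj):
        self.sent.append(obj)

# Simulate the wire: pickle on the "master", unpickle and run on the "slave".
cmd = pickle.loads(pickle.dumps(Get_name_command()))
proc = Dummy_processor()
cmd.run(None, proc)
print(proc.sent[0])  # this node's hostname
```

With a real MPI.COMM_WORLD.Send/Recv pair in between, this is essentially what the two loops above do.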


It is the intention that the command protocol be very minimal:

0. The master sends a command plus data to run on the remote slave, as an object whose run method is executed with the local relax and processor instances as arguments (master: communicator.run_command).
1. The slave then writes a series of objects to the processor method return_object.
2. The master receives the data, either:
a. objects to execute on the master, which will also be given the master's relax and processor instances,
b. string objects to print,
c. a command completion indicator back from the slave (a well known object),
d. an exception (raising a local exception on the master, which will do stack trace printing for both the master and the slave), or
e. None, a void return.
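On the master, cases a-e could be dispatched with something like the following sketch (Completed and the dispatch function are hypothetical names, not relax's actual protocol objects):

```python
class Completed:
    """Well-known sentinel object marking command completion (case c)."""
    pass

def dispatch(elem, rank, relax_instance, processor):
    """Handle one object returned by a slave; True means keep reading."""
    if elem is None:                     # case e: a void return
        return True
    if isinstance(elem, Completed):      # case c: the command finished
        return False
    if isinstance(elem, Exception):      # case d: re-raise on the master
        raise elem
    if isinstance(elem, str):            # case b: print with a rank prefix
        print("%d %s" % (rank, elem))
        return True
    elem.run(relax_instance, processor)  # case a: execute on the master
    return True
```

The master would loop on Recv(source=i) calling dispatch() until it returns False or raises.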




basically I just need to

1. do some setup
2. send a command object

but without quitting at the end of the script


If the prompt and scripting UI interface are not used by the slaves,
this shouldn't be an issue.  The parent should hang and wait at the
'grid_search()' and 'minimise()' user functions until these complete.
No other code needs to be executed by MPI.

That is indeed the case.

All that needs to be done is to send the minimal amount of data to the
slave (see the minimise() method of the specific_fns.model_free code
for the objects required in this case), run the specific optimisation
code, and then return the parameter vector and minimisation stats back
to the parent.
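In other words, a minimisation command would carry just that minimal data and hand the results back via return_object. A rough sketch (the class and field names are invented; see the minimise() method of the specific_fns.model_free code for the real objects):

```python
class Result_collector:
    """Stand-in for the slave-side processor's return_object hook."""
    def __init__(self):
        self.results = []
    def return_object(self, obj):
        self.results.append(obj)

class MF_minimise_command:
    """Hypothetical command carrying only the minimal optimisation inputs."""
    def __init__(self, param_vector, scaling_matrix):
        self.param_vector = param_vector
        self.scaling_matrix = scaling_matrix

    def run(self, relax_instance, processor):
        # Stand-in for the real optimisation: hand the "optimised"
        # parameter vector and minimisation stats back to the master.
        processor.return_object({"param_vector": self.param_vector,
                                 "func": 0.0, "iter": 0, "warning": None})

proc = Result_collector()
MF_minimise_command([0.8, 1e-9], None).run(None, proc)
```

The parent then blocks in grid_search()/minimise() until each slave's result dictionary has come back.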


Do I modify interpreter.run to take a quit variable set to False, so
that run_script can be run with quit = False?


Avoid the interpreter and wait at the optimisation steps - the only
serious number crunching code in relax.


I agree! Interpreters are not required on the slave, just the relax data structures in a clean and usable state.


I hope this helps,

Edward


regards
gary

One other question: how well behaved are the relax functions with respect to
not gratuitously modifying global state? E.g. could I share one relax instance
between several threads? The reason I ask is that, if they are well behaved,
many of the data transfer operations in a threaded environment with a single
memory space would become noops ;-) nice!
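To illustrate the noop point (a sketch, not relax code): within one address space a "transfer" can be a plain reference, whereas between MPI processes the data must be pickled, i.e. copied:

```python
import pickle

data = {"runs": {"m1": [0.8, 12.3]}}  # stand-in for relax's data store

# Threads sharing one relax instance: the "transfer" is just a reference.
shared = data
assert shared is data                  # same object, nothing copied

# Separate MPI processes: the transfer is a pickle round-trip, a copy.
copied = pickle.loads(pickle.dumps(data))
assert copied == data and copied is not data
```

Of course this only works if the functions don't stomp on shared state behind each other's backs, which is exactly the question above.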


--
-------------------------------------------------------------------
Dr Gary Thompson
Astbury Centre for Structural Molecular Biology,
University of Leeds, Astbury Building,
Leeds, LS2 9JT, West-Yorkshire, UK
Tel. +44-113-3433024
email: garyt@xxxxxxxxxxxxxxx
Fax +44-113-2331407
-------------------------------------------------------------------




