Re: r3237 - in /branches/multi_processor: multi/mpi4py_processor.py relax



Posted by Edward d'Auvergne on March 20, 2007 - 11:23:
On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
Edward d'Auvergne wrote:

> Hi,
>
> Just a quick point, it would be good to either start a new thread for
> these types of questions or change the subject field (unless you use
> the Gmail web interface like I am at the moment).  People may miss
> important discussions with such scary subject lines!
>
>
> On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
>
>> garyt@xxxxxxxxxxxxxxx wrote:

[snip]

>> Now a question: what is the best way to get an eternally running relax
>> interpreter that I can just fire commands at (for the slaves)?
>
>
> The prompt based interface (as well as the script interface) is only
> one way of invoking relax.  An important question is how should we
> present relax to the user when using MPI.  Should the parent process
> present a functional interpreter or should operation be solely
> dictated by a script?


I already have a prompt running on the master. My idea is that the relax user should see no difference (apart from performance) when using the parallel version.

I've just played around with that and it does look like the best option for user flexibility.


> Or should a completely different mechanism of
> operation be devised for the interface of the parent.  For the grid
> computing code the parent UI is either the prompt or the script while
> the slaves use the interface started by 'relax --thread'.  The slaves
> use none of the user functions and only really invoke the number
> crunching code.

Now this is what I couldn't follow. How much of the relax environment is
essential for a slave/thread, and if we don't want to do it all by
complete pickles of relax data structures (how big is a complete relax
data structure for a typical set of runs?), where should I start?

My post at https://mail.gna.org/public/relax-devel/2007-03/msg00097.html (Message-id: <1174384147.29205.20.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>) hopefully answers these questions.


> For the MPI slaves (I'm assuming these are separate processes with
> different PIDs running on different nodes) we should avoid the
> standard UI interfaces as these are superfluous.  In this case a
> simple MPI interface should probably be devised - it accepts the MPI
> commands and returns data back to the parent.  Is this through stdin,
> stdout, and stderr in MPI?  My knowledge of MPI is very limited.
>


1. via pickled objects which will integrate the results back into the master copy, or

If implemented at the 'minimise_mpi()' model-free method, then this would probably be the best option.

2. as text strings which are printed with a process number in front (so
that complete parallel log files can be grepped out of the main output).

However if you see advantages with this option, then maybe this is better.
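
To make option 1 more concrete, something like the following is what I have in mind (every class and attribute name below is invented for the example - none of this is existing relax or multi_processor code). The slave pickles up a small result object and the master simply calls its run() method to fold the numbers back into its own copy of the data; option 2 would instead send plain strings which the master prefixes with the slave's process number before printing:

class MinimisationResult:
    """Option 1: a picklable result whose run() method executes on the master."""

    def __init__(self, res_index, param_vector, chi2, iter_count):
        self.res_index = res_index          # which residue was optimised
        self.param_vector = param_vector    # the optimised model-free parameters
        self.chi2 = chi2                    # the final chi-squared value
        self.iter_count = iter_count        # the number of iterations used

    def run(self, relax, processor):
        # Fold the numbers back into the master's copy of the data (the
        # storage attributes used here are placeholders, not the real ones).
        res = relax.data.res[self.res_index]
        res.params = self.param_vector
        res.chi2 = self.chi2
        res.iter = self.iter_count


def tag_lines(rank, text):
    """Option 2: prefix every line with the slave's process number so that
    the parallel log files can later be grepped apart."""
    return "\n".join(["%i: %s" % (rank, line) for line in text.splitlines()])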


I don't currently assume that the compute node has available, usable, or
shared disk space.  Everything comes back to the master.

I expected MPI to operate in this way. This means that none of the grid computing code will be of use to you.


The places to look at in mpi4py_processor are:

lines 59-71, which send a command from the master and either print or
execute the resulting object (checks for exceptions, repeated feedback,
and command completion are still to come):

        # Send the command to every slave (ranks 1 to MPI.size-1).
        for i in range(1, MPI.size):
            MPI.COMM_WORLD.Send(buf=command, dest=i)

        # Collect one reply per slave: result objects are executed on the
        # master, anything else is treated as text and printed.
        for i in range(1, MPI.size):
            elem = MPI.COMM_WORLD.Recv(source=i)
            if hasattr(elem, 'run'):
                elem.run(relax_instance, relax_instance.processor)
            else:
                # FIXME: can't cope with multiple lines.
                print i, elem

and lines 92-94, where all the commands are received and executed on the
slaves:

        while not self.do_quit:
            command = MPI.COMM_WORLD.Recv(source=0)
            command.run(self.relax_instance, self.relax_instance.processor)

It is the intention that the command protocol is very minimal:

0. send a command plus data to run on the remote slave as an object
which will get its run method executed with the local relax and
processor as arguments (master: communicator.run_command)

Maybe the run method should be part of the 'minimise()' model-free method?


1. the slave then writes a series of objects to the processor method
return_object

The model-free parameter vector and optimisation stats?


2. the master receives data back, either:
   a. objects to execute on the master, which will also be given the
master relax and processor instances

Unpack the minimisation data and place it into the appropriate location in the relax data storage object?


   b. string objects to print

The optimisation print outs?


   c. a command completion indicator back from the slave (a well-known object)

Does the slave then die?


   d. an exception (raising of a local exception on the master which will
do stack trace printing for the master and the slave)

Ah, I didn't think of that one!


   e. None, a void return

???
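
Just to check that I follow the protocol as a whole, here is roughly how I picture those return types - every name below is invented for the discussion and none of this is code from your branch:

import sys

class SlaveCommand:
    """Item 0: pickled and sent master -> slave; run() is executed on the
    slave with the slave's relax instance and processor as arguments."""
    def run(self, relax, processor):
        raise NotImplementedError


class MasterCommand:
    """Item 2a: pickled and sent slave -> master; run() is executed on the
    master with the master's relax instance and processor as arguments."""
    def run(self, relax, processor):
        raise NotImplementedError


# Item 2b: plain strings, which the master simply prints.

class Completed:
    """Item 2c: the well-known object marking the end of a slave's replies."""
    pass


class RemoteException:
    """Item 2d: carries the slave's formatted traceback back to the master."""
    def __init__(self, rank, traceback_text):
        self.rank = rank
        self.traceback_text = traceback_text

    def reraise(self):
        # Print the slave's traceback first, then raise locally so that the
        # master's own stack trace is printed as well.
        sys.stderr.write(self.traceback_text)
        raise RuntimeError("Slave %i failed." % self.rank)

# Item 2e: a bare None, which the master would silently ignore.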


>> do I modify interpreter.run to take a quit variable set to False so
>> that run_script can be run with quit=False?
>
>
> Avoid the interpreter and wait at the optimisation steps - the only
> serious number crunching code in relax.


I agree! Interpreters are not required on the slave, just the relax data structures in a clean and usable state.

If you're working at the level of the model-free 'minimise()' function, don't bother with the relax data structures! See my previous post mentioned above.


One other question: how well behaved are the relax functions with regard to not gratuitously modifying global state? E.g. could I share one relax instance between several threads? The reason I ask is that, if they are well behaved, many of the data transfer operations in a threaded environment with a single memory space would become no-ops ;-) nice!

I wouldn't share the state. Again if you work at the 'minimise()' model-free method level, copying it and renaming it to 'minimise_mpi()', that new function could be made to not touch the relax data storage object. Maybe there should be a 'minimise_mpi_master()' that contains the setup code and a 'minimise_mpi_slave()' which contains the optimisation code and the unpacking code. This should be very simple to copy and modify from the current code!
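
As a very rough sketch of the split I mean (the function names other than 'minimise_mpi_master()' and 'minimise_mpi_slave()' are invented, 'optimise()' stands in for the existing minimisation core, 'MinimisationResult' is the illustrative class from earlier in this mail, and the MPI calls just mirror the ones in your quoted code):

from mpi4py import MPI

def minimise_mpi_master(relax_instance, jobs):
    # Setup half: each job is imagined as a small picklable bundle holding
    # only the numbers one optimisation needs (relaxation data, model,
    # initial parameter vector and minimiser options), never the full relax
    # data storage object.  One job per slave is assumed here.
    for i in range(1, MPI.size):
        MPI.COMM_WORLD.Send(buf=jobs[i-1], dest=i)

    # Unpacking half: let each returned result fold itself back in.
    for i in range(1, MPI.size):
        result = MPI.COMM_WORLD.Recv(source=i)
        result.run(relax_instance, relax_instance.processor)


def minimise_mpi_slave(job):
    # Optimisation half: pure number crunching, no relax data structures.
    param_vector, chi2, iter_count = optimise(job)
    return MinimisationResult(job.res_index, param_vector, chi2, iter_count)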

Cheers,

Edward


