Re: r3236 - /branches/multi_processor/relax



Posted by Edward d'Auvergne on March 20, 2007 - 10:02:
On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
Edward d'Auvergne wrote:

>   For example
> the 'threading' user function class which sets up grid computing.  In
> the future I'll probably want to use my algorithm for handling very
> slow machines on a grid (which avoids relax having to wait for a slow
> machine to terminate) and the setting up of slave relax processes
> (using the 'relax --thread' invocation).  Although the grid computing
> code is currently broken, this is only because there is a problem with
> the handling of SSH tunnel breakages.  I also have in mind some
> optimisations for minimising data flow through the tunnel


can you outline this?

Well, currently data is sent to the slave grid processes via ssh tunnels. Prior to running the grid processes, relax saves the model-free results file and sends it to all machines. The slave processes read in the results file, run a single Monte Carlo simulation, save another model-free results file, and transfer this file back. The parent thread then reads the new results file into a temporary 'run' and copies the data to the real run.

This setup is slow, inefficient, and just plain sucks!  What I would
like to do is to set the slave processes to run in a state where they
wait for instructions.  The bare minimum data is then sent to them and
solely the optimised parameter vector and optimisation stats are
returned (i.e. the minimise() method of specific_fns.model_free
module).  Perhaps this is how you would like to set things up, with
MPI used as the communication interface?
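As a rough sketch of the wait-for-instructions idea (the names here are hypothetical, not relax's actual API), the slave could sit in a loop receiving a pickled command with only the bare minimum data, and return only the optimised parameter vector plus the optimisation stats:

```python
import pickle
from queue import Queue

def run_slave(recv, send, minimise):
    """Hypothetical slave loop (not code from relax): block until a
    pickled command arrives, run the minimisation, and send back only
    the optimised parameter vector and the optimisation statistics."""
    while True:
        command = pickle.loads(recv())
        if command["op"] == "stop":
            break
        # Only the bare minimum comes in: the starting parameter vector
        # and whatever data the target function needs.
        vector, stats = minimise(command["param_vector"], command["data"])
        # Only the optimised vector and the stats go back - no results file.
        send(pickle.dumps({"param_vector": vector, "stats": stats}))

# Stand-in transport and minimiser, so the sketch runs without ssh or MPI.
inbox, outbox = Queue(), Queue()

def fake_minimise(vector, data):
    return [v * 2.0 for v in vector], {"iterations": 1}

inbox.put(pickle.dumps({"op": "minimise", "param_vector": [1.0, 2.0], "data": None}))
inbox.put(pickle.dumps({"op": "stop"}))
run_slave(inbox.get, outbox.put, fake_minimise)
result = pickle.loads(outbox.get())
```

The transport here is a pair of in-process queues purely for illustration; the same loop would work with an ssh tunnel or MPI send/receive plugged in as the recv and send callables.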


> and Andrew
> Perry has had ideas about using heartbeats from the grid machine relax
> processes to probe for dead tunnels and processes.


Again, this would be interesting, but I would have to look at it quite carefully as MPI has some threading limitations, I believe...

I don't think these issues are related to MPI.
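For what a heartbeat might look like independently of MPI, here is a minimal sketch (all names are made up for illustration): a background thread on the slave stamps a clock at a fixed interval, and the master declares the process or tunnel dead when the last stamp goes stale.

```python
import threading
import time

class Heartbeat:
    """Hypothetical heartbeat monitor, not code from relax: a background
    thread stamps the time at a fixed interval, and the checking side
    declares the process dead once the last stamp is older than a timeout."""

    def __init__(self, interval=0.05):
        self.interval = interval
        self.last_beat = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Stamp the clock until asked to stop.
        while not self._stop.wait(self.interval):
            self.last_beat = time.monotonic()

    def alive(self, timeout):
        # The master's check: has a beat been seen within 'timeout' seconds?
        return (time.monotonic() - self.last_beat) < timeout

    def stop(self):
        self._stop.set()
        self._thread.join()

hb = Heartbeat(interval=0.02)
hb.start()
time.sleep(0.1)
was_alive = hb.alive(0.5)
hb.stop()
time.sleep(0.2)
is_dead = not hb.alive(0.1)
```

In the grid case the stamp would of course travel back through the tunnel rather than sit in shared memory, but the staleness check on the master's side is the same.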


my thought on this was to turn the threading code into an implementation
of the multi code, so my plan was to rip it out and then put it back ;-)
The architecture devised should be able to cope with ssh tunnels just as
well as a thread or an MPI invocation, I think... If not, I would be
happy to adapt to any ideas you have.

From my understanding of MPI, threads need not be used. So the
threading and grid computing code could possibly be left untouched by
the MPI patch.  If you would like to make use of threads in the MPI
code and if there is overlap between MPI threads and grid computing
threads, we can worry about issues then.  Oh, what exactly is 'multi
code'?


The basic idea, as you can see, is to send pickled commands containing
data to the slave, which then returns a pickled result object, and I
think that this is quite possible down an ssh tunnel. If not, text
commands with the data embedded as text would also work.

These pickled objects would be much more efficient than the way grid computing is currently set up. I would actually like to have pickled objects sent to and from the grid slave processes, but you don't need to worry about that. The grid computing setup could be improved after the MPI patch makes it into the main line by using the same pickling interface.
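One detail the tunnel forces on this pickling interface is framing: an ssh tunnel is a raw byte stream, so each pickled command or result needs a message boundary. A common way to do this (the function names below are just for illustration) is a length prefix in front of each pickle:

```python
import io
import pickle
import struct

def send_object(stream, obj):
    """Write one pickled object with a 4-byte big-endian length prefix,
    so message boundaries survive a raw byte stream such as an ssh tunnel."""
    payload = pickle.dumps(obj)
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def recv_object(stream):
    """Read back exactly one length-prefixed pickled object."""
    (length,) = struct.unpack(">I", stream.read(4))
    return pickle.loads(stream.read(length))

# Round-trip two messages through an in-memory stream standing in for
# the tunnel.
tunnel = io.BytesIO()
send_object(tunnel, {"op": "minimise", "param_vector": [1.0, 2.0]})
send_object(tunnel, {"op": "stop"})
tunnel.seek(0)
first = recv_object(tunnel)
second = recv_object(tunnel)
```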

Cheers,

Edward



Powered by MHonArc, Updated Tue Mar 20 10:40:36 2007