Re: relax, MPI, and Grid computing.



Posted by Gary S. Thompson on March 20, 2007 - 15:15:
Edward d'Auvergne wrote:

On Mon, 2007-03-19 at 22:28 +0000, Gary S. Thompson wrote:


Edward d'Auvergne wrote:



Hi,

Sorry I had to change the subject.  Unfortunately Gmail will cause the
thread to break!  This is a response to the post located at
https://mail.gna.org/public/relax-devel/2007-03/msg00090.html
(Message-id: <45FEB28A.7010600@xxxxxxxxxxxxxxx>).


On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:



Edward d'Auvergne wrote:



Gary,

It might be important to note that the code that you commented out was
actually grid computing code rather than threading code.
Unfortunately I called my grid computing code 'threading'!  Half of it
could probably be kept as is, although relabelled to 'grid'.


Can you give an outline of how the grid code works?  I found it fairly
convoluted when I tried to look at it...


Ok, I'll try to explain as best I can.  It has been quite a while
since I wrote this grid computing code so please excuse me if I get
something wrong.

Firstly relax is executed as you normally would in either the prompt
or script UI mode.  To set up grid computing you run the
'thread.read()' user function.  This reads a file which defines each
host by its host name or IP address, your user name on the machine,
the location of relax, the slave process priority number on the
machine, and the number of CPUs or CPU cores on the machine (to launch
multiple slave processes on one machine).  More information about
this setup, SSH public key authentication (hence a password-less login
to the machine), etc. is given by the thread.read() documentation.
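
For illustration, a hosts file might look something like this (the
column layout here is a guess from memory - the real format is defined
in the thread.read() documentation):

    # host name/IP    user name    relax path          priority    CPUs
    192.168.0.10      edward       /usr/local/relax    15          2
    node1.cluster     edward       /usr/local/relax    15          4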

As the script or prompt statements are executed, relax will operate as
normal.  That is until the minimise() method of the
generic_fns.minimise.Minimise class instance is executed by the
minimise() user function.  Currently only Monte Carlo simulation
calculations are sent off to the elements of the computer grid for
calculation.  See the code for the full details.  The instantiation
of the RelaxMinParentThread class starts the process.  Essentially
what happens is that the parent thread starts n RelaxThread instances,
which are true threads, for the n Monte Carlo simulations.  Each
thread then does all the grid computing work, asynchronously
communicating with the slave processes.  Unfortunately there is no
separation between the threading framework and the grid computing
framework at this point.

The grid computing algorithm I have come up with is the code of the
RelaxThread.run() method (see the thread_classes.py file).  I have
used two queues, self.job_queue and self.results_queue (see the
Python module Queue).  Both are queues of job numbers.  An
infinite loop is used for execution.  Firstly a job number is taken
from self.job_queue.  The job number is then added back to the end
of the job queue - this is to make the threads and slaves fail safe,
and so that idle faster machines will pick up the jobs of the slower
machines while they are still running.  To prevent race conditions,
the element of the self.job_locks array corresponding to the job
number is locked.  A list of completed jobs, self.finished_jobs, is
used to determine if the job has been finished by a faster thread, to
prevent the job number being added back to the job queue.  This allows
the job queue to be depopulated as jobs finish.  Once a job has been
completed its number is added to self.results_queue.  Termination of
the infinite loop occurs once the job number None is pulled out of the
queue.  To terminate all threads (and corresponding slave processes),
None is added back to the job queue.
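
In rough Python the loop looks something like this (a sketch from
memory rather than the exact thread_classes.py code - run_job() is a
stand-in for all the real grid computing work):

    def run(self):
        while True:
            # Pull a job number off the queue (blocks if the queue is empty).
            job = self.job_queue.get()

            # The special job number None terminates this thread, and is put
            # back so that every other thread terminates too.
            if job is None:
                self.job_queue.put(None)
                break

            # Add the job straight back to the end of the queue so that idle
            # faster machines pick up the jobs of slower ones - unless a
            # faster thread has already finished it.
            if job not in self.finished_jobs:
                self.job_queue.put(job)

            # Lock the element corresponding to this job number to prevent
            # race conditions.
            self.job_locks[job].acquire()
            try:
                if job not in self.finished_jobs:
                    run_job(job)                    # the real grid work
                    self.finished_jobs.append(job)
                    self.results_queue.put(job)
            finally:
                self.job_locks[job].release()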

I hope that wasn't too confusing,

Edward




Hi Ed
No, this wasn't too confusing. It helps quite a lot and is relatively compatible with what I have (policy is relatively weak inside the processor objects, and for good reason: there are several possibilities for setting it up). The one thing that confuses me currently is how to bring up relax on a remote machine in a state where it is runnable, without running a script into it...



I don't know how this is done in MPI, but I would assume that you need to get the slave to wait for commands. I would also assume that this waiting for commands is included as part of the mpi4py package. In the current grid computing code, there is no waiting. The thread on the parent machine sends the data, creates a temporary relax script, and then launches 'relax --thread script'. The relax process executes, saves the result, and then terminates. It is the threads that wait and not the slave processes, as these are respawned multiple times. This is probably not what you want for MPI.
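
On the parent side each simulation therefore boils down to something
like this (a very rough sketch - the paths and variables are stand-ins
for the values read by thread.read()):

    import os

    # Stand-in values - in the real code these come from thread.read().
    user, host = 'edward', '192.168.0.10'
    relax_path, script, results = '/usr/local/relax', 'sim_0.py', 'sim_0.save'

    # Send the temporary relax script to the slave machine, run relax on
    # it, and then collect the saved result.
    os.system("scp %s %s@%s:/tmp/" % (script, user, host))
    os.system("ssh %s@%s '%s/relax --thread /tmp/%s'" % (user, host, relax_path, script))
    os.system("scp %s@%s:/tmp/%s ." % (user, host, results))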




I have played around with dummy runs in the latest iteration of the multi branch but am not sure if this is the way to go...



There has to be a way to get the slave process to wait.


Getting the slave processes to wait is easy: they just sit blocking on MPI.recv until a message is received. It was more a question of how to get a basic instance of relax set up. This was really about trying to use a more coarse-grained multiprocessing setup running at the generic function level.
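
That is, something like this on each slave (a sketch only - the command
objects and their run() method are my own placeholders, not part of
mpi4py):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    if comm.Get_rank() != 0:
        # Slaves sit here, blocking until the master sends a command.
        while True:
            command = comm.recv(source=0)
            if command is None:        # the master signals shutdown
                break
            result = command.run()     # execute the command locally
            comm.send(result, dest=0)  # return an object to the master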




I also had a look at the save state in state.py and this seems quite heavy. I presume that it dumps the complete program state to a pickle and then rejuvenates it on the other side?



This pickles solely the relax data storage object (which is now a
singleton) as this is the program state. All permanent data and
settings in relax must be stored in this object. This object in a
pickled state is not what you'd need for MPI though - the object is too
big for inter-node communication.
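
Roughly, the save amounts to this sketch (cPickle here is an assumption
on my part about the exact module used):

    import cPickle

    # Dump the data storage singleton to file...
    cPickle.dump(relax_data_store, open('save.state', 'wb'))

    # ...and rejuvenate it on the other side.
    relax_data_store = cPickle.load(open('save.state', 'rb'))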


Not necessarily, it depends on how coarse-grained your calculations are and how long they run for. What I am doing with the multi framework is effectively just a wrapper round something similar to the thread code, plus at the moment an implementation that uses complete state dumps (obviously if I can use more focused dumps things become faster).

You simply need the bare minimum sent
in both directions.  For model-free optimisation, see the 'minimise()'
method for the bare minimum objects required for optimisation (as well
as all the objects which are returned).




Consider line 101 of mpi4py_processor: the command is given a copy of the relax_instance and should now execute commands against it (whether to update state, or to do something and then return an object via the processor). How do I ensure that it is in a usable state?



What exactly is 'relax_instance'? At the moment it appears to be set to
None.



No, it is set up to be a copy of the relax data structure used throughout relax, i.e. each mpi4py_processor currently gets passed a relax instance (line 69), which is passed in at line 585 of relax.


(Try adding 'print relax_instance' as the first line of Mpi4py_processor and you should get
[<module 'multi' from './multi/__init__.pyc'>, <module 'multi.mpi4py_processor' from './multi/mpi4py_processor.pyc'>]
<__main__.Relax instance at 0xb777f2ec>
[<module 'multi' from './multi/__init__.pyc'>, <module 'multi.mpi4py_processor' from './multi/mpi4py_processor.py'>]
<__main__.Relax instance at 0xb77ac2ec>
[<module 'multi' from './multi/__init__.pyc'>, <module 'multi.mpi4py_processor' from './multi/mpi4py_processor.pyc'>]
<__main__.Relax instance at 0xb77ee2ec>
[<module 'multi' from './multi/__init__.pyc'>, <module 'multi.mpi4py_processor' from './multi/mpi4py_processor.pyc'>]
<__main__.Relax instance at 0xb77c92ec>
[<module 'multi' from './multi/__init__.pyc'>, <module 'multi.mpi4py_processor' from './multi/mpi4py_processor.pyc'>]
<__main__.Relax instance at 0xb78542ec>)





I guess I could initialise the main interpreter and then save its state, but by that point it is running a script!




I wouldn't!  This will defeat much of the efficiency gains, especially
if individual model-free optimisations are executed on each slave.
Again, all that is needed is the objects of the model-free minimise()
method.  For example the data to send to the slave would be simply the
arguments to the Mf() instantiation:

    self.mf = Mf(init_params=self.param_vector, param_set=self.param_set,
                 diff_type=diff_type, diff_params=diff_params,
                 scaling_matrix=self.scaling_matrix, num_res=num_res,
                 equations=equations, param_types=param_types,
                 param_values=param_values, relax_data=relax_data,
                 errors=relax_error, bond_length=r, csa=csa,
                 num_frq=num_frq, frq=frq, num_ri=num_ri,
                 remap_table=remap_table, noe_r1_table=noe_r1_table,
                 ri_labels=ri_labels, gx=relax_data_store.gx,
                 gh=relax_data_store.gh, g_ratio=relax_data_store.g_ratio,
                 h_bar=relax_data_store.h_bar, mu0=relax_data_store.mu0,
                 num_params=num_params, vectors=xh_unit_vectors)

The returned data is shown on the lines:

    results = generic_minimise(func=self.mf.func, dfunc=self.mf.dfunc,
                               d2func=self.mf.d2func, args=(),
                               x0=self.param_vector, min_algor=min_algor,
                               min_options=min_options, func_tol=func_tol,
                               grad_tol=grad_tol, maxiter=max_iterations,
                               full_output=1, print_flag=print_flag)
    self.param_vector, self.func, iter, fc, gc, hc, self.warning = results



I will look at this

Maybe there should be a model-free specific method called
'minimise_mpi()' which is copied from the current 'minimise()' method.
I think this is the area of relax which should be targeted.





Yes, but not MPI, just a multiprocessor specific version.

One thing to note here is that I will at some stage try to rewrite the commands to keep the slave states in sync as we run, so we don't have to save the whole state. But that is for a later day, or never if you consider that not to be the way to go...



If you are working at the model-free 'minimise()' method level I think you will get the best efficiency out of a cluster! The granularity would be perfect - not so fine that inter-node communication is the limiting factor, and not so coarse that the nodes of the cluster will be underutilised.

At this low level the program state and the relax data storage object do
not even come into play.  Hence the slave program state will be
untouched and remain at the initial state for as long as it exists.
This will probably be the simplest solution to implement as well.  This
is what I eventually plan to do for the grid computing but you are
welcome to beat me to it.

OK, I will go for it.


More questions:

Where should I be attacking the division problem?



Unless you see a point in using threads in the MPI code, don't attack the division problem. That one is my problem!




Sorry, I don't follow. I need to divide up the jobs, otherwise I have nothing to work with??

My main thought was to effectively add restrictions to some commands. So consider the grid search: I would add an extra parameter at the generic and functional levels which would give a range of steps within the current parameters to calculate... e.g. here are the ranges which give a grid of 10x10x10, i.e. 1000 steps; slave 1, you calculate points 1-250; slave 2, points 251-500; and so on... Is this the correct way to go?



Subdividing the grid search will be an interesting problem! Should it be at the 'generic_fns' level, the 'specific_fns' level, or implemented directly into the minimisation package? I think that the 'specific_fns' level, again within the 'minimise()' model-free method (copied, modified for MPI, and renamed to 'minimise_mpi()'), would be the best place to target.

An algorithm to subdivide the grid would be useful.  Then an algorithm
to collect the results and determine which subspace of the grid has the
point with the lowest chi2 value should be used.  I.e. this will be an
MPI-oriented grid search over a number of standard grid searches.
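
Something along these lines perhaps (pure illustration - the flat point
indexing scheme and the chi2 stand-in function are inventions for the
sketch):

    def subdivide(total_steps, num_slaves):
        # Split the flat grid point indices 0 to total_steps-1 into
        # roughly equal contiguous ranges, one subspace per slave.
        size = (total_steps + num_slaves - 1) / num_slaves
        return [(i, min(i + size, total_steps))
                for i in range(0, total_steps, size)]

    def grid_search(lo, hi, chi2):
        # A standard grid search over one subspace, returning the best
        # (chi2, point index) pair.
        return min([(chi2(k), k) for k in range(lo, hi)])

    # E.g. a 10x10x10 grid (1000 points) over 4 slaves: each slave
    # searches its own subspace and the master keeps the lowest chi2.
    chi2_stand_in = lambda k: (k - 123)**2
    results = [grid_search(lo, hi, chi2_stand_in)
               for lo, hi in subdivide(1000, 4)]
    print min(results)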

However your best MPI gains are likely to be achieved by sending each
grid search to a different node.  This higher level would be shared with
the standard model-free optimisation code and hence you don't need to
worry about writing separate MPI code for the grid search and for the
minimisation.  Slight improvements may be achieved by breaking up the
grid search, but I would personally tackle this later on.




Again, I need to think about this. However, if this uses division by model it will again perform poorly, as the different models will take different amounts of time to calculate, so many processors will sit idle...
Again, if I am not understanding properly please accept my apologies; relax is very heavily layered and a lot of names are repeated multiple times, so it can be quite hard to follow what is going on in the code base ;-)


regards
gary

Cheers,

Edward


_______________________________________________ relax (http://nmr-relax.com)

This is the relax-devel mailing list
relax-devel@xxxxxxx

To unsubscribe from this list, get a password
reminder, or change your subscription options,
visit the list information page at
https://mail.gna.org/listinfo/relax-devel






--
-------------------------------------------------------------------
Dr Gary Thompson
Astbury Centre for Structural Molecular Biology,
University of Leeds, Astbury Building,
Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
email: garyt@xxxxxxxxxxxxxxx                   Fax  +44-113-2331407
-------------------------------------------------------------------




