relax, MPI, and Grid computing.



Posted by Edward d'Auvergne on March 20, 2007 - 10:53:
On Mon, 2007-03-19 at 22:28 +0000, Gary S. Thompson wrote:
Edward d'Auvergne wrote:

Hi,

Sorry I had to change the subject.  Unfortunately Gmail will cause the
thread to break!  This is a response to the post located at
https://mail.gna.org/public/relax-devel/2007-03/msg00090.html
(Message-id: <45FEB28A.7010600@xxxxxxxxxxxxxxx>).


On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:

Edward d'Auvergne wrote:

Gary,

It might be important to note that the code that you commented out was
actually grid computing code rather than threading code.
Unfortunately I called my grid computing code 'threading'!  Half of it
could probably be kept as is, although relabelled to 'grid'.

Can you give an outline of how the grid code works?  I found it fairly
convoluted when I tried to look at it....


Ok, I'll try to explain as best I can.  It has been quite a while
since I wrote this grid computing code so please excuse me if I get
something wrong.

Firstly, relax is executed as you normally would in either the prompt
or script UI mode.  To set up grid computing you run the
'thread.read()' user function.  This reads a file which defines each
host by its host name or IP address, your user name on the machine,
the location of relax, the slave process priority number on the
machine, and the number of CPUs or CPU cores on the machine (to launch
multiple slave processes on one machine).  More information about
this setup, SSH public key authentication (hence a password-less login
to the machine), etc. is given by the thread.read() documentation.
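
Purely to illustrate the idea (this is not the real format - the exact
column layout should be taken from the thread.read() documentation),
such a host file could look something like:

    # host name or IP    user name    relax path              priority    CPUs
    192.168.0.10         edward       /usr/local/bin/relax    15          2
    node1.cluster        edward       /usr/local/bin/relax    15          4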

As the script or prompt statements are executed, relax will operate as
normal.  That is until the minimise() method of the
generic_fns.minimise.Minimise class instance is executed by the
minimise() user function.  Currently only the Monte Carlo simulation
calculations are sent off to the elements of the computer grid.  See
the code for the full details.  The instantiation of the
RelaxMinParentThread class starts the process.  Essentially what
happens is that the parent thread starts n RelaxThread instances,
which are true threads, for the n Monte Carlo simulations.  Each
thread then does all the grid computing work, asynchronously
communicating with the slave processes.  Unfortunately there is no
separation between the threading framework and the grid computing
framework at this point.

The grid computing algorithm I have come up with is the code of the
RelaxThread.run() method (see the thread_classes.py file).  I have
used two queues, self.job_queue and self.results_queue (see the
Python module Queue).  Both are queues of job numbers.  An infinite
loop is used for execution.  Firstly a job number is taken from
self.job_queue.  The job number is then added straight back to the end
of the job queue - this is to make the threads and slaves fail-safe,
so that idle faster machines will pick up the jobs of the slower
machines while these are still running.  To prevent race conditions,
the element of the self.job_locks array corresponding to the job
number is locked.  A list of completed jobs, self.finished_jobs, is
used to determine if the job has already been finished by a faster
thread, in which case the job number is not added back to the job
queue.  This allows the job queue to be depopulated as jobs finish.
Once a job has been completed its number is added to
self.results_queue.  Termination of the infinite loop occurs once the
job number None is pulled out of the queue.  To terminate all threads
(and corresponding slave processes), None is added back to the job
queue.
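
To make that concrete, here is a rough Python sketch of the idea - not
the actual RelaxThread.run() code, just the fail-safe queue pattern
described above (the run_job argument stands in for sending the data
and launching relax on a slave):

    def worker(job_queue, results_queue, job_locks, finished_jobs, run_job):
        # Fail-safe worker loop: unfinished jobs are recycled so that a fast,
        # idle machine can take over the job of a slow or dead machine.
        while True:
            job = job_queue.get()
            if job is None:
                job_queue.put(None)   # pass the stop signal on to the other threads
                break
            if job not in finished_jobs:
                job_queue.put(job)    # re-queue straight away - the fail-safe part
            # The per-job lock prevents two threads running the same job at once.
            if not job_locks[job].acquire(False):
                continue
            try:
                if job in finished_jobs:
                    continue
                run_job(job)          # e.g. send the data and run relax on a slave
                finished_jobs.append(job)
                results_queue.put(job)
            finally:
                job_locks[job].release()

    # The parent thread would fill job_queue (a Queue.Queue) with the simulation
    # numbers, create one threading.Lock() per job in job_locks, collect the job
    # numbers from results_queue, and finally put None into job_queue to stop
    # all the workers.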

I hope that wasn't too confusing,

Edward


Hi Ed
No this wasn't too confusing.  It helps quite a lot and is relatively
compatible with what I have (policy is relatively weak inside the
processor objects, and for good reason - there are several possibilities
for setting it up).  The one thing that confuses me currently is how to
bring up relax on a remote machine in a state where it is runnable
without running a script into it...

I don't know how this is done in MPI but I would assume that you need to
get the slave to wait for commands.  I would also assume that this
waiting for commands is included as part of the mpi4py package.  In the
current grid computing code, there is no waiting.  The thread on the
parent machine sends the data, creates a temporary relax script, and
then launches 'relax --thread script'.  The relax process executes,
saves the result, and then terminates.  It is the threads that wait and
not the slave processes, as these are respawned multiple times.  This is
probably not what you want for MPI.
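
I have not used mpi4py myself, so treat this as a rough, untested
sketch of what a waiting slave might look like - the idea of a command
object and of None as the shutdown signal is invented purely for
illustration:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD

    if comm.Get_rank() != 0:
        # Slave: loop forever, waiting for commands from the master (rank 0).
        while True:
            command = comm.recv(source=0)
            if command is None:
                break                      # the master has told us to shut down
            result = command.run()         # hypothetical command object
            comm.send(result, dest=0)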


 I have played around with dummy runs 
in the latest iteration of the multi branch but am not sure if this is 
the way to go...

There has to be a way to get the slave process to wait.


 I also had a look at the save state in state.py and this
seems quite heavy.  I presume that it dumps the complete program state to
a pickle and then rejuvenates it at the other side?

This pickles solely the relax data storage object (which is now a
singleton) as this is the program state.  All permanent data and
settings in relax must be stored in this object.  This object in a
pickled state is not what you'd need for MPI though - the object is too
big for inter-node communication.  You simply need the bare minimum sent
in both directions.  For model-free optimisation, see the 'minimise()'
method for the bare minimum objects required for optimisation (as well
as all the objects which are returned).
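
As a sketch only (the dictionary keys, the slave_rank variable, and the
comm object - an mpi4py communicator - are all stand-ins for
illustration), sending the bare minimum could be as simple as:

    # Master side: send only what one model-free optimisation needs...
    data = {'param_vector': param_vector, 'relax_data': relax_data,
            'errors': relax_error, 'scaling_matrix': scaling_matrix}
    comm.send(data, dest=slave_rank)

    # ...and receive back only the results of the minimisation.
    param_vector, func, iter, fc, gc, hc, warning = comm.recv(source=slave_rank)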


 Consider line 101 of
mpi4py_processor: the command is given a copy of the relax_instance and
should now execute commands against it (whether to update state or to do
something and then return an object via the processor).  How do I ensure
that it is in a usable state?

What exactly is 'relax_instance'?  At the moment it appears to be set to
None.


 I guess I could initialise the main
interpreter and then save its state, but by that point it is running a
script!


I wouldn't!  This will defeat much of the efficiency gains, especially
if individual model-free optimisations are executed on each slave.
Again, all that is needed is the objects of the model-free minimise()
method.  For example, the data to send to the slave would simply be the
arguments to the Mf() instantiation:

            self.mf = Mf(init_params=self.param_vector, param_set=self.param_set,
                         diff_type=diff_type, diff_params=diff_params,
                         scaling_matrix=self.scaling_matrix, num_res=num_res,
                         equations=equations, param_types=param_types,
                         param_values=param_values, relax_data=relax_data,
                         errors=relax_error, bond_length=r, csa=csa,
                         num_frq=num_frq, frq=frq, num_ri=num_ri,
                         remap_table=remap_table, noe_r1_table=noe_r1_table,
                         ri_labels=ri_labels, gx=relax_data_store.gx,
                         gh=relax_data_store.gh, g_ratio=relax_data_store.g_ratio,
                         h_bar=relax_data_store.h_bar, mu0=relax_data_store.mu0,
                         num_params=num_params, vectors=xh_unit_vectors)

The returned data is shown on the lines:

                results = generic_minimise(func=self.mf.func, dfunc=self.mf.dfunc,
                                           d2func=self.mf.d2func, args=(),
                                           x0=self.param_vector, min_algor=min_algor,
                                           min_options=min_options, func_tol=func_tol,
                                           grad_tol=grad_tol, maxiter=max_iterations,
                                           full_output=1, print_flag=print_flag)
            self.param_vector, self.func, iter, fc, gc, hc, self.warning = results

Maybe there should be a model-free specific method called
'minimise_mpi()' which is copied from the current 'minimise()' method.
I think this is the area of relax which should be targeted. 


One thing to note here is that I will at some stage try to rewrite 
commands to keep the slave states in sync as we run, so we don't have to 
save the whole state.  But that is for a later day, or never if you 
consider that to not be the way to go...

If you are working at the model-free 'minimise()' method level I think
you will get the best efficiency out of a cluster!  The granularity
would be perfect - not so fine that inter-node communication becomes the
limiting factor, and not so coarse that the nodes of the cluster are
underutilised.

At this low level the program state and the relax data storage object do
not even come into play.  Hence the slave program state will be
untouched and remain at the initial state for as long as it exists.
This will probably be the simplest solution to implement as well.  This
is what I eventually plan to do for the grid computing but you are
welcome to beat me to it.


more questions

where should I be attacking the division problem?

Unless you see a point in using threads in the MPI code, don't attack
the division problem.  That one is my problem!


 my main thought was to 
effectively add restrictions to some commands.  So consider the grid 
search: I would add an extra parameter at the generic and functional 
levels which would give a range of steps within the current parameters to 
calculate... e.g. here are the ranges which give a grid of 10x10x10, 
i.e. 1000 steps; slave 1, you calculate 1-250, slave 2, 251-500, and so 
on...  Is this the correct way to go?

Subdividing the grid search will be an interesting problem!  Should it
be at the 'generic_fns' level, the 'specific_fns' level, or implemented
directly into the minimisation package?  I think that the 'specific_fns'
level, again within the 'minimise()' model-free method (copied, modified
for MPI, and renamed to 'minimise_mpi()') would be the best place to
target.

An algorithm to subdivide the grid would be useful.  Then an algorithm
to collect the results and determine which subspace of the grid has the
point with the lowest chi2 value should be used.  I.e. this will be an
MPI-oriented grid search over a number of standard grid searches.
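
As a very rough sketch of that idea (the flat division into point
ranges and the names used here are just one possible way of doing it,
not existing relax code):

    def split_grid(total_points, num_slaves):
        # Divide the grid point indices 0 to total_points-1 into one
        # contiguous range per slave.
        size = total_points // num_slaves
        ranges = []
        for i in range(num_slaves):
            ranges.append((i * size, (i + 1) * size))
        # Give any remainder to the last slave.
        ranges[-1] = (ranges[-1][0], total_points)
        return ranges

    # e.g. a 10x10x10 grid split over 4 slaves:
    # [(0, 250), (250, 500), (500, 750), (750, 1000)]
    subgrids = split_grid(10 * 10 * 10, 4)

    # Each slave would run a standard grid search over its own range of points
    # and send back (chi2, params) for its best point; the master then simply
    # keeps the overall minimum:
    #     best_chi2, best_params = min(results)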

However, your best MPI gains are likely to be achieved by sending each
grid search to a different node.  This higher level would be shared with
the standard model-free optimisation code and hence you don't need to
worry about writing separate MPI code for the grid search and for the
minimisation.  Slight improvements may be achieved by breaking up the
grid search, but I would personally tackle this later on.

Cheers,

Edward



