relax and Grid computing. -- March 19, 2007

Hi,

Sorry I had to change the subject.  Unfortunately Gmail will cause the
thread to break!  This is a response to the post located at
https://mail.gna.org/public/relax-devel/2007-03/msg00090.html
(Message-id: <45FEB28A.7010600@xxxxxxxxxxxxxxx>).

On 3/20/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:

Edward d'Auvergne wrote:

> Gary,
>
> It might be important to note that the code that you commented out was
> actually grid computing code rather than threading code.
> Unfortunately I called my grid computing code 'threading'!  Half of it
> could probably kept as is although relabelled to 'grid'.

Can you give an outline of how the grid code works I found it fairly
convoluted when i tried to look at it....


Ok, I'll try to explain as best I can.  It has been quite a while
since I wrote this grid computing code so please excuse me if I get
something wrong.

Firstly relax is executed as you normally would in either the prompt
or script UI mode.  To set up grid computing you run the
'thread.read()' user function.  This reads a file which defines the
hosts by its host name or IP address, your user name on the machine,
the location of relax, the slave process priority number on the
machine, and the number of CPUs or CPU cores on the machine (to launch
multiple slaves processes on one machine).  More information about
this setup, SSH public key authentication (hence a password-less login
to the machine), etc. is given by the thread.read() documentation.

As the script or prompt statements are executed, relax will operate as
normal.  That is until the minimise() method of the
generic_fns.minimise.Minimise class instance is executed by the
minimise() user function.  Currently only Monte Carlo simulation
calculations are sent off for calculation to the elements of the
computer grid.  See the code for the full details.  The instantiation
of the RelaxMinParentThread class starts the process.  Essentially
what happens is that the parent thread starts n RelaxThread instances,
which are true threads, for the n Monte Carlo simulations.  Each
thread then does all the grid computing work asynchronously
communicating with the slave processes.  Unfortunately there is no
separation between the threading framework and the grid computing
framework at this point.

The grid computing algorithm I have come up with is the code of the
RelaxThread.run() method (see the thread_classes.py file).  I have
used two queues, the self.job_queue and the self.results_queue (see
the Python module Queue).  Both are queues of job numbers.  An
infinite loop is used for execution.  Firstly a job number is taken
from the self.job_queue.  The job number is then added back to the end
of the job queue - this is to make the threads and slaves fail safe
and so idle faster machines will pick up the jobs of the slower
machines while they are still running.  To prevent race conditions,
the element of the self.job_locks array corresponding to the job
number is locked.  A list of completed jobs self.finished_jobs is used
to determine if the job has been finished by a faster thread to
prevent the job number being added back to the job queue.  This allows
the job queue to be depopulate as jobs finish.  Once a job has been
completed its number is added to self.results_queue.  Termination of
the infinite loop occurs one the job number None is pulled out of the
queue.  To terminate all threads (and corresponding slave processes),
None is added back to the job queue.

I hope that wasn't too confusing,

Edward

relax and Grid computing.

Header

Content

Related Messages