Re: [Fwd: Re: multi processing]



Posted by Edward d'Auvergne on July 06, 2006 - 08:27:
Sorry about the late response, you probably know what writing up is
like!  I've created the 1.3 development branch as the MPI changes are
likely to be very disruptive.  Gary, feel free to create your private
branch by copying the code from the 1.3 line.  You can use a command
such as:

$ svn cp svn+ssh://varioustoxins@xxxxxxxxxxx/svn/relax/1.3 \
         svn+ssh://varioustoxins@xxxxxxxxxxx/svn/relax/branches/mpi

to make the branch.  Don't forget to subscribe to the relax-commits
mailing list so that you get emails of each change that is made.  With
the email you will receive a diff of the changes.  By subscribing you
will also receive emails of changes others may make to your branch.
That might occur if you ask for help or if other non-MPI parts of the
code are modified to better sit side-by-side with the MPI code.  For
collaborative development it is very important to be on that list.  I
have a few more points below.

>>That's generally the idea I had, i.e. a fairly coarse-grained approach.
>>My thought was to add constructs to the top level commands (if needed)
>>to allow subsets of a set of calculations to be run from a script,
>>i.e. part of a grid search, a few Monte Carlo runs, or a subset of
>>minimisations for a set of residues. The real script would then generate
>>the required subscripts plus embedded data on the fly. I think this
>>provides a considerable degree of flexibility. Thus, for instance, our
>>cluster, which runs Grid Engine, needs a master script to start all the
>>sub-processes, rather than the set of separate password-less SSH logons
>>that a cluster of workstations would require. In general I thought that
>>catching failures, other than a failure to start, is not required...
>>
>>
>
>Is your idea similar to having the runs themselves threaded so instead
>of looping over them you run them simultaneously?  I don't know too
>much about clustering.  What is the interface by which data and
>instructions are sent and returned from the nodes?  And do you know if
>there are python wrappings?
>
>

So the idea is to take the low-hanging fruit for the moment and only
parallelise the things that will naturally run for the same amount of time,

e.g. divide sets of Monte Carlo simulations into parts, run
minimisations on subsets of residues that share the same model and
tensor frame, etc.

That would be the easiest way to start things off. The MPI overhead of threading the grid search will probably be too much anyway. Individual grid searches can be parallelised, but it would be best not to split up an individual grid search yet.
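To sketch the sort of coarse division I mean (nothing here is relax code yet, the function name is made up), splitting a set of Monte Carlo simulations into near-equal chunks, one per processor, could be as simple as:

def divide_simulations(n_sims, n_procs):
    # Split simulation indices 0..n_sims-1 into n_procs near-equal chunks.
    base, extra = divmod(n_sims, n_procs)
    chunks = []
    start = 0
    for i in range(n_procs):
        size = base
        if i < extra:
            size += 1
        chunks.append(range(start, start + size))
        start += size
    return chunks

# e.g. divide_simulations(500, 3) gives chunks of 167, 167 and 166 sims.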

As to how to send data, scripts and results: I would write an interface
class and then allow different instances of the class to deal with
communication differently, to support different transport methods, e.g.
SSH logins vs MPI sessions (or something which hasn't been invented yet).

We could set it up so that there is a clear separation of the threading code, the MPI code, the SSH code, etc. That way both the MPI and SSH code can use the same threads, just in a different way. That may allow an efficient dual, yet separate, implementation of clustering and grid computing.

Transfer of data will use cPickle in my case, with an MPI backend to
keep compute nodes available and prevent queuing problems (you don't want
to resubmit to the batch queue each time you calculate a subpart of the
problem...).

Sounds like a good way to transfer the data to the nodes. It would be good to only send the minimal amount of data required, for example the data which is rounded up in the function 'self.minimise()' in 'specific_fns/model_free.py' and sent to the model-free minimisation code in 'maths_fns/'.
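For instance, the pickling step could look something like this (a sketch only; the payload keys here are hypothetical, and the real minimal set would be whatever 'self.minimise()' currently hands to the 'maths_fns/' code):

import cPickle

def pack_minimal_data(relax_data, errors, model, params):
    # Pickle only the data the maths_fns code needs into a byte string.
    payload = {'relax_data': relax_data, 'errors': errors,
               'model': model, 'params': params}
    return cPickle.dumps(payload, cPickle.HIGHEST_PROTOCOL)

def unpack_minimal_data(byte_string):
    # The reverse operation, run on the compute node.
    return cPickle.loads(byte_string)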

>>>SSH tunnels is probably not the best option for your system.  Do you
>>>know anything about MPI?
>>>
>>>
>>>
>>I have read about MPI but have not implemented anything __YET__ ;-). Also
>>I have compiled some MPI based programs. It seems to be a bit of a pig and
>>I don't think the low-hanging fruit necessarily requires that degree of
>>fine-grained distribution...
>>
>>
>
>I haven't used MPI either.  There may be much better protocols
>implemented for Python.
>
>

Actually, after looking at the problem in our local implementation, we
will need MPI, and I have the MPI from Scientific working on my
computer.  However, as alluded to above, MPI will only be a dependency
for a particular transport method, not the overall scheme.
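For example, the dependency can be probed at import time so that only the MPI transport needs it (a sketch; 'Scientific.MPI' is the module from ScientificPython, the rest is made up):

# Probe for the MPI bindings rather than making all of relax depend on them.
try:
    from Scientific import MPI
except ImportError:
    MPI = None

def mpi_transport_available():
    # True only if the ScientificPython MPI bindings could be imported.
    return MPI is not None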

>
>
>>>There are a number of options available for
>>>distributed calculations, but it will need to have a clean and stable
>>>Python interface.
>>>
>>>
>>>
>>Obviously a stable interface, with as little change to the current top
>>level functions and as little surprise as possible, is to be desired. I
>>thought it might be a good idea to have some form of facade, so that
>>the various forms of coarse-grained multi processing look the same,
>>whichever one you are using. The idea would be only to have the setup
>>and dispatch code differ.
>>
>>
>>
>
>It would probably be best to use some existing standard protocol
>rather than inventing a relax specific system.
>
>

I think the interface of scripts plus data provides all you need; the
actual methodology in the transport method can be private...

So, for example:

1. Create a cluster with a transport layer.

top level script:

init_parallel()                                     # override relax commands as needed

It's not overriding, you're just selecting a different UI for the MPI relax instances. See the new figure 9.1 in the manual as to where the MPI UI would fit in.

cluster = create_cluster(name='test')               # the cluster to use; you can have more than one...

Would that feature be of any use?

mpi_transport = create_transport(name='name', method='mpi-local', ...)    # a transport layer; all extra keyword arguments are for configuration
processor_set = create_processor(transport=mpi_transport, nprocessors=30, ...)    # a particular set of processors using a particular transport method, with a particular weight
cluster_add_processor(processor_set, weight=1.0)    # add it to the pool of available processors


All the above could probably be merged into a single simple user function. All the hard work can be done behind the scenes.
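Something along these lines, say (purely hypothetical, just chaining the user function names from the sketch above behind a single call):

def parallel_setup(name='test', method='mpi-local', nprocessors=30, weight=1.0, **kw):
    # None of these user functions exist yet -- the names come straight
    # from the sketch above.
    cluster = create_cluster(name=name)
    transport = create_transport(name=name, method=method, **kw)
    processor_set = create_processor(transport=transport, nprocessors=nprocessors)
    cluster_add_processor(processor_set, weight=weight)
    return cluster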

normal relax setup ...

minimise('newton', run=name, cluster=cluster)       # one extra argument

The underlying 'generic' code should be able to determine the selected UI and make the decisions itself. The user need not worry about these things, the less the user has to do the better.


2. Internally:

class transport(object):
    # Just knows how to set up a connection to a bunch of processors
    # and communicate with them.

    def __init__(self):
        pass

    def start(self, nprocessors, **kw):
        # Set up for calculation and return the processor_set for this
        # particular connection.  The kw arguments come from
        # create_processor().
        pass

    def shutdown(self, processor_set):
        # End all calculations and shut down.
        pass

    def setupData(self, processor_set, data, nodes=None):
        # Send setup data.  In my case I would pickle it to an in-memory
        # file and then put it in a numpy byte array for transport over
        # the Numeric MPI layer.  If nodes is None, send it to everyone.
        pass

Have a look at section 9.2.3 of the manual in the current sources for relax object naming conventions.

    def calculate(self, processor_set, node, script, callback, tag):
        # Run the script on the given node and call the completion
        # callback with tag when complete.
        pass

    def getData(self, processor_set, node=None):
        pass

    def status(self, processor_set, node=None):
        # Test for the status of a particular calculation.
        pass

    def cancel(self, processor_set, node=None):
        # Give up the calculation on a particular node.
        pass
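To make the interface concrete, an in-process implementation could look like this. It is only a sketch for testing the dispatch logic without any cluster; everything beyond the transport base class above is made up, and 'script' is assumed here to be a callable taking the node's data:

class local_transport(transport):
    # Runs each 'script' in the current process; no real communication.

    def __init__(self):
        transport.__init__(self)
        self.data = {}
        self.results = {}

    def start(self, nprocessors, **kw):
        # A processor_set here is just a list of fake node ids.
        return range(nprocessors)

    def setupData(self, processor_set, data, nodes=None):
        for node in (nodes or processor_set):
            self.data[node] = data

    def calculate(self, processor_set, node, script, callback, tag):
        self.results[node] = script(self.data.get(node))
        callback(tag)

    def getData(self, processor_set, node=None):
        if node is None:
            return self.results
        return self.results.get(node)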


class cluster(object):

    def __init__(self):
        pass

    def start(self):
        pass

    def getDivisions(self, nproblems):
        # Get a list of sizes for the 'divisions' of the problems to
        # send to each element of each processor set, based on the
        # weights and the number of processors.
        pass

    def shutdown(self):
        pass

    def setupData(self, data):
        # Send setup data.
        pass

    def calculate(self, division, scripts):
        # Run the scripts on all nodes.
        pass

    def getData(self, division):
        # Get the results.
        pass
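getDivisions() might then do something like the following (a sketch, assuming each processor set is described by a hypothetical (weight, nprocessors) pair):

def get_divisions(nproblems, processor_sets):
    # Split nproblems between the processor sets in proportion to
    # weight * nprocessors, handing out any remainder one by one.
    capacities = [weight * n for (weight, n) in processor_sets]
    total = sum(capacities)
    divisions = [int(nproblems * c / total) for c in capacities]
    for i in range(nproblems - sum(divisions)):
        divisions[i % len(divisions)] += 1
    return divisions

# e.g. get_divisions(100, [(1.0, 30), (0.5, 20)]) gives [75, 25].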



.... anyway, I think the idea is fairly clear.

It sounds good. We should determine clean interfaces between the MPI code, threading code, etc. Don't be scared about throwing out all the threading code I've already implemented; it's a bit buggy and starting from scratch might be best. Andrew, what do you think about the threading/MPI+SSH interface idea?

>>>Which ever system is decided upon, threading inside
>>>the program will probably be necessary so that each thread can be sent
>>>to a different machine.  This requires calculations which can be
>>>parallelised.  As minimisation is an iterative process with each
>>>iteration requiring the results of the previous, and as it's not the
>>>most CPU intensive part anyway, I can't see too many gains in
>>>modifying that code.
>>>
>>>
>>>
>>Agreed
>>
>>
>>
>>>I've already parallelised the Monte Carlo
>>>simulations for the threading code as those calculations are the most
>>>obvious target.
>>>
>>>
>>>
>>They are a time hog
>>
>>
>
>Grid searching model m8 {S2, tf, S2f, ts, Rex} probably beats the
>total of the MC sims (unless the data is dodgy).
>
>
>
>>>But all residue specific calculations could be
>>>parellelised as well.  This is probably where you can get the best
>>>speed ups.
>>>
>>>
>>>
>>Yes that and grid searches seem obvious candidates
>>
>>
>>
>
>I was thinking more along the lines of splitting the residues rather
>than the grid search increments.  These increments could be threaded
>however the approach would need to be conservative.  I'm planning on
>eventually splitting out the minimisation code as a separate project
>on Gna! as a Python optimisation library.  The optimisers in Scipy are
>useless!
>
>

I think whichever divisions are equal and fit best are what is
required, though residues would be the obvious first candidate, followed
by grid steps.

I personally think that the split should occur solely at the calls to the code in 'maths_fns'. For model-free analysis this is within the 'self.minimise()' function of 'specific_fns/model_free.py'. The setup on line 2318 (self.mf = Mf(...)) and then the call to the code on line 2360 or 2362 (results = generic_minimise(...)) is the natural break which should be targeted.
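From the compute node's side, that break could be wrapped roughly as follows. Only 'Mf' and 'generic_minimise' are real relax names here; the import paths, keyword arguments and the pickled tuple are all my guesses:

import cPickle

def minimise_on_node(pickled_args):
    # Rebuild the Mf target function on the node and minimise it,
    # mirroring the Mf(...) setup and generic_minimise(...) call that
    # self.minimise() currently performs locally.
    from maths_fns.mf import Mf                      # assumed path
    from minimise.generic import generic_minimise    # assumed path
    mf_args, min_args = cPickle.loads(pickled_args)
    mf = Mf(**mf_args)
    return generic_minimise(func=mf.func, dfunc=mf.dfunc,
                            d2func=mf.d2func, **min_args)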

We should try to write smaller messages; this thread is getting a bit
too fat.  Good luck with the coding, Gary.

Edward


