Re: Pre-sending data in the multi-processor package. -- March 22, 2012

Hi Ed

Sorry that I didn't get to this earlier, things have been a bit hectic: Arnouts had a baby, we had a complete power cut for 1 day this week, and I have had to do helium fills as well. Anyway, some thoughts.


Setting data on the remote machine as a cache is a good idea.

Setting up a remote set of constants is easy once the multi processor is configured, as all you need to do is queue a multi.Slave_command that will save some state on the remote machine, either in a class or module variable or in a global.

So my thought is that there is no need to add any specific storage API to the package. The easiest thing to do would be to just add a Slave_command that you can queue, which sets a class or global variable on the target machine. This means that all the intelligence is in the add-on class rather than in the main multi processor package. I see several good things in this:


1. less api
2. less code to maintain
3. more flexibility and more modular

4. modules that use the multi processor API are more isolated, as they can save data in their own namespace rather than having problems with names clashing in a dict-based storage area

5. it's a better use of what Python gives us
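
As a rough sketch of the idea (not code from the package), the following shows one command that caches invariant data in a class variable on the slave, and a second command that reads it back. The Slave_command base class here is a stand-in so the example runs on its own, and its run() signature is an assumption. Target_cache, Cache_setup_command and Calc_command are hypothetical names.

```python
# Stand-in for the package's Slave_command base class so that the sketch is
# self-contained.  The real class and its run() signature may differ.
class Slave_command:
    def run(self, processor, completed):
        raise NotImplementedError


class Target_cache:
    """Hypothetical namespace owned by the add-on code, one copy per slave process."""
    data = None


class Cache_setup_command(Slave_command):
    """Queued once per slave to store the data that never changes."""

    def __init__(self, data):
        self.data = data

    def run(self, processor, completed):
        # Save the state in a class variable on the slave.
        Target_cache.data = self.data


class Calc_command(Slave_command):
    """Queued per target function call, carrying only the changing parameters."""

    def __init__(self, params):
        self.params = params
        self.result = None

    def run(self, processor, completed):
        # The invariant data is read from the cache, not sent with the command.
        vectors = Target_cache.data['bond_vectors']
        self.result = sum(p * v for p, v in zip(self.params, vectors))
```

Only the parameters travel with each Calc_command, so the per-call message stays small. The storage name lives in the add-on module's own namespace, so nothing in the main package needs to know about it.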


I hope this helps. I am now working my way back through the backlog.

regards
gary


On 03/21/2012 09:50 AM, Edward d'Auvergne wrote:

Hi Gary,

I think I'll start to modify the design of the multi-processor
package.  What is required is a data storage container within each
Processor instance (on each node).  As the Processor is a singleton
and there is only one per node, then this container would be unique.
There would need to be a function within the multi-processor API that
calling code on the master can use to send data to all slaves to be
stored in this data container.  As the parallelisation code is at the
level of the function call, then almost all data used by the slaves is
identical - the only difference being a few parameters.  This could
also be used both at the level of the initialisation of the target
function class to send invariant data once at the start, and then at
the level of the target function call to send data that changes per
function call (i.e. with the model parameters).  The slave_command
objects will then be sent to the slaves, and the slaves can then
access the data within these command objects and the
Processor.data_container objects, again probably via an API function.
If you don't think this is a good idea, or if you can see that you
have implemented something similar that I have missed, please say.

For the API (multi/__init__.py), I am thinking of the following pair
of optional functions:

def data_fetch(name=None):
     """API function for obtaining data from the Processor instance's data store.

     This is for fetching data from the data store of the Processor instance.


     @keyword name:  The name of the data structure to fetch.
     @type name:     str
     @return:        The value of the associated data structure.
     @rtype:         anything
     """


def data_upload(name=None, value=None, rank=None):
     """API function for sending data to be stored on the Processor of the given rank.

     This can be used for transferring data from Processor instance i to the
     data store of Processor instance j.


     @keyword name:  The name of the data structure to store.
     @type name:     str
     @keyword value: The data structure.
     @type value:    anything
     @keyword rank:  An optional argument to send data only to the Processor of
                     the given rank.  If None, then the data will be sent to all
                     Processor instances.
     @type rank:     None or int
     """

The parallelised model-free code will be unaffected as the
parallelisation is at a much higher level and does not need this
mechanism.  Any feedback would be appreciated.

Cheers,

Edward




On 14 March 2012 16:17, Edward d'Auvergne <edward@xxxxxxxxxxxxx> wrote:

Hi Gary,

Before I start hacking into the multi-processor package, I was
wondering if you know of a way of pre-sending data to slave processors
using the current design?  The reason is because I would like to have
the parallelisation at the lowest level of the target function.  But
there is a massive quantity of data which doesn't change at the target
function level which would be better to transmit to and store on the
slaves prior to optimisation (atomic positions, bond vectors, base NMR
data, missing data flags, etc.).  This is required to keep the data
transmission of the slave_command objects from killing scalability.
Any ideas?

Cheers,

Edward



--
-------------------------------------------------------------------
Dr Gary Thompson                  [Homans Lab Research Coordinator]

Astbury Centre for Structural Molecular Biology,
University of Leeds,
Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
email: garyt@xxxxxxxxxxxxxxx                   Fax  +44-113-3431935
-------------------------------------------------------------------
