Re: The multi-processor branch.



Posted by Edward d'Auvergne on May 29, 2007 - 10:23:
On 5/18/07, Gary S. Thompson <garyt@xxxxxxxxxxxxxxx> wrote:
Edward d'Auvergne wrote:

>On Fri, 2007-05-04 at 13:59 +0100, Gary S. Thompson wrote:

[snip]

>If using '-np 6', shouldn't the number of slaves be 6?
>
Nope, there needs to be one processor which is the master, and you just
tell MPI how many processors you want. (I will investigate running jobs
on the master in a thread at some point (maybe never, depending ;-)), but
this places extra requirements on the MPI implementation and is thus a
special case. I can give more details if you want me to.)

I would personally avoid running one of the calculations on the master.


>>When running under the threaded and mpi4py implementations you may see
>>long gaps with no output, and the output to the terminal can be quite
>>'jerky'. This is because the multiprocessor implementation uses a
>>threaded output queue to decouple the writing of output on the master
>>from the queuing of calculations on the slaves, as otherwise for
>>systems with slow IO the rate of IO on the master can control the
>>rate of calculation!
>>
>>
>
>I'll have to test this later and see if I can cosmetically minimise the
>jerkiness.
>
>
>
You can't! Well, OK, you can, but there are 'implications'. The jerkiness
is intrinsic to the batching up of results from Slave_commands, so you
can switch off the batching of results from the Slave_processors, but
this will put more stress on the master and the interprocessor
communication fabric. If you want to return string results one line at a
time the design also allows you to do this, but again you stress the
master processor and the interprocessor communication fabric, possibly
slowing the overall calculation. Note also that what works well for a
computer with fast interprocess interconnects will not work well on a
computer with a slow communication fabric. Anyway, the overall message is
that if you block/slow the master you can end up slowing the whole
multiprocessor...

I was thinking of storing stdout and stderr in a buffer on the master
and having a thread handle the screen output.
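
A minimal sketch of that idea (the names and the queue handling here are
purely illustrative, not the actual multi-processor branch code):

    import sys
    import threading
    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2

    # The master's result handler puts (stream_name, text) tuples onto the
    # queue; a single writer thread drains it, so slow terminal IO never
    # blocks the queuing of calculations on the slaves.
    output_queue = queue.Queue()

    def writer():
        while True:
            stream_name, text = output_queue.get()
            if text is None:    # Sentinel used to shut the writer down.
                break
            if stream_name == 'out':
                stream = sys.stdout
            else:
                stream = sys.stderr
            stream.write(text)
            stream.flush()

    writer_thread = threading.Thread(target=writer)
    writer_thread.daemon = True
    writer_thread.start()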


>>also note the standard error stream is not currently used, as race
>>conditions between writing to the stderr and stdout streams can lead
>>to garbled output.
>>
>>
>
>This will definitely need to be fixed prior to merging into the 1.3
>line.  Stdout and stderr separation is quite important.
>
>
>
>
Indeed this is true, and what I intend to do is to reintroduce an output
stream that splits output on the master based on what the line's prefix
is. This is all down to efficiency again: I could return each line of
text as it is output to the output stream on the slave; however, so
there are not lots of objects to send between processors, I join the
streams together with tags for identification of where each line came
from. The intention was to give the user the choice to split them again
at the other end, but I still haven't had time to write that code.

It's a pity MPI doesn't have the ability to keep the streams separate.
Maybe it would be better if the slave kept its output in a buffer and
then sent it all at once at the end.
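
To illustrate the prefix idea (a sketch only; the tag strings are made up
here, not what the branch actually uses):

    STDOUT_TAG = 'O>'
    STDERR_TAG = 'E>'

    def join_streams(stdout_lines, stderr_lines):
        """On the slave: merge both streams into one tagged string so only
        a single object needs to be sent back to the master."""
        tagged = [STDOUT_TAG + line for line in stdout_lines]
        tagged += [STDERR_TAG + line for line in stderr_lines]
        return '\n'.join(tagged)

    def split_streams(tagged_text):
        """On the master: recover the two streams from the tagged text."""
        stdout_lines, stderr_lines = [], []
        for line in tagged_text.split('\n'):
            if line.startswith(STDERR_TAG):
                stderr_lines.append(line[len(STDERR_TAG):])
            elif line.startswith(STDOUT_TAG):
                stdout_lines.append(line[len(STDOUT_TAG):])
            else:
                stdout_lines.append(line)  # Untagged text defaults to stdout.
        return stdout_lines, stderr_lines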


>>Now some caveats:
>>1. Not all exceptions can be handled by this mechanism, as exceptions
>>can only be handed back once communication between the slaves has been
>>set up. This can be a problem on some MPI implementations, as they
>>don't provide redirection of stdout back to the master's controlling
>>terminal.
>>
>>
>
>There's probably not much that can be done there.
>
>

Yes, what I am looking at is putting output to a file, one per processor,
in this case.

(This won't work in all cases, as some clusters don't have disk storage?)

How about storing stdout and stderr in buffers in memory?
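
For example, the slave could redirect its streams into in-memory StringIO
buffers and ship the captured text back with its results (just a sketch of
the idea, not the branch's IO override code):

    import sys
    try:
        from StringIO import StringIO   # Python 2
    except ImportError:
        from io import StringIO         # Python 3

    # On the slave: capture all output in memory, so no disk is needed.
    stdout_buffer = StringIO()
    stderr_buffer = StringIO()
    sys.stdout = stdout_buffer
    sys.stderr = stderr_buffer

    # ... run the slave calculation here ...

    # Restore the real streams, then send the captured text back to the
    # master (e.g. attached to the result object) in one go.
    sys.stdout = sys.__stdout__
    sys.stderr = sys.__stderr__
    captured = (stdout_buffer.getvalue(), stderr_buffer.getvalue())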


>>2. I have had a few cases where raising an exception has wedged the
>>whole multiprocessor without any output. These can be quite hard to
>>debug, as they are due to errors in the overrides I put on the IO
>>streams! A pointer that may help: using sys.settrace(traceit), as
>>shown in processor.py, will produce copious output tracing (and a very
>>slow program).
>>
>>
>
>The sorting out and separation of the IO streams may cause this problem
>to disappear.
>
>
Nope, this may be to do with exceptions being thrown on remote
processors and the master processor waiting infinitely long for
communication from dead processors...

Ah, this will be a significant issue in a grid computing setup where a
person in the building may turn their computer, which is part of the
grid, off!  Fault tolerance is what made my old threading code such a
nightmare.
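
One way to stop a dead slave from wedging the whole run would be for the
master to wait with a timeout rather than blocking forever. A sketch using
a plain Python queue (the timeout value and the queue itself are
illustrative, not the branch's communication layer):

    try:
        import queue            # Python 3
    except ImportError:
        import Queue as queue   # Python 2

    TIMEOUT = 60.0  # Seconds to wait before giving up on a slave (arbitrary).

    def collect_results(result_queue, n_expected):
        """Gather results on the master, skipping slaves that never reply."""
        results = []
        for i in range(n_expected):
            try:
                results.append(result_queue.get(timeout=TIMEOUT))
            except queue.Empty:
                # The slave has probably died or been switched off; warn and
                # carry on with what we have (or resubmit the calculation).
                print("Warning: result %d of %d not received within %s s"
                      % (i + 1, n_expected, TIMEOUT))
        return results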


>>In future it may also be possible to parallelise the minimisation of
>>model-free calculations for the 'all' case, where model fitting and the
>>tensor frame are optimised at the same time. However, this will require
>>modifications to the model-free Hessian, gradient, and function
>>calculation routines and development of a parallel Newton line search,
>>which are both major undertakings.
>>
>>
>
>These are possible targets for parallelisation but I would very strongly
>recommend against working at this position.  And adding optimisation
>algorithms would require very careful testing.  From my experience with
>optimisation in the model-free space, I would probably bet that the
>algorithm will fail for certain model-free motions (not many algorithms
>find all minima in such a convoluted space).  The place to target is the
>following three functions:
>       maths_fns.mf.Mf.func_all()
>       maths_fns.mf.Mf.dfunc_all()
>       maths_fns.mf.Mf.d2func_all()
>
>Specifically the loop over all residues (to be renamed to all spin
>systems in the 1.3 line) to create the value, gradient, and Hessian
>would be the ideal spot to parallelise!
>
>
>
Indeed this is what I thought, and Neil is working on it. One side note is
that (if such a thing existed) a line search which is adventitious and looked
at a superset of Newton positions would work ;-) Again it would have to be
tested, but all such things have to be tested to some degree.
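
To make that parallelisation target concrete, splitting the residue loop
would look roughly like the following (a sketch only; chi2_residue,
run_on_slaves and the block count are placeholders, not the actual
maths_fns.mf or multi-processor API):

    def chi2_residue(params, residue):
        """Placeholder for the per-residue chi-squared contribution (the
        real work lives in maths_fns.mf.Mf within relax)."""
        raise NotImplementedError

    def func_all_block(params, residues):
        """Value contribution of one block of residues, run on a slave."""
        total = 0.0
        for res in residues:
            total += chi2_residue(params, res)
        return total

    def parallel_func_all(params, residues, run_on_slaves, n_blocks=4):
        """Split the residue loop into blocks, evaluate each block on a
        slave, and sum the partial chi-squared values on the master.

        run_on_slaves is assumed to take a list of zero-argument callables
        and return a list of their results - it stands in for the
        multi-processor framework."""
        blocks = [residues[i::n_blocks] for i in range(n_blocks)]
        jobs = [lambda b=b: func_all_block(params, b) for b in blocks]
        return sum(run_on_slaves(jobs))

The gradient (dfunc_all) and Hessian (d2func_all) could be handled the
same way, summing the per-block contributions element-wise on the master.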

See Chapter 4 of my PhD thesis at
http://eprints.infodiv.unimelb.edu.au/archive/00002799/ to see why
almost all local optimisation algorithms fail in the model-free space.
Out of 31 algorithms tested, only two worked sufficiently for
model-free analysis - Newton optimisation with the backtracking step
length algorithm and GMW81 Hessian modification, and simplex
optimisation (which was slower).  Both required constraints using the
Method of Multipliers algorithm, also known as the Augmented
Lagrangian.


Note: are there tests in the relax test suite for the cases where lm
failed?

lm failed?  The model-free optimisation system tests are points where
some algorithms struggle.


>>Indeed, the problem may be fine-grained enough that use of C MPI and
>>recoding of the Hessian etc. calculations for model-free in C is
>>required.
>>
>>
>
>This conversion should significantly speed up calculations anyway.  I
>will do this one day.
>
>
>
>
The later we do this the better; C is a bind. I still think Pyrex, which
compiles what almost looks like Python to C, would be a good thing to
look at ;-) I might try and prototype something for you to look at at
some point.

I don't like Pyrex at all.  I have already tried to use it on the
model-free function code - the output sucked!  Most of the code consists
of simple functions of less than 5 lines.  The hard part would be to port
'maths_fns/mf.py', which shouldn't be too hard anyway, and the rest
would be very basic.

Cheers,

Edward


