Re: The multi-processor package and IO redirection.



Posted by Gary Thompson on September 19, 2011 - 17:58:
On 09/19/2011 02:22 PM, Edward d'Auvergne wrote:
The second problem is IO redirection.  This occurs in a number of
places in relax.  These include:

1)  The --log command line flag which causes STDOUT and STDERR to be
sent to file.

2)  The --tee command line flag which causes STDOUT and STDERR to be
sent both to file and to the terminal.

3)  The test suite.  The STDOUT and STDERR streams are caught and only
sent to STDERR if there is an error or failure in a test.

4)  The relax controller (part of the GUI).  This is a window to which
STDOUT and STDERR are directed.  In the test-suite mode, the streams
also output to the terminal.

5)  The multi-processor package.  There are two parts.  The first is
essentially a pre-filter which prepends certain tokens to the IO
stream (i.e. the 'M  |', 'M  E|', and 'S 1|' text).  I cannot see how
we can do this as 4) is always set up after 5).  So I am considering
removing this part.  It will make it more difficult with debugging,
but I can see no other way.

6)  The second part for the multi-processor package, which is
currently non-functional, is the catching of the IO streams of the
slave processes to send back to the master.  I will try to mimic the
relax controller code here and store all slave text as a list with
flags specifying whether it is STDOUT or STDERR.  Then the list can be
returned to the master at which point the text can be sent to the two
streams.
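The pre-filter described in 5) can be illustrated with a small file-like wrapper. This is only a minimal sketch, not the relax or multi-processor code: the class name Prefix_stream, its attributes, and the trailing space in the 'S 1| ' token are assumptions made for illustration.

```python
import io
import sys


class Prefix_stream:
    """File-like wrapper that prepends a token to every line written.

    Hypothetical sketch, not the relax or multi-processor implementation.
    """

    def __init__(self, stream, token):
        self.stream = stream        # The underlying stream, e.g. sys.stdout.
        self.token = token          # The text to prepend, e.g. 'S 1| '.
        self.at_line_start = True   # True when the next text starts a line.

    def write(self, text):
        # Handle each line separately so that multi-line writes and
        # partial lines are both prefixed exactly once per line.
        for chunk in text.splitlines(True):
            if self.at_line_start:
                self.stream.write(self.token)
            self.stream.write(chunk)
            self.at_line_start = chunk.endswith('\n')

    def flush(self):
        self.stream.flush()


# Demonstration on an in-memory buffer rather than the real sys.stdout.
buffer = io.StringIO()
out = Prefix_stream(buffer, 'S 1| ')
out.write('first line\nsecond ')
out.write('line\n')
prefixed = buffer.getvalue()
sys.stdout.write(prefixed)
```

Because the wrapper only needs write() and flush() on the stream it wraps, it would in principle sit on top of whatever stream object is installed at the time it is created, which is exactly why the set-up order discussed here matters.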

The problem is that at each point here, sys.stdout and sys.stderr are
replaced and the order in which this happens is impossible to change.
Well, 4) will always be last.
I think the problem here is that the whole idea of replacing the std io
streams multiple times is inflexible and painful.
This is true.  Python, like many programming languages, does not handle
IO streams very nicely.  It would be good if you could set up some
processing pipe for the IO stream, but this looks very complicated to
implement.


So I have come up with a
strawman multiplexed IO implementation.  The idea is that you replace stdout
and stderr once and then insert IO elements to deal with the need to block
the output of IO streams, record them, create tees, etc.  See what you
think.  Should we open a couple of new mail threads to discuss these things?
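The replace-once idea can be sketched roughly as follows. This is not the strawman code referred to in the thread: only the name Multiplex_IO comes from the discussion, while the add_sink and remove_sink methods and the blocked attribute are assumptions chosen for illustration.

```python
import io
import sys


class Multiplex_IO:
    """Single replacement for a standard stream that fans out to sinks.

    Hypothetical sketch: the class name follows the thread, the methods
    and attributes are assumptions.
    """

    def __init__(self):
        self.sinks = []         # File-like objects receiving the text.
        self.blocked = False    # When True, all output is discarded.

    def add_sink(self, sink):
        self.sinks.append(sink)

    def remove_sink(self, sink):
        self.sinks.remove(sink)

    def write(self, text):
        if self.blocked:
            return
        for sink in self.sinks:
            sink.write(text)

    def flush(self):
        for sink in self.sinks:
            sink.flush()


terminal = sys.stdout       # Keep a reference to the original stream.
log = io.StringIO()         # Stands in for a log file.

mux = Multiplex_IO()
mux.add_sink(terminal)      # Terminal plus log together act as a tee.
mux.add_sink(log)

sys.stdout = mux            # The one and only replacement.
try:
    print('to terminal and log')
    mux.remove_sink(terminal)
    print('log only')
    mux.blocked = True
    print('discarded')
finally:
    sys.stdout = terminal   # Restore the original stream.

logged = log.getvalue()
```

Logging, teeing and blocking then become changes to the sink list or a flag rather than further replacements of sys.stdout, so the order in which different parts of the program set themselves up no longer matters.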
Let's see how it goes.  But we may need a few new threads.  For your
code below, it looks like it could be the start of a solution.  The
question is, how do we design this so that the multi-processor package
is not dependent on relax?  Should the Multiplex_IO object be part of
relax, or the multi package?
Most probably an independent service.  I can design this in a neat way (NB: have you looked at dependency injection yet? ;-)
I think that the main program should
have full control over the IO streams, and that the multi-processor
package should use what is available to it.
I don't see why the multi-processor package shouldn't use it to capture streams.  It's pretty tidy; we just don't want it to be part of the main multi-processor package.
One problem here is that
a program could change the IO streams in mid operation.  For example
if I was to implement a function in the GUI which activates logging.
The Python program, as is with relax, could have many, many places
where IO streams are manipulated.  So it would be more beneficial to
have an IO stream manipulation framework within the main program.  The
multi-processor only needs to prepend text and capture slave IO
streams.
Indeed, this is the whole idea: IO streams are no longer manipulated all over the place, just in one place.  E.g. you could have a facade with functions like set_logging_on etc. and only have this on the master.

One note is that the slaves should be very minimal as usual.  If the stream manipulation package is independent of relax in terms of the package namespace, both the multi-processor slaves and the relax main program can depend on it with very little contamination.

Maybe an alternative would be to capture and store the IO streams only
on the slaves?  This could be a fabric-specific implementation.  For
example the uni-processor would not touch the streams.  It would not
look as nice, but this would solve all the issues.  The slave IO
streams are captured, the 'S 1 |', 'S 2 |', etc. text would be
prepended, the text sent back to the master via the memo objects, and
then the master can send it to whatever sys.stdout and sys.stderr are
currently set to by the calling Python program.
So the thought here is that things should actually be even simpler on the slaves: stdout and stderr should just be text fields attached to the result objects returned to the master.  This way we don't have to parse strings and it is much more object-oriented, and of course we return results on a per-command basis, which is even cleaner.
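A result object carrying the two streams as text fields might look something like the sketch below. The names Command_result, run_command and example_command are hypothetical and are not taken from the multi-processor package; the capture uses the standard library's contextlib.redirect_stdout and contextlib.redirect_stderr.

```python
import contextlib
import io
import sys
from dataclasses import dataclass


@dataclass
class Command_result:
    """Hypothetical result object returned from a slave to the master."""
    value: object       # The return value of the command.
    stdout: str = ''    # Text the command wrote to STDOUT.
    stderr: str = ''    # Text the command wrote to STDERR.


def run_command(command, *args):
    """Run one command on a slave, capturing its output as text fields."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        value = command(*args)
    return Command_result(value, out.getvalue(), err.getvalue())


def example_command(x):
    # Stands in for a real slave calculation.
    print('squaring', x)
    print('a warning', file=sys.stderr)
    return x * x


result = run_command(example_command, 3)
```

Note that two separate text fields do not record how STDOUT and STDERR writes were interleaved, so the relative order of the two streams is lost with this layout.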
As I mentioned
before, the slave's STDOUT and STDERR order can be preserved by
storing it all in a list whereby the IO stream is identified by a
flag, as is done in the relax controller GUI window.
Basically, as you can see above, I like the idea, but I would use a field, as there is a limited set of options (stderr or stdout).
For example if
you have in a slave:

sys.stdout.write('test\n')
sys.stderr.write('fail\n')

Then the object passed to the master via the memo could look like:

io = [
    ['S 1 | test\n', 0],
    ['S 1 | fail\n', 1]
]

Then on the master:

import sys

# Replay each captured line on the stream it was originally written to.
for line, stream in io:
    if stream == 0:
        sys.stdout.write(line)
    else:
        sys.stderr.write(line)

This means that the two slave streams are merged, and then split again
by the master, preserving their order.  What do you think the overhead
of such an operation would be?  My guess would be only a few
milliseconds per slave process, as most of the work would be on the
master.
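The capture side of this scheme, which the example leaves implicit, could be sketched as below. All names here (Stream, Tagged_writer, capture, replay, slave_job) are hypothetical. An enum field replaces the 0/1 flag, and the sketch assumes that each write() call carries one complete line, so the prefix is added once per write.

```python
import enum
import io
import sys


class Stream(enum.Enum):
    """The limited set of streams a captured line can belong to."""
    STDOUT = 0
    STDERR = 1


class Tagged_writer:
    """Append every write to a shared list, tagged with its stream."""

    def __init__(self, store, stream, prefix):
        self.store = store      # List shared by both writers.
        self.stream = stream    # Which stream this writer stands in for.
        self.prefix = prefix    # Text prepended to each write.

    def write(self, text):
        self.store.append((self.prefix + text, self.stream))

    def flush(self):
        pass


def capture(function, prefix='S 1 | '):
    """Run function on a slave and return its output in write order."""
    store = []
    old_out, old_err = sys.stdout, sys.stderr
    sys.stdout = Tagged_writer(store, Stream.STDOUT, prefix)
    sys.stderr = Tagged_writer(store, Stream.STDERR, prefix)
    try:
        function()
    finally:
        sys.stdout, sys.stderr = old_out, old_err
    return store


def replay(store, out, err):
    """On the master, send each captured entry back to its stream."""
    for text, stream in store:
        (out if stream is Stream.STDOUT else err).write(text)


def slave_job():
    sys.stdout.write('test\n')
    sys.stderr.write('fail\n')


captured = capture(slave_job)
master_out, master_err = io.StringIO(), io.StringIO()
replay(captured, master_out, master_err)
```

On a real master, replay(captured, sys.stdout, sys.stderr) would send the text to whatever the two streams are set to at that moment; the in-memory buffers are used here only so that the result can be inspected.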

very low...
I would guess that the overhead should be similar to the
current prependIO code.  Only touching IO streams from the slaves
created by the multi-processor framework would not clash with any IO
manipulations the main program could ever imagine to do.
indeed

Regards,

Edward

regards
gary

--
-------------------------------------------------------------------
Dr Gary Thompson                  [Homans Lab Research Coordinator]

Astbury Centre for Structural Molecular Biology,
University of Leeds,
Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
email: garyt@xxxxxxxxxxxxxxx                   Fax  +44-113-3431935
-------------------------------------------------------------------




