On 27 May 2015 at 03:09, <tlinnet@xxxxxxxxxxxxx> wrote:
Author: tlinnet
Date: Wed May 27 03:09:59 2015
New Revision: 27845

URL: http://svn.gna.org/viewcvs/relax?rev=27845&view=rev
Log:
Suggestion for fix 2, where jobs are continously replenished when other jobs are finished.

Bug #23618: (https://gna.org/bugs/index.php?23618): queuing system for multi processors is not well designed.

Modified:
    trunk/multi/processor.py

Modified: trunk/multi/processor.py
URL: http://svn.gna.org/viewcvs/relax/trunk/multi/processor.py?rev=27845&r1=27844&r2=27845&view=diff
==============================================================================
--- trunk/multi/processor.py (original)
+++ trunk/multi/processor.py Wed May 27 03:09:59 2015
@@ -585,6 +585,8 @@
         running_set = set()
         idle_set = set([i for i in range(1, self.processor_size()+1)])
+        all_jobs = list(reversed(xrange(1, len(queue)+1)))
+        completed_jobs = []

         if self.threaded_result_processing:
             result_queue = Threaded_result_queue(self)
@@ -606,8 +608,9 @@
             while len(running_set) != 0:
                 # Debugging printout.
                 if verbosity.level():
-                    print('\nIdle set: %s' % idle_set)
-                    print('Running set: %s' % running_set)
+                    print('\n')
+                    print('Running nr of jobs: %i' % len(running_set))
+                    print('Completed jobs: %s' % len(completed_jobs))

                 # Get the result.
                 result = self.master_receive_result()
@@ -616,6 +619,13 @@
                 if result.completed:
                     idle_set.add(result.rank)
                     running_set.remove(result.rank)
+                    completed_jobs.append(all_jobs.pop())
+                    if len(queue) != 0:
+                        # Add new to que
+                        command = queue.pop()
+                        dest = result.rank
+                        self.master_queue_command(command=command, dest=dest)
+                        running_set.add(dest)

                 # Add to the result queue for instant or threaded processing.
                 result_queue.put(result)
Hi Troels,

Are you sure these changes to Gary's multi-processor code have the intended result? From my timings before and after this change, with the bug.py and bug.bz2 files attached to https://gna.org/bugs/?23618 and the command "mpirun -np 6 /data/relax/relax-trunk/relax -d --multi='mpi4py' bug.py", there is no real difference in run time. But that is probably because all my 8 CPU cores run at the same speed. Maybe a better test than MC simulations would be a per-residue parallelisation, where the calculation for each residue takes a different amount of time to complete. Does this work if the chunked operation is restored ( http://thread.gmane.org/gmane.science.nmr.relax.scm/25596/focus=7593 )?

Note a few more points:

- The xrange() function should not be used. It does not exist in Python 3, so this kills the multi-processor there.
- The print("\n") also introduces 2 newlines, which is probably not the intent here.
- I find seeing the running and idle sets printed out in debugging mode to be very useful.
- Maybe change "Running nr of jobs:..." to "Running jobs" to match the syntax of "Completed jobs".

Gary might remember why the running set is not replenished until after all results in the set are complete. There might be other reasons for this behaviour.

Cheers,

Edward
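
To illustrate why jobs of equal length would show no timing difference while uneven jobs might, here is a minimal stand-alone sketch. It is not relax code: the function names makespan_batched() and makespan_replenished() and the job durations are hypothetical, jobs are taken in list order, and communication overhead is ignored. It only uses range(), which works on both Python 2 and 3.

```python
import heapq


def makespan_batched(durations, n_workers):
    """Total time when a new batch starts only once every worker is idle.

    Assumption: this stands in for the behaviour before the change, where
    the running set must empty before the idle set is refilled.
    """
    total = 0.0
    for start in range(0, len(durations), n_workers):
        batch = durations[start:start + n_workers]
        # The whole batch waits for its slowest job.
        total += max(batch)
    return total


def makespan_replenished(durations, n_workers):
    """Total time when a finished worker is handed the next job at once.

    Assumption: this stands in for the behaviour proposed in the commit.
    """
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for duration in durations:
        # The earliest free worker takes the next job.
        start_time = heapq.heappop(free_at)
        heapq.heappush(free_at, start_time + duration)
    return max(free_at)


if __name__ == "__main__":
    equal = [1.0] * 12                  # e.g. MC simulations of similar cost
    uneven = [4, 1, 1, 1, 1, 1, 1, 1]   # e.g. one slow residue among fast ones

    for name, jobs in (("equal", equal), ("uneven", uneven)):
        print("%s: batched=%.1f replenished=%.1f" % (
            name, makespan_batched(jobs, 4), makespan_replenished(jobs, 4)))
```

With 4 workers, the equal jobs take 3.0 time units under both schemes, whereas the uneven jobs take 5.0 batched and 4.0 replenished. In this toy model, the gain only appears when the job durations differ.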