Re: [bug #23618] queuing system for multi processors is not well designed.


Posted by Edward d'Auvergne on June 08, 2015 - 18:12:
On 27 May 2015 at 03:35, Troels E. Linnet
<NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
> Follow-up Comment #5, bug #23618 (project relax):
>
> It is weird that, when no calculations have been submitted, the slave
> processors show 100% and the master does it all.
>
> 13578 tlinnet   20   0 1315m 386m  25m R 133.6  1.6   1:08.73 python
> 13584 tlinnet   20   0  784m  72m  21m R 100.2  0.3   1:03.22 python
> 13579 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.23 python
> 13580 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.25 python
> 13581 tlinnet   20   0  784m  73m  21m R 99.9  0.3   1:03.20 python
> 13582 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.22 python
> 13583 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.21 python
> 13585 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.21 python
> 13586 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.23 python
> 13587 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.24 python
> 13588 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.20 python
> 13589 tlinnet   20   0  784m  72m  21m R 99.9  0.3   1:03.20 python

Note that this is how OpenMPI operates by default.  The master and all
slaves always run at 100%, even when they are idle.  They really are
using 100% of the CPU, continually polling the queues.  The OpenMPI
people chose this behaviour to minimise data transfer bottlenecks, and
it is based on the assumption that all of the calculation time will be
parallelised and that you will have full access to the nodes you are
allocated.
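To make the polling behaviour concrete, here is a minimal mpi4py sketch of
the same master/slave pattern.  This is purely illustrative and not relax's
actual multi-processor code; the job dictionaries and tags are made up.  The
slave's blocking recv() looks idle, but by default Open MPI busy-polls its
queues inside that call, which is why top reports ~100% CPU even before any
job has been handed out:

# Minimal mpi4py master/slave sketch -- hypothetical, not relax's code.
# Run with e.g.:  mpirun -np 12 python sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    # Master: push one job to each slave, then collect the results.
    for slave in range(1, size):
        comm.send({'job_id': slave}, dest=slave, tag=1)
    results = [comm.recv(source=MPI.ANY_SOURCE, tag=2)
               for _ in range(1, size)]
    print(results)
else:
    # Slave: this recv() blocks until a job arrives, but by default Open MPI
    # busy-polls its queues here, so top shows ~100% CPU while "idle".
    job = comm.recv(source=0, tag=1)
    comm.send({'job_id': job['job_id'], 'done': True}, dest=0, tag=2)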


> When jobs are submitted, they show 200%:
>
> 13579 tlinnet   20   0 1023m 150m  23m R 199.9  0.6   3:20.01 python
> 13580 tlinnet   20   0 1023m 152m  23m R 199.9  0.6   3:19.77 python
> 13582 tlinnet   20   0 1023m 152m  23m R 199.9  0.6   3:19.22 python
> 13583 tlinnet   20   0 1023m 149m  23m R 199.9  0.6   3:18.93 python
> 13584 tlinnet   20   0 1023m 151m  23m R 199.9  0.6   3:18.40 python
> 13585 tlinnet   20   0 1023m 149m  23m R 199.9  0.6   3:18.12 python
> 13586 tlinnet   20   0 1023m 149m  23m R 199.9  0.6   3:17.89 python
> 13588 tlinnet   20   0 1023m 151m  23m R 199.9  0.6   3:17.32 python
> 13589 tlinnet   20   0 1023m 151m  23m R 199.9  0.6   3:16.74 python
> 13587 tlinnet   20   0 1023m 151m  23m R 199.5  0.6   3:17.60 python
> 13581 tlinnet   20   0 1023m 150m  23m R 199.2  0.6   3:19.40 python
> 13578 tlinnet   20   0 1638m 636m  26m R 99.9  2.6   3:49.60 python

The 200% is a little strange, but OpenMPI CPU percentage numbers have
been strange on Linux for a long time now.  I've seen nodes at 200%,
and sometimes at 50%.  I don't know what this is about: whether it is
an OpenMPI bug, a Linux kernel reporting bug, or relax doing something
strange (I doubt it is the last option, as a Google search will show
that others have encountered such strangeness).
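As an aside, and purely as an illustration rather than an explanation of the
numbers above: top sums %CPU over all of a process's threads, so a single
process can legitimately be reported above 100% whenever more than one of its
threads is busy.  A small standalone Python sketch (nothing to do with relax
or OpenMPI) that shows up at roughly 200% in top:

# Standalone illustration: one Python process with two CPU-bound threads is
# reported at ~200% by top, because top sums %CPU across threads.  hashlib's
# update() releases the GIL for large buffers, so the two threads really do
# run on two cores at once.
import hashlib
import os
import threading

def burn(iterations=20000):
    data = os.urandom(1 << 20)      # 1 MiB buffer
    for _ in range(iterations):
        h = hashlib.sha256()
        h.update(data)              # releases the GIL for buffers this large
        h.digest()

threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()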

Regards,

Edward


