On 27 May 2015 at 03:35, Troels E. Linnet <NO-REPLY.INVALID-ADDRESS@xxxxxxx> wrote:
> Follow-up Comment #5, bug #23618 (project relax):
>
> It is weird that, when no calculations are submitted, the slave
> processors show 100% and the master does it all:
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 13578 tlinnet   20   0 1315m 386m  25m R 133.6  1.6  1:08.73 python
> 13584 tlinnet   20   0  784m  72m  21m R 100.2  0.3  1:03.22 python
> 13579 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.23 python
> 13580 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.25 python
> 13581 tlinnet   20   0  784m  73m  21m R  99.9  0.3  1:03.20 python
> 13582 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.22 python
> 13583 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.21 python
> 13585 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.21 python
> 13586 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.23 python
> 13587 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.24 python
> 13588 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.20 python
> 13589 tlinnet   20   0  784m  72m  21m R  99.9  0.3  1:03.20 python
Note that this is how OpenMPI operates by default. The master and all slaves always run at 100%, even when they are idle - they really are using 100% of a CPU to continually poll the message queues. The OpenMPI people chose this behaviour to minimise data transfer bottlenecks, and it is based on the assumption that all of the calculation time will be parallelised and that you will have full access to the nodes you are allocated.
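To make the busy-wait concrete, here is a minimal mpi4py sketch (an illustration only, not relax's own multi-processor code) in which the slaves block in recv() while the master is busy. Under OpenMPI's default aggressive progression, that "blocking" call actually spins polling the message queue, so the idle slaves still show ~100% CPU in top:

#!/usr/bin/env python
# busy_slaves.py - hypothetical illustration, not relax's multi-processor
# code.  Run with e.g.:  mpirun -np 12 python busy_slaves.py
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Master: pretend to do 30 s of serial work, then tell the slaves to
    # shut down.  While the master "works", every slave sits in the
    # comm.recv() below, yet under OpenMPI's default aggressive
    # progression each one polls its queue in a tight loop, so top
    # reports all of them at ~100% CPU.
    time.sleep(30)
    for slave in range(1, comm.Get_size()):
        comm.send(None, dest=slave)  # None acts as a "stop" sentinel.
else:
    # Slave: "block" until the master sends something.  Blocked from the
    # program's point of view, but busy-waiting from the kernel's.
    comm.recv(source=0)

If the constant polling is a problem, for example on a shared machine, OpenMPI's mpi_yield_when_idle MCA parameter (e.g. mpirun --mca mpi_yield_when_idle 1 ...) tells idle processes to yield the CPU between polls instead of spinning flat out, at the cost of slower message pick-up.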
> When jobs are submitted, they show 200%:
>
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
> 13579 tlinnet   20   0 1023m 150m  23m R 199.9  0.6  3:20.01 python
> 13580 tlinnet   20   0 1023m 152m  23m R 199.9  0.6  3:19.77 python
> 13582 tlinnet   20   0 1023m 152m  23m R 199.9  0.6  3:19.22 python
> 13583 tlinnet   20   0 1023m 149m  23m R 199.9  0.6  3:18.93 python
> 13584 tlinnet   20   0 1023m 151m  23m R 199.9  0.6  3:18.40 python
> 13585 tlinnet   20   0 1023m 149m  23m R 199.9  0.6  3:18.12 python
> 13586 tlinnet   20   0 1023m 149m  23m R 199.9  0.6  3:17.89 python
> 13588 tlinnet   20   0 1023m 151m  23m R 199.9  0.6  3:17.32 python
> 13589 tlinnet   20   0 1023m 151m  23m R 199.9  0.6  3:16.74 python
> 13587 tlinnet   20   0 1023m 151m  23m R 199.5  0.6  3:17.60 python
> 13581 tlinnet   20   0 1023m 150m  23m R 199.2  0.6  3:19.40 python
> 13578 tlinnet   20   0 1638m 636m  26m R  99.9  2.6  3:49.60 python
The 200% is a little strange, but OpenMPI CPU percentage numbers have been strange on Linux for a long time now. I've seen nodes at 200%, and sometimes at 50%. I don't know what this is about - whether it is an OpenMPI bug, a Linux kernel reporting bug, or relax doing something strange (I doubt it's the last option, as a Google search will show that others have encountered such strangeness).
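One way to narrow this down might be to look at per-thread CPU numbers rather than per-process ones - if each 200% slave really has two busy threads, that points away from a pure kernel reporting bug. "top -H -p <pid>" shows this directly; here is a small stand-alone sketch (thread_cpu.py is a hypothetical name, Linux-only, reading /proc) that prints the per-thread CPU counters for a given PID:

#!/usr/bin/env python
# thread_cpu.py - hypothetical diagnostic sketch (Linux only).  Print the
# cumulative user/system CPU time of every thread in a process, so that a
# 200% process can be checked for two genuinely busy threads.
# Usage:  python thread_cpu.py <pid>
import os
import sys

pid = sys.argv[1]
task_dir = "/proc/%s/task" % pid
for tid in sorted(os.listdir(task_dir), key=int):
    with open(os.path.join(task_dir, tid, "stat")) as handle:
        stat = handle.read()
    # The comm field (in parentheses) can contain spaces, so split after
    # the closing parenthesis.  In the remaining fields, utime and stime
    # sit at indices 11 and 12, measured in clock ticks.
    fields = stat.rsplit(")", 1)[1].split()
    print("TID %s: utime=%s stime=%s (clock ticks)" % (tid, fields[11], fields[12]))

Running it twice a few seconds apart on one of the 200% PIDs would show whether two threads are accumulating ticks or just one.

Regards,

Edward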