linux - High Kernel CPU when running multiple Python programs -
I developed a Python program that does heavy numerical calculations. I run it on a Linux machine with 32 Xeon CPUs, 64 GB of RAM, and Ubuntu 14.04 64-bit. I launch multiple Python instances with different model parameters in parallel, using multiple processes so I don't have to worry about the global interpreter lock (GIL). When I monitor the CPU utilization using htop, I see that all cores are used, but most of the time by the kernel. Generally, the kernel time is more than twice the user time. I'm afraid there is a lot of overhead going on at the system level, but I'm not able to find the cause of it.
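For reference, the instances are launched roughly like this (a simplified sketch; model.py and the parameter values are placeholders for the real model code):

    import subprocess

    # one independent interpreter per parameter set, so the GIL is never shared
    params = ["0.1", "0.2", "0.5", "1.0"]   # placeholder model parameters
    procs = [subprocess.Popen(["python3", "model.py", p]) for p in params]
    for proc in procs:
        proc.wait()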
How can one reduce the high kernel CPU usage?
Here are some observations I made:
- The effect appears independently of whether I run 10 jobs or 50. If there are fewer jobs than cores, not all cores are used, but the ones that are used still show high CPU usage by the kernel.
- I implemented the inner loop using numba, but the problem is not related to this, since removing the numba part does not resolve the problem.
- I thought it might be related to using Python 2, since a similar problem is mentioned in another question, but switching from Python 2 to Python 3 did not change much.
- I measured the total number of context switches performed by the OS, which is about 10,000 per second. I'm not sure whether this is a large number.
- I tried increasing the Python time slices by setting sys.setcheckinterval(10000) (for Python 2) and sys.setswitchinterval(10) (for Python 3); neither helped (see the snippet after this list).
- I tried influencing the task scheduler by running schedtool -B PID, but that didn't help either.
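For completeness, the interpreter tweak mentioned in the list was applied like this (a minimal sketch using the exact values I tried):

    import sys

    if sys.version_info[0] >= 3:
        sys.setswitchinterval(10)      # Python 3: seconds between GIL switch requests
    else:
        sys.setcheckinterval(10000)    # Python 2: bytecode instructions between checks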
EDIT: Here is a screenshot of htop:
[htop screenshot: all 32 cores busy, most of the load shown as kernel time]
I ran perf record -a -g and created a report with perf report -g graph:
Samples: 1M of event 'cycles', Event count (approx.): 1114297095227
-  95.25%  python3  [kernel.kallsyms]                           [k] _raw_spin_lock_irqsave
   - _raw_spin_lock_irqsave
      - 95.01% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   2.06%  python3  [kernel.kallsyms]                           [k] sha_transform
   - sha_transform
      - 2.06% extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
-   0.74%  python3  [kernel.kallsyms]                           [k] _mix_pool_bytes
   - _mix_pool_bytes
      - 0.74% __mix_pool_bytes
           extract_buf
           extract_entropy_user
           urandom_read
           vfs_read
           sys_read
           system_call_fastpath
           __GI___libc_read
    0.44%  python3  [kernel.kallsyms]                           [k] extract_buf
    0.15%  python3  python3.4                                   [.] 0x000000000004b055
    0.10%  python3  [kernel.kallsyms]                           [k] memset
    0.09%  python3  [kernel.kallsyms]                           [k] copy_user_generic_string
    0.07%  python3  multiarray.cpython-34m-x86_64-linux-gnu.so  [.] 0x00000000000b4134
    0.06%  python3  [kernel.kallsyms]                           [k] _raw_spin_unlock_irqrestore
    0.06%  python3  python3.4                                   [.] PyEval_EvalFrameEx
It seems as if most of the time is spent calling _raw_spin_lock_irqsave. I have no idea what that means, though.
If the problem exists in the kernel, you should narrow it down using a profiler such as OProfile or perf.
I.e. run perf record -a -g, then read the profiling data saved in perf.data using perf report. See also: Linux perf: how to interpret and find hotspots.
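A typical session looks like this (the 30-second duration is illustrative; perf record writes its samples to perf.data in the current directory):

    # sample the whole system, with call graphs, for 30 seconds
    $ perf record -a -g -- sleep 30
    # browse the recorded samples as a call graph
    $ perf report -g graph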
In your case the high CPU usage is caused by competition for /dev/urandom -- it only allows one thread to read from it, but multiple Python processes are doing so. The Python module random only uses it for initialization. I.e.:
    $ strace python -c 'import random
    while True:
        random.random()'
    open("/dev/urandom", O_RDONLY)     = 4
    read(4, "\16\36\366\36}"..., 2500) = 2500
    close(4)                               <--- /dev/urandom is closed
You may also explicitly ask for /dev/urandom by using os.urandom or the SystemRandom class, so check your code that deals with random numbers.
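For example, if the hot loop draws from the kernel's entropy pool on every call (as random.SystemRandom does via os.urandom), a minimal workaround is to touch /dev/urandom once per process for a seed and do the bulk of the draws in userspace; this Python 3 sketch uses illustrative names:

    import os
    import random

    # One read from /dev/urandom per process, only for seeding.
    seed = int.from_bytes(os.urandom(8), "big")

    # Userspace Mersenne Twister: subsequent draws make no syscalls,
    # so parallel processes no longer serialize on the entropy-pool lock.
    rng = random.Random(seed)
    samples = [rng.random() for _ in range(1000000)]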