multithreading - Python / OpenCV application lockup issue


My Python application running on a 64-core Linux box normally runs without problem. Then, after a random length of time (usually 0.5 to 1.5 days), I start getting frequent pauses/lockups of over 10 seconds. During these lockups the system CPU time (i.e. time in the kernel) can be over 90% (yes: 90% of all 64 cores, not of 1 CPU).

My app is restarted throughout the day. Restarting the app does not fix the problem; however, rebooting the machine does.

Question 1: What could cause 90% system CPU time for 10 seconds? All of the system CPU time is in my parent Python process, not in the child processes created through Python's multiprocessing or in other processes. That means something of the order of 60+ threads spending 10+ seconds in the kernel. I'm not sure if this is a Python issue or a Linux kernel issue.

Question 2: The fact that a reboot fixes the problem must be a big clue to the cause. What Linux resources could be left exhausted on the system between app restarts, but not between reboots, that could cause the problem to get stuck on?

What I've tried so far to solve / figure this out:

Below I mention multiprocessing a lot. That's because the application runs in a cycle and multiprocessing is only used in one part of the cycle. The high CPU happens after the multiprocessing calls finish. I'm not sure if that's a hint at the cause or a red herring.

  • My app runs a thread that uses psutil to log out process and system CPU stats every 0.5 seconds. I have independently confirmed that what it reports matches top (a minimal sketch follows this list).
  • I've converted the app from Python 2.7 to Python 3.4, because Python 3.2 got a new GIL implementation and 3.4 had multiprocessing rewritten. While this improved things, it did not solve the problem (see my previous question, which I'm leaving up because it's still a useful answer, if not the total answer).
  • I have replaced the OS. Originally Ubuntu 12 LTS, it's now CentOS 7. No difference.
  • It turns out multithreading and multiprocessing clash in Python/Linux and are not recommended together, which is why Python 3.4 has the forkserver and spawn multiprocessing contexts. I've tried them, no difference (a sketch of switching contexts also follows this list).
  • I've checked /dev/shm to see if I'm running out of shared memory (which Python 3.4 uses to manage multiprocessing). Nothing unusual there.
  • lsof output listing the resources is here.
  • It's difficult to test on other machines because I run a multiprocess pool of 59 children and I don't have other 64-core machines lying around.
  • I can't run it using threads rather than processes because it can't run fast enough due to the GIL (hence why I switched to multiprocessing in the first place).
  • I've tried using strace on one thread running slowly (it can't be run across all threads because it slows the app down far too much). What I got is below, and it doesn't tell me much.
  • ltrace does not work because you can't use -p on a thread id. Running ltrace on the main thread (without -f) makes the app so slow that the problem doesn't show up.
  • The problem is not related to load. It will run fine at full load, and then later at half load it'll hit the problem.
  • Even if I reboot the machine nightly, the problem comes back every couple of days.
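
For reference, here is a minimal sketch of the kind of psutil monitoring thread described in the first bullet; the 0.5 second interval matches what I use, but the function name and output format are illustrative rather than my exact logging code.

import threading

import psutil


def monitor_cpu(interval=0.5):
    """Log process and system CPU stats every `interval` seconds."""
    proc = psutil.Process()  # the parent Python process
    while True:
        # cpu_times_percent blocks for `interval` and reports system-wide
        # user/system CPU percentages over that window
        sys_pct = psutil.cpu_times_percent(interval=interval)
        proc_times = proc.cpu_times()
        print("system: user=%.1f%% sys=%.1f%% | process: user=%.1fs sys=%.1fs"
              % (sys_pct.user, sys_pct.system, proc_times.user, proc_times.system))


monitor_thread = threading.Thread(target=monitor_cpu, daemon=True)
monitor_thread.start()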
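And for context, switching to the forkserver or spawn start methods mentioned above only needs a multiprocessing context. A minimal sketch, where the worker function and workload are purely illustrative (the pool size matches my 59 children):

import multiprocessing


def work(item):
    # placeholder for the real per-item processing
    return item * item


if __name__ == "__main__":
    # "forkserver" or "spawn" instead of the default "fork" on Linux
    ctx = multiprocessing.get_context("forkserver")
    with ctx.Pool(processes=59) as pool:
        results = pool.map(work, range(1000))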

Environment / notes:

  • Python 3.4.3 compiled from source
  • CentOS 7, fully up to date. uname -a: Linux 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (although this kernel update was only applied today)
  • The machine has 128GB of memory and has plenty free
  • I use numpy linked against ATLAS. I'm aware that OpenBLAS clashes with Python multiprocessing but ATLAS does not, and that clash is in any case solved by Python 3.4's forkserver and spawn, which I've tried
  • I use OpenCV to do a lot of parallel work
  • I use ctypes to access a C .so library provided by the camera manufacturer (a minimal sketch of that pattern follows this list)
  • The app runs as root (a requirement of the C library I link to)
  • The Python multiprocessing Pool is created in code guarded by if __name__ == "__main__": and in the main thread
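
As mentioned in the notes above, the camera library is accessed through ctypes. A minimal sketch of that pattern, where the library name libcamera_vendor.so and the function open_camera are hypothetical placeholders for the manufacturer's actual SDK names:

import ctypes

# Load the vendor's shared library (hypothetical name) and declare the
# signature of one of its functions (also hypothetical).
camera_lib = ctypes.CDLL("libcamera_vendor.so")
camera_lib.open_camera.argtypes = [ctypes.c_int]
camera_lib.open_camera.restype = ctypes.c_int

camera_handle = camera_lib.open_camera(0)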

Updated strace results

A few times I've managed to strace a thread as it ran at 100% 'system' CPU, but only once have I gotten anything meaningful out of it. See the call below at 10:24:12.446614, which takes 1.4 seconds. Given that it's the same id (0x7f05e4d1072c) you see in the other calls, I would guess this is Python's GIL synchronisation. Does that guess make sense? If so, the question is why the wait takes 1.4 seconds. Is something not releasing the GIL?

10:24:12.375456 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.000823>
10:24:12.377076 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.002419>
10:24:12.379588 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.001898>
10:24:12.382324 sched_yield()           = 0 <0.000186>
10:24:12.382596 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.004023>
10:24:12.387029 sched_yield()           = 0 <0.000175>
10:24:12.387279 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.054431>
10:24:12.442018 sched_yield()           = 0 <0.000050>
10:24:12.442157 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.003902>
10:24:12.446168 futex(0x7f05e4d1022c, FUTEX_WAKE, 1) = 1 <0.000052>
10:24:12.446316 futex(0x7f05e4d11cac, FUTEX_WAKE, 1) = 1 <0.000056>
10:24:12.446614 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <1.439739>
10:24:13.886513 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.002381>
10:24:13.889079 sched_yield()           = 0 <0.000016>
10:24:13.889135 sched_yield()           = 0 <0.000049>
10:24:13.889244 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.032761>
10:24:13.922147 sched_yield()           = 0 <0.000020>
10:24:13.922285 sched_yield()           = 0 <0.000104>
10:24:13.923628 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.002320>
10:24:13.926090 sched_yield()           = 0 <0.000018>
10:24:13.926244 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.000265>
10:24:13.926667 sched_yield()           = 0 <0.000027>
10:24:13.926775 sched_yield()           = 0 <0.000042>
10:24:13.926964 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000117>
10:24:13.927241 futex(0x7f05e4d110ac, FUTEX_WAKE, 1) = 1 <0.000099>
10:24:13.927455 futex(0x7f05e4d11d2c, FUTEX_WAKE, 1) = 1 <0.000186>
10:24:13.931318 futex(0x7f05e4d1072c, FUTEX_WAIT, 2, NULL) = 0 <0.000678>

I've managed to get a thread dump out of gdb right at a point where 40+ threads were showing 100% 'system' CPU time.

Here's the backtrace, which is the same for every one of those threads:

#0  0x00007fffebe9b407 in cv::ThresholdRunner::operator()(cv::Range const&) const () from /usr/local/lib/libopencv_imgproc.so.3.0
#1  0x00007fffecfe44a0 in tbb::interface6::internal::start_for<tbb::blocked_range<int>, (anonymous namespace)::ProxyLoopBody, tbb::auto_partitioner const>::execute() () from /usr/local/lib/libopencv_core.so.3.0
#2  0x00007fffe967496a in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all(tbb::task&, tbb::task*) () from /lib64/libtbb.so.2
#3  0x00007fffe96705a6 in tbb::internal::arena::process(tbb::internal::generic_scheduler&) () from /lib64/libtbb.so.2
#4  0x00007fffe966fc6b in tbb::internal::market::process(rml::job&) () from /lib64/libtbb.so.2
#5  0x00007fffe966d65f in tbb::internal::rml::private_worker::run() () from /lib64/libtbb.so.2
#6  0x00007fffe966d859 in tbb::internal::rml::private_worker::thread_routine(void*) () from /lib64/libtbb.so.2
#7  0x00007ffff76e9df5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007ffff6d0e1ad in clone () from /lib64/libc.so.6

My original question put Python and Linux front and center, but the issue appears to lie with TBB and/or OpenCV. Since OpenCV with TBB is so widely used, I presume it has to involve some interplay with my specific environment. Maybe it's because this is a 64-core machine?

I have recompiled OpenCV with TBB turned off and the problem has not reappeared so far. But the app now runs slower.
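
For anyone who wants to check their own build before going as far as recompiling: the "Parallel framework" line of cv2.getBuildInformation() shows whether OpenCV was built against TBB, and cv2.setNumThreads(0) disables OpenCV's internal parallelism at runtime. A minimal sketch; note that I have not verified the runtime switch avoids the lockup the way recompiling without TBB did.

import cv2

# The build information includes a "Parallel framework" entry that tells you
# whether this OpenCV was compiled with TBB, OpenMP, pthreads, etc.
print(cv2.getBuildInformation())

# Disable OpenCV's internal threading for subsequent calls. This is a runtime
# workaround only; I have not confirmed it prevents the lockup.
cv2.setNumThreads(0)
print("OpenCV threads now:", cv2.getNumThreads())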

I have posted a bug with OpenCV and will update this answer with whatever comes of that.

