Python C/API based multithread python program locks
kanji
kanji_rama at yahoo.com
Tue Sep 28 20:21:21 EDT 2004
Hi ALL,
I have written a multithreaded Python program in which each thread calls
a C function (via a Python/C extension module) to execute some tasks on a
remote node. The number of threads equals the number of nodes specified by
the user.
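To make the structure concrete, here is a minimal sketch of the one-thread-per-node layout. It is not the actual program: run_on_node is a hypothetical stand-in for the extension call (it just sleeps), the node names are made up, and it is written for a current Python 3 rather than 2.2.2.

```python
import threading
import time


def run_on_node(node, results):
    # Hypothetical stand-in for the C extension function. The real call
    # would release the GIL while it talks to the remote node.
    time.sleep(0.01)
    results[node] = "done"


def run_on_all_nodes(nodes):
    # One thread per node, as described above.
    results = {}
    threads = [
        threading.Thread(target=run_on_node, args=(node, results))
        for node in nodes
    ]
    for t in threads:
        t.start()
    # Wait for every thread before reading the collected results.
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_on_all_nodes(["node1", "node2", "node3"]))
```

Each thread writes only to its own key in the shared dictionary, so the sketch needs no extra locking for the results.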
The issue is that it works most of the time, but occasionally (quite
randomly) it hangs without generating any errors. While trying to debug,
sometimes even gdb hangs, but I managed to get a backtrace of a hung
thread:
#0 0xb75ebc32 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0xb75d11ee in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x0809bb3f in PyThread_acquire_lock ()
#3 0x0809e45c in _PyObject_GC_Del ()
#4 0x0807cad6 in PyEval_GetFuncDesc ()
#5 0x0807abc4 in PyEval_EvalCode ()
#6 0x0807b65e in PyEval_EvalCodeEx ()
#7 0x0807cbbb in PyEval_GetFuncDesc ()
#8 0x0807ab33 in PyEval_EvalCode ()
#9 0x0807b65e in PyEval_EvalCodeEx ()
#10 0x0807cbbb in PyEval_GetFuncDesc ()
#11 0x0807ab33 in PyEval_EvalCode ()
#12 0x0807b65e in PyEval_EvalCodeEx ()
#13 0x0807cbbb in PyEval_GetFuncDesc ()
#14 0x0807ab33 in PyEval_EvalCode ()
#15 0x0807b65e in PyEval_EvalCodeEx ()
#16 0x08078555 in PyEval_EvalCode ()
#17 0x08098569 in PyRun_FileExFlags ()
#18 0x080974d0 in PyRun_SimpleFileExFlags ()
#19 0x08096e1a in PyRun_AnyFileExFlags ()
#20 0x08053ac9 in Py_Main ()
#21 0x08053519 in main ()
To rule out the possibility that this is caused by some error in my
code, I called the same function (which creates, say, 100 threads) in a
for loop 500 times. I found that it tends to hang at different
iterations, for example at iteration #480 or #12, and sometimes it sails
through smoothly.
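The repeat test looks roughly like the sketch below. It is an illustration under stated assumptions, not the real test: fake_task is a hypothetical placeholder for the extension call, the join timeout is something added here so that a hang is reported instead of blocking forever, and the code targets a current Python 3.

```python
import threading
import time


def fake_task(index):
    # Hypothetical placeholder for the C extension call.
    time.sleep(0.001)


def run_batch(num_threads, task, timeout):
    # Daemon threads, so a stuck thread cannot keep the interpreter alive.
    threads = [
        threading.Thread(target=task, args=(i,), daemon=True)
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    # Share one deadline across all joins instead of waiting forever.
    deadline = time.monotonic() + timeout
    for t in threads:
        t.join(max(0.0, deadline - time.monotonic()))
    # Any thread still alive after the deadline is treated as hung.
    return [t.name for t in threads if t.is_alive()]


def stress(iterations, num_threads, task=fake_task, timeout=5.0):
    # Returns (iteration, stuck_thread_names) for the first batch that
    # does not finish in time, or None if every batch completes.
    for i in range(1, iterations + 1):
        stuck = run_batch(num_threads, task, timeout)
        if stuck:
            return i, stuck
    return None


if __name__ == "__main__":
    print(stress(iterations=500, num_threads=100))
```

Recording the iteration number this way at least shows whether the hang clusters around particular iterations or is spread at random.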
In the Python program, the outputs from all threads are synchronized
via thread.join().
In the C extension sources, I have used the Py_BEGIN_ALLOW_THREADS and
Py_END_ALLOW_THREADS brackets to take care of the GIL. I have separately
tested the C functions and they seemed to work fine.
Any ideas what the problem could be? The test system is RHEL 3 and the
Python version is 2.2.2.
Please let me know if there are any useful pointers to solve this issue.
Thanks
kanji