[Python-Dev] test_fork1 on SMP? (was Re: [Python Dev] test_fork1 failing --with-threads (for some people)...)
Jeremy Hylton
jeremy@beopen.com
Sun, 30 Jul 2000 12:30:18 -0400 (EDT)
This message duplicates some of my previous messages about the
thread/fork deadlock. (I received a report that some of my earlier
messages appear to have been lost.)
I'm running a simpler version of Neil's test program, one that uses
the thread module instead of the threading module. It's attached at
the end of the message (test_fork2.py). It spawns five threads; each
of those threads, as well as the main thread, then goes into a
while 1: loop that forks. The program locks up after about 100-200
forks. The process structure looks like this:
> ps --forest
PID TTY TIME CMD
684 pts/0 00:00:00 bash
783 pts/0 00:00:00 python
784 pts/0 00:00:00 \_ python
785 pts/0 00:00:00 | \_ python
786 pts/0 00:00:00 | \_ python
951 pts/0 00:00:00 | | \_ python <defunct>
787 pts/0 00:00:05 | \_ python
788 pts/0 00:00:04 | \_ python
953 pts/0 00:00:00 | | \_ python
789 pts/0 00:00:00 | \_ python
952 pts/0 00:00:00 | \_ python <defunct>
950 pts/0 00:00:00 \_ python <defunct>
964 pts/0 00:00:00 ps
top reports the following CPU activity:
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
787 jeremy 19 0 2048 2048 1940 R 0 52.8 0.7 0:06 python
788 jeremy 17 0 2048 2048 1940 R 0 44.2 0.7 0:06 python
If I attach to process 787 with gdb, this is the stack trace I see:
#0 pthread_cond_wait (cond=0x80f297c, mutex=0x80f2988) at queue.h:25
#1 0x806fee3 in PyThread_acquire_lock (lock=0x80f2978, waitflag=1)
at ../../Python/thread_pthread.h:311
#2 0x8054ec1 in PyEval_RestoreThread (tstate=0x80fb2f0)
at ../../Python/ceval.c:171
#3 0x80c1242 in posix_wait (self=0x0, args=0x80f1cac)
at ../../Modules/posixmodule.c:2849
#4 0x8059b83 in call_builtin (func=0x8103578, arg=0x80f1cac, kw=0x0)
at ../../Python/ceval.c:2369
#5 0x8059a41 in PyEval_CallObjectWithKeywords (func=0x8103578, arg=0x80f1cac,
kw=0x0) at ../../Python/ceval.c:2337
#6 0x80583b1 in eval_code2 (co=0x8144ec0, globals=0x80f25d4, locals=0x0,
args=0x80f1cb8, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0,
owner=0x0) at ../../Python/ceval.c:1675
#7 0x805a01d in call_function (func=0x8144cdc, arg=0x80f1cac, kw=0x0)
at ../../Python/ceval.c:2491
#8 0x8059a21 in PyEval_CallObjectWithKeywords (func=0x8144cdc, arg=0x80f1cac,
kw=0x0) at ../../Python/ceval.c:2335
#9 0x80b03aa in t_bootstrap (boot_raw=0x8142ad8)
at ../../Modules/threadmodule.c:193
#10 0x4001ceca in pthread_start_thread (arg=0xbf3ffe60) at manager.c:213
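My guess about what this means (and it is just a guess at this
point): posix_wait releases the interpreter lock around the wait()
system call, and this thread is blocked in PyEval_RestoreThread trying
to get it back. fork() only copies the calling thread into the child,
so a fork that lands while some other thread is in the middle of the
lock's internals (the mutex and condition variable behind
PyThread_acquire_lock) could leave the child with a lock that no
surviving thread will ever release. To narrow down whether it's a
parent thread or a forked child that stops making progress, the next
thing I plan to try is an instrumented variant along these lines
(untested sketch; the prints are the only change from test_fork2.py,
which is attached below):

import os
import thread

def f():
    while 1:
        tid = thread.get_ident()
        print "%s: forking" % tid
        pid = os.fork()
        if pid == 0:
            # If a child wedges, this line never appears for its pid.
            print "%s: child %s alive" % (tid, os.getpid())
            os._exit(0)
        os.wait()
        # If a parent thread wedges reacquiring the interpreter lock
        # after wait(), this line stops appearing for that thread.
        print "%s: reaped %s" % (tid, pid)

for i in range(5):
    thread.start_new_thread(f, ())
f()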
After I collect the stack trace and let the process continue (by
detaching gdb), several other processes start running again:
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
786 jeremy 14 0 2048 2048 1928 R 0 24.6 0.7 0:09 python
788 jeremy 15 0 2048 2048 1928 R 0 24.6 0.7 0:34 python
787 jeremy 17 0 2048 2048 1928 R 0 24.4 0.7 0:34 python
783 jeremy 15 0 2048 2048 1928 R 0 23.8 0.7 0:09 python
If I run the program again with thread debugging enabled (Py_DEBUG
defined at compile time and the THREADDEBUG environment variable set
to 1), I get the following trace:
PyThread_init_thread called
PyThread_allocate_lock() -> 0x80f2010
PyThread_acquire_lock(0x80f2010, 1) called
PyThread_acquire_lock(0x80f2010, 1) -> 1
PyThread_release_lock(0x80f2010) called
PyThread_allocate_lock called
PyThread_allocate_lock() -> 0x80f6c58
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
Adding parser accelerators ...
Done.
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_allocate_lock called
PyThread_allocate_lock() -> 0x80f6498
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_start_new_thread called
[lots of stuff omitted]
2051 2127
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
1024 2130
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
[blocks]
In this case, the deadlock occurs much more quickly. (Perhaps the
extra I/O produced by the debugging output exacerbates the problem.)
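(The lock at 0x80f6498 is presumably the interpreter lock; it's
allocated immediately before the first PyThread_start_new_thread
call.) To make longer THREADDEBUG runs easier to digest, here's a
quick throwaway filter (a sketch tailored to the trace format above,
not tested against a full run) that tallies acquire/release lines per
lock. A lock whose "acquire calls" count stays ahead of its
"acquired" count at the end of the trace is the one everybody is
waiting on:

import re
import sys

# Recognizes lines like:
#   PyThread_acquire_lock(0x80f6498, 1) called
#   PyThread_acquire_lock(0x80f6498, 1) -> 1
#   PyThread_release_lock(0x80f6498) called
pat = re.compile(r"PyThread_(acquire_lock|release_lock)"
                 r"\((0x[0-9a-f]+)[^)]*\)( -> (\d))?")

counts = {}
for line in sys.stdin.readlines():
    m = pat.search(line)
    if m is None:
        continue
    op, lock, result = m.group(1), m.group(2), m.group(4)
    if not counts.has_key(lock):
        counts[lock] = [0, 0, 0]    # [acquire called, acquired, released]
    if op == "release_lock":
        counts[lock][2] = counts[lock][2] + 1
    elif result is None:
        counts[lock][0] = counts[lock][0] + 1
    elif result == "1":
        counts[lock][1] = counts[lock][1] + 1

for lock, (called, acquired, released) in counts.items():
    print "%s: %d acquire calls, %d acquired, %d released" % (
        lock, called, acquired, released)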
Jeremy
PS If you don't have access to a Linux machine, you can use one of the
Linux test machines on the SF compile farm. Log into
linux.compile.sourceforge.net.
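PPS A cut-down version with just one spawned thread (untested; I
haven't checked whether two threads are enough to trigger this) would
make the ps tree easier to read if it still hangs:

import os
import thread

def f():
    while 1:
        if os.fork() == 0:
            print "%s %s" % (thread.get_ident(), os.getpid())
            os._exit(0)
        os.wait()

thread.start_new_thread(f, ())
f()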
[Attachment: test_fork2.py]
import os
import thread

def f():
    # Each thread forks in a loop: the child prints its thread id and
    # pid and exits immediately; the parent waits for it.
    while 1:
        if os.fork() == 0:
            print "%s %s" % (thread.get_ident(), os.getpid())
            os._exit(0)
        os.wait()

# Five spawned threads plus the main thread all run the fork loop.
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
f()