Re: [Python-Dev] test_fork1 on SMP? (was Re: [Python Dev] test_fork1 failing --with-threads (for some people)...)

This message duplicates some of my previous messages about the thread/fork deadlock. (I received a report that some of my earlier messages appear to have been lost.) I'm running a simpler version of Neil's test program that uses the thread module instead of the threading module; it's attached at the end of this message (test_fork2.py). It spawns five threads; then each thread, including the main thread, goes into a "while 1:" loop that forks. The program locks up after about 100-200 forks. The process structure looks like this:
top reports the following CPU activity:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  787 jeremy    19   0  2048 2048  1940 R       0 52.8  0.7   0:06 python
  788 jeremy    17   0  2048 2048  1940 R       0 44.2  0.7   0:06 python

If I attach to process 787 with gdb, this is the stack trace I see:

#0  pthread_cond_wait (cond=0x80f297c, mutex=0x80f2988) at queue.h:25
#1  0x806fee3 in PyThread_acquire_lock (lock=0x80f2978, waitflag=1) at ../../Python/thread_pthread.h:311
#2  0x8054ec1 in PyEval_RestoreThread (tstate=0x80fb2f0) at ../../Python/ceval.c:171
#3  0x80c1242 in posix_wait (self=0x0, args=0x80f1cac) at ../../Modules/posixmodule.c:2849
#4  0x8059b83 in call_builtin (func=0x8103578, arg=0x80f1cac, kw=0x0) at ../../Python/ceval.c:2369
#5  0x8059a41 in PyEval_CallObjectWithKeywords (func=0x8103578, arg=0x80f1cac, kw=0x0) at ../../Python/ceval.c:2337
#6  0x80583b1 in eval_code2 (co=0x8144ec0, globals=0x80f25d4, locals=0x0, args=0x80f1cb8, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ../../Python/ceval.c:1675
#7  0x805a01d in call_function (func=0x8144cdc, arg=0x80f1cac, kw=0x0) at ../../Python/ceval.c:2491
#8  0x8059a21 in PyEval_CallObjectWithKeywords (func=0x8144cdc, arg=0x80f1cac, kw=0x0) at ../../Python/ceval.c:2335
#9  0x80b03aa in t_bootstrap (boot_raw=0x8142ad8) at ../../Modules/threadmodule.c:193
#10 0x4001ceca in pthread_start_thread (arg=0xbf3ffe60) at manager.c:213

After I collect the stack trace and restart the program, several other processes start running again:

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
  786 jeremy    14   0  2048 2048  1928 R       0 24.6  0.7   0:09 python
  788 jeremy    15   0  2048 2048  1928 R       0 24.6  0.7   0:34 python
  787 jeremy    17   0  2048 2048  1928 R       0 24.4  0.7   0:34 python
  783 jeremy    15   0  2048 2048  1928 R       0 23.8  0.7   0:09 python

If I run the process again with thread debugging enabled (Py_DEBUG defined at compile time and THREADDEBUG set to 1), I get the following trace:

PyThread_init_thread called
PyThread_allocate_lock() -> 0x80f2010
PyThread_acquire_lock(0x80f2010, 1) called
PyThread_acquire_lock(0x80f2010, 1) -> 1
PyThread_release_lock(0x80f2010) called
PyThread_allocate_lock called
PyThread_allocate_lock() -> 0x80f6c58
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
Adding parser accelerators ...
Done.
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_acquire_lock(0x80f6c58, 0) called
PyThread_acquire_lock(0x80f6c58, 0) -> 1
PyThread_release_lock(0x80f6c58) called
PyThread_allocate_lock called
PyThread_allocate_lock() -> 0x80f6498
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_start_new_thread called
[lots of stuff omitted]
2051 2127
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
1024 2130
PyThread_release_lock(0x80f6498) called
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_acquire_lock(0x80f6498, 1) called
PyThread_acquire_lock(0x80f6498, 1) -> 1
PyThread_release_lock(0x80f6498) called
[blocks]

In this case, the deadlock occurs much more quickly. (Perhaps the extra I/O produced by the debugging output exacerbates the problem.)
Jeremy

PS If you don't have access to a Linux machine, you can use one of the Linux test machines on the SF compile farm. Log into linux.compile.sourceforge.net.

import os
import thread

def f():
    while 1:
        if os.fork() == 0:
            print "%s %s" % (thread.get_ident(), os.getpid())
            os._exit(0)
        os.wait()

thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
thread.start_new_thread(f, ())
f()
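[Editorial note: for anyone reproducing this today, here is a minimal sketch of the attached script in modern Python 3, whose threading module replaces the old thread module. The bounded fork count, the main() wrapper, and the use of os.waitpid/os.write are deviations from the original, chosen so the demo terminates and the child avoids the stdio buffer lock.]

```python
import os
import threading

FORKS_PER_THREAD = 5  # bounded, unlike the original "while 1:" loop

def f():
    for _ in range(FORKS_PER_THREAD):
        pid = os.fork()
        if pid == 0:
            # os.write is a raw syscall: it avoids the stdio buffer lock,
            # which another thread may have held at the instant of the fork.
            os.write(1, b"%d %d\n" % (threading.get_ident(), os.getpid()))
            os._exit(0)
        os.waitpid(pid, 0)  # reap this thread's own child

def main():
    workers = [threading.Thread(target=f) for _ in range(5)]
    for t in workers:
        t.start()
    f()  # the main thread forks too, as in the original
    for t in workers:
        t.join()

if __name__ == "__main__":
    main()
    print("done")
```

On a current CPython (which reinitializes the GIL in the child after fork) this should run to completion; on the 2.0-era interpreter under discussion it would deadlock as described above.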

[Jeremy Hylton, with an excellent writeup of what he's found out about his test_fork2.py test]

It's a baffler! AFAIK, nobody yet has thought of a way that a fork can screw up the state of the locks in the *parent* process (it must be easy to see how they can get screwed up in a child, because two of us already did <wink>). But your hangs appear to be in parents, as the gdb stacktrace shows that a parent process is (presumably) hanging while trying to reacquire the global lock in the expansion of the Py_END_ALLOW_THREADS macro right after the

    pid = wait(&status);

in posix_wait returns (#2 is from the macro expansion):

#0 pthread_cond_wait (cond=0x80f297c, mutex=0x80f2988) at queue.h:25
#1 0x806fee3 in PyThread_acquire_lock (lock=0x80f2978, waitflag=1) at ../../Python/thread_pthread.h:311
#2 0x8054ec1 in PyEval_RestoreThread (tstate=0x80fb2f0) at ../../Python/ceval.c:171
#3 0x80c1242 in posix_wait (self=0x0, args=0x80f1cac) at ../../Modules/posixmodule.c:2849

However, the stacktrace shows that the parent in question is executing pthread_cond_wait, and this is the worst implementation of pthread_cond_wait in existence if it's using a busy loop to wait for the condition variable to get signaled! If this process keeps accumulating cycles, it either *is* the worst implementation in existence, or the process isn't actually hung and you just happened to capture the stack at a misleading time. The debug output may be misleading too:
... [lots of stuff omitted]
Ouch -- wish you hadn't. Debug output is likely to come out-of-order, so without *all* of it we can't see exactly how far out of balance the acquire and release calls got.
It's probably *not* blocking on the release: in this bunch of output, there are 7 calls to acquire, 5 calls to release, and 5 returns from acquire. That the # of release calls == the # of acquire returns suggests this output is "good enough", and if so, that leaves two acquire calls waiting forever.

Don't know what to do now. Obvious suggestions include putting many prints into test_fork2 so we get a better idea of what's happening, or going back to Trent's print-filled variant of test_fork1 (which I still think is an easier one to study, as it forks only once instead of unboundedly often).

BTW, in *my* day, the answer to the question "what happens to a threaded program at a fork?" was to walk away laughing -- and I'm not sure the reality of that has actually changed, regardless of how many more words they may have added to the pthreads docs in the meantime <0.7 wink>.

silly-humans-ly y'rs - tim
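[Editorial note: Tim's hand tally of acquires and releases can be automated. A minimal sketch in modern Python, assuming the THREADDEBUG line formats shown in Jeremy's trace; lock_balance is a hypothetical helper name, not part of CPython.]

```python
import re

# Matches the three THREADDEBUG line shapes seen in the trace:
#   PyThread_acquire_lock(0x..., 1) called
#   PyThread_acquire_lock(0x..., 1) -> 1
#   PyThread_release_lock(0x...) called
_LINE = re.compile(r"PyThread_(acquire|release)_lock"
                   r"\((0x[0-9a-f]+)(?:, \d)?\) (called|-> 1)")

def lock_balance(log_text):
    """Tally acquire calls, acquire returns, and release calls per lock."""
    counts = {}
    for line in log_text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue
        op, addr, what = m.groups()
        c = counts.setdefault(addr, {"acquire_call": 0,
                                     "acquire_return": 0,
                                     "release_call": 0})
        if op == "release":
            c["release_call"] += 1
        elif what == "called":
            c["acquire_call"] += 1
        else:
            c["acquire_return"] += 1
    return counts
```

Run over the final chunk of Jeremy's trace, it should reproduce the 7 acquire calls / 5 releases / 5 acquire returns that Tim counted by hand, with the two unreturned acquires being the blocked threads.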

On Mon, Jul 31, 2000 at 04:42:50AM -0400, Tim Peters wrote:
If I add Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around fork() in posixmodule then the child is the process which always seems to hang. The child is hanging at:

#0  0x4006d58b in __sigsuspend (set=0xbf7ffac4) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x4001f1a0 in pthread_cond_wait (cond=0x8264e1c, mutex=0x8264e28) at restart.h:49
#2  0x806f3c3 in PyThread_acquire_lock (lock=0x8264e18, waitflag=1) at thread_pthread.h:311
#3  0x80564a8 in PyEval_RestoreThread (tstate=0x8265a78) at ceval.c:178
#4  0x80bf274 in posix_fork (self=0x0, args=0x8226ccc) at ./posixmodule.c:1659
#5  0x8059460 in call_builtin (func=0x82380e0, arg=0x8226ccc, kw=0x0) at ceval.c:2376
#6  0x8059378 in PyEval_CallObjectWithKeywords (func=0x82380e0, arg=0x8226ccc, kw=0x0) at ceval.c:2344
#7  0x80584f2 in eval_code2 (co=0x8265e98, globals=0x822755c, locals=0x0, args=0x8226cd8, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, owner=0x0) at ceval.c:1682
#8  0x805974b in call_function (func=0x8264ddc, arg=0x8226ccc, kw=0x0) at ceval.c:2498
#9  0x805936b in PyEval_CallObjectWithKeywords (func=0x8264ddc, arg=0x8226ccc, kw=0x0) at ceval.c:2342
#10 0x80af26a in t_bootstrap (boot_raw=0x8264e00) at ./threadmodule.c:199
#11 0x4001feca in pthread_start_thread (arg=0xbf7ffe60) at manager.c:213

Since there is only one thread in the child, this should not be happening. Can someone explain this? I have tested this on both an SMP Linux machine and a UP Linux machine.

Neil
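[Editorial note: Neil's premise is what POSIX fork() promises: only the calling thread is replicated into the child. A minimal sketch, assuming Python 3 (threads_seen_in_child is a hypothetical helper name), that passes the child's thread count back through its exit status.]

```python
import os
import threading

def threads_seen_in_child(extra_threads=3):
    """Fork while several threads are alive; return the thread count the
    child observes, passed back via its exit status."""
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait)
               for _ in range(extra_threads)]
    for t in workers:
        t.start()
    pid = os.fork()
    if pid == 0:
        # POSIX fork() copies only the calling thread into the child.
        os._exit(threading.active_count())
    _, status = os.waitpid(pid, 0)
    stop.set()
    for t in workers:
        t.join()
    return os.WEXITSTATUS(status)
```

The child really does contain one thread; the hang Neil sees comes from lock *state* copied at fork time, not from surviving threads, as Guido explains below.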

If I add Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around fork() in posixmodule then the child is the process which always seems to hang.
I first thought that the lock should be released around the fork too, but later I realized that that was exactly wrong: if you release the lock before you fork, another thread will likely grab the lock before the fork happens; then in the child the lock is held by that other thread, but that thread doesn't exist, so when the main thread tries to get the lock back it hangs in Py_END_ALLOW_THREADS.

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)
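[Editorial note: Guido's failure mode can be demonstrated in miniature. A minimal sketch, assuming Python 3, in which a helper thread holds a lock across the fork and the child probes it with a timed acquire so the demo itself cannot hang; child_sees_lock_held is a hypothetical name.]

```python
import os
import threading
import time

def child_sees_lock_held():
    """Hold a lock in one thread across a fork; report whether the child
    finds the lock still held even though the holder is gone."""
    lock = threading.Lock()
    ready = threading.Event()

    def holder():
        with lock:
            ready.set()
            time.sleep(1.0)  # keep the lock held while the parent forks

    t = threading.Thread(target=holder)
    t.start()
    ready.wait()  # don't fork until the lock is actually taken
    pid = os.fork()
    if pid == 0:
        # The holder thread does not exist in the child, yet the copied
        # lock is still marked held: a blocking acquire() would hang forever.
        held = not lock.acquire(timeout=0.2)
        os._exit(1 if held else 0)
    _, status = os.waitpid(pid, 0)
    t.join()
    return os.WEXITSTATUS(status) == 1
```

This is exactly why keeping the lock held across fork() is the safer choice: the forking thread then owns the lock in the child and can release it itself.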

participants (4)
- Guido van Rossum
- Jeremy Hylton
- Neil Schemenauer
- Tim Peters