[Python-Dev] test_fork1 on SMP? (was Re: [Python Dev] test_fork1 failing --with-threads (for some people)...)
Thu, 27 Jul 2000 20:32:00 -0400
> For both bugs, though, a mutex and a condition variable are being used:
Oh ya -- now that you mention it, I wrote that code <wink> -- but more than
7 years ago! How could a failure have gone undetected for so long?
> The interpreter lock is being acquired and released in both cases.
> My current theory is that Python isn't dealing with the interpreter
> lock correctly across a fork. If some thread other than the one
> calling fork holds the interpreter lock mutex,
Let's flesh out the most likely bad case:
    the main thread gets into posix_fork
    one of the spawned threads (say, thread 1) tries to acquire the
        global lock
    thread 1 gets into PyThread_acquire_lock
    thread 1 grabs the pthread mutex guarding "the global lock"
    the main thread executes fork() while thread 1 holds the mutex
    in the original process, everything's still cool: thread 1 still
        exists there, and it releases the mutex it acquired (after seeing
        that the "is it locked?" flag is set), yadda yadda yadda.
    but in the forked process, things are not cool: the (cloned) mutex
        guarding the global lock is still held, and thread 1 no longer
        exists there to release it
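
The steps above can be roughly reproduced from Python itself. This is only a sketch, assuming a POSIX system with os.fork: a plain threading.Lock stands in for the pthread mutex, and the names fork_while_lock_held and holder are made up for illustration. The child uses a non-blocking acquire so that it reports the problem instead of hanging.

```python
import os
import threading


def fork_while_lock_held():
    """Return True if the forked child finds the lock already held."""
    lock = threading.Lock()       # stand-in for the pthread mutex
    holding = threading.Event()   # set once "thread 1" owns the lock
    done = threading.Event()      # tells "thread 1" to let go

    def holder():
        # Plays the role of thread 1: grab the lock and sit on it.
        with lock:
            holding.set()
            done.wait()

    worker = threading.Thread(target=holder)
    worker.start()
    holding.wait()                # fork only while the lock is held

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread was cloned, so nobody here
        # can ever release the lock. A blocking acquire would hang.
        got_it = lock.acquire(False)
        os._exit(0 if got_it else 1)

    # Parent: the holder thread still exists and releases normally.
    done.set()
    worker.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1
```

Calling fork_while_lock_held() in the parent should report that the child saw the lock as held, even though the parent's copy is released without trouble.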
What happens next in the child process is interesting <wink>: there is only
one thread in the child process, and it's still in posix_fork. There it
sets the main_thread and main_pid globals, and returns to the interpreter
loop. That the forked pthread_mutex is still locked is irrelevant at this
point: the child process won't care about that until enough bytecodes pass
that its sole thread offers to yield. It doesn't bash into the
already-locked cloned pthread mutex until it executes PyThread_release_lock
as part of offering to yield. Then the child hangs. Don't know about this
specific implementation, but pthread mutex acquires were usually implemented
as busy-loops in my day (which is one reason Python locks were *not* modeled
directly as pthread mutexes).
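
To see why the release path can block too, here is a minimal sketch of a lock built from a mutex, a condition variable, and an "is it locked?" flag. It illustrates the design described in this thread and is not the actual CPython source; FlagLock and its attribute names are hypothetical, and threading.Lock again stands in for the pthread mutex.

```python
import threading


class FlagLock:
    """Lock modeled as a flag guarded by a mutex and a condition variable."""

    def __init__(self):
        self._mutex = threading.Lock()                  # guards the flag
        self._cond = threading.Condition(self._mutex)   # waiters sleep here
        self._locked = False                            # "is it locked?"

    def acquire(self, blocking=True):
        with self._mutex:             # mutex is held only briefly
            if self._locked and not blocking:
                return False
            while self._locked:
                self._cond.wait()     # drops the mutex while sleeping
            self._locked = True
            return True

    def release(self):
        # Releasing also has to take the guarding mutex. If a forked
        # copy of that mutex is stuck in the held state, this is where
        # the child would get stuck.
        with self._mutex:
            self._locked = False
            self._cond.notify()
```

The point of the design is that the mutex only protects the flag for a few instructions at a time, while long waits happen on the condition variable. The cost is that both acquire and release go through the mutex, so a mutex cloned in the held state is fatal on either path.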
So, in this scenario, the child hangs in a busy loop after an accidental
amount of time passes after the fork.
Matches your symptoms? It doesn't match Trent's segfault, but one nightmare
at a time ...