[Python-Dev] pthreads, fork, import, and execvp

Thomas Wouters thomas at python.org
Tue Jul 21 02:16:30 CEST 2009


On Mon, Jul 20, 2009 at 11:26, Mike Klaas <mike.klaas at gmail.com> wrote:

>
>
> On Thu, Jul 16, 2009 at 1:08 PM, Thomas Wouters <thomas at python.org> wrote:
>
>>
>> Picking up a rather old discussion... We encountered this bug at Google
>> and I'm now "incentivized" to fix it.
>>
>> For a short recap: Python has an import lock that prevents more than one
>> thread from doing an import at any given time. However, unlike most of the
>> locks we have lying around, we don't clear that lock in the child after an
>> os.fork(). That means that doing an os.fork() during an import means the
>> child process can't do any other imports. It also means that doing an
>> os.fork() *while another thread is doing an import* means the child process
>> can't do any other imports.
>>
>> Since this three-year-old discussion we've added a couple of
>> post-fork-cleanups to CPython (the TLS, the threading module's idea of
>> active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do
>> simply discard the memory for other locks held during fork (the GIL, see
>> Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in
>> Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock,
>> except when the platform is AIX. I don't see any particular reason why we
>> aren't doing the same thing to the import lock that we do to the other
>> locks, on all platforms. It's a quick fix for a real problem (see
>> http://bugs.python.org/issue1590864 and
>> http://bugs.python.org/issue1404925 for two bug reports that seem to be
>> this very issue.)
>>
>
> +1.  We were also affected by this bug, getting sporadic deadlocks in a
> multi-threaded program that fork()s subprocesses to do processing.
>  It took a while to figure out what was going on.
>
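
For concreteness, here is a minimal sketch of that kind of hang. 'slowmod' is
a name I'm making up for any module whose top-level code is slow enough that
the fork lands while its import is still in progress (Python 2 syntax; the
sketch writes the module to the current directory just to be self-contained):

    import os, sys, threading, time

    # Fake a slow module: its top-level code assigns 'a', sleeps, then
    # assigns 'b'.
    open('slowmod.py', 'w').write('import time\na = 1\ntime.sleep(10)\nb = 2\n')
    sys.path.insert(0, '.')

    def importer():
        import slowmod          # holds the import lock while slowmod.py runs

    threading.Thread(target=importer).start()
    time.sleep(1)               # make sure the import is in progress

    pid = os.fork()
    if pid == 0:
        # Child: the import lock was inherited in the "held" state, but the
        # thread that held it doesn't exist here, so this blocks forever.
        import smtplib
        print 'child: never reached'
        os._exit(0)
    os.waitpid(pid, 0)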

Actually, after careful consideration I have come to realize that simply
resetting the lock is exactly the wrong thing to do.

The import lock exists to prevent threads from getting a half-initialized
module *while another thread is creating it*. That is to say, if threads 'A'
and 'B' both import 'mod', it is not only important that A and B don't *both*
try to execute mod.py, but also that thread B doesn't get a half-initialized
module object that thread A is still busy populating.

If the import lock is held by the thread doing the fork(), this is not a
problem: the module is still being imported by that same thread in the child
process, and the fork doesn't affect it. If the import lock is held by a
thread other than the one doing the fork(), though, any partially imported
modules remain in sys.modules in the child process without ever being
finished.
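
Concretely, with the hypothetical slowmod above (a = 1, something slow,
b = 2): if we merely reset the lock in the child, the half-populated module
object is still sitting in sys.modules, and the child will happily hand it
out:

    # In the child, after a hypothetical "just reset the lock" fix, while the
    # parent's other thread was still in the middle of executing slowmod.py:

    import slowmod      # found half-populated in sys.modules, returned as-is
    print slowmod.a     # 1 -- assigned before the fork()
    print slowmod.b     # AttributeError: that line of slowmod.py never ran here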

So we need to actually acquire the import lock before forking. We can do
that in os.fork() and os.forkpty(), but we can't fix third-party extension
modules that fork; we'll have to introduce a new set of API functions for
acquiring and releasing the import lock. I would suggest we don't expose it
as that, but instead call it a fork lock or some such, so we can add extra
lock acquire/release pairs as necessary.
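
At the Python level, the shape of it is roughly this (imp.acquire_lock() and
imp.release_lock() exist today; the wrapper name is made up, and the real
change has to live inside os.fork() itself plus the new C-level API,
including sorting out the inherited lock state in the child):

    import imp, os

    def locked_fork():
        # Sketch only: holding the import lock across fork() guarantees no
        # other thread is mid-import at the moment the child is created.
        imp.acquire_lock()
        try:
            pid = os.fork()
        finally:
            # The parent just releases; in the real fix the child's copy of
            # the lock would be reinitialized at the C level instead.
            imp.release_lock()
        return pid

Extension modules that call fork() themselves would then wrap their fork in
the C equivalents of that acquire/release pair.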

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!