pthreads, fork, import, and execvp
Hello everyone!

We have been encountering several deadlocks in a threaded Python application which calls subprocess.Popen (i.e. fork()) in some of its threads. This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.

Preliminary analysis of the hang shows that the child process blocks upon entering the execvp function, in which the import_lock is acquired due to the following line:

def _execvpe(file, args, env=None):
    from errno import ENOENT, ENOTDIR
    ...

It is known that when forking from a pthreaded application, acquisition attempts on locks which were already locked by other threads while fork() was called will deadlock.

Due to these oddities we were wondering if it would be better to extract the above import line from the execvpe call, to prevent lock acquisition attempts in such cases. Another workaround could be re-assigning a new lock to import_lock (such a thing is done with the global interpreter lock) at PyOS_AfterFork or pthread_atfork.

We'd appreciate any opinions you might have on the subject.

Thanks in advance,
Yair and Rotem
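To make the failure mode concrete, here is a minimal sketch of the kind of application described above: worker threads calling subprocess.Popen (and therefore fork()) while other threads happen to be importing. The module imported, the command run and the thread counts are arbitrary assumptions for illustration, not the posters' actual code; whether any given run hangs depends on whether a fork lands inside the window where the import lock is held.

# Illustration only: threads mixing subprocess.Popen (fork) with imports.
import subprocess
import threading

def run_child():
    # fork() happens inside Popen; if another thread holds the import lock
    # at that instant, the child blocks in os._execvpe's inline
    # "from errno import ENOENT, ENOTDIR" and wait() never returns.
    p = subprocess.Popen(["/bin/true"])
    p.wait()

def do_imports():
    # Any not-yet-loaded module will do; importing it takes the import lock.
    import xml.dom.minidom

threads = [threading.Thread(target=run_child) for _ in range(4)]
threads += [threading.Thread(target=do_imports) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()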
Yeah, I think imports inside functions are overused. On 5/9/06, Rotem Yaari <vmalloc@gmail.com> wrote:
Hello everyone!
We have been encountering several deadlocks in a threaded Python application which calls subprocess.Popen (i.e. fork()) in some of its threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks upon entering the execvp function, in which the import_lock is acquired due to the following line:
def _execvpe(file, args, env=None):
    from errno import ENOENT, ENOTDIR
    ...
It is known that when forking from a pthreaded application, acquisition attempts on locks which were already locked by other threads while fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract the above import line from the execvpe call, to prevent lock acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock (such a thing is done with the global interpreter lock) at PyOS_AfterFork or pthread_atfork.
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Rotem Yaari wrote:
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks upon entering the execvp function, in which the import_lock is acquired due to the following line:
def _execvpe(file, args, env=None):
    from errno import ENOENT, ENOTDIR
    ...
While I agree there's no good reason to embed this import inside the function, I also don't see how you could be getting a deadlock unless you have modules that either spawn new threads or fork the process as a side effect of being imported.

Neither of those is guaranteed to work properly due to the fact that the original thread holds the import lock, leading to the possibility of deadlock if the spawned thread tries to import anything (as seems to be happening here) and the original thread then waits for a result from the spawned thread.

Spawning new threads or processes can really only be safely done when invoked directly or indirectly from the __main__ module (as that is run without holding the import lock). When done as a side effect of module import, the module needs to ensure that the result of the spawned thread or process is only waited for after the original module import is complete (and the import lock released as a result).

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
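The pattern Nick is warning about, sketched for clarity (the module and names are invented for illustration, not code from this thread): under the single interpreter-wide import lock of the Python versions discussed here, importing such a module from anywhere other than __main__ can deadlock, because the importer holds the import lock while joining a thread that itself needs to import.

# unsafe_module.py -- an illustration of the anti-pattern, not real code.
# The thread executing "import unsafe_module" holds the import lock; the
# worker below tries to import and blocks on that same lock; the importer
# then blocks forever in join().
import threading

def _worker(results):
    import xml.dom.minidom       # blocks: the import lock is held by our importer
    results.append("done")

_results = []
_t = threading.Thread(target=_worker, args=(_results,))
_t.start()
_t.join()                        # waits for a thread that can never finish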
Guido van Rossum wrote:
Yeah, I think imports inside functions are overused.
Rotem took the time to explain the problem to me more fully in private email, and it appears that using import statements inside functions is currently fundamentally unsafe in a posix environment where both multiple threads and fork() are used. If any thread other than the one invoking fork() holds the import lock at the time of the fork then import statements in the spawned process will deadlock. So I believe fixing this properly would indeed require assigning a new lock object in one of the two places Rotem suggested. Cheers, Nick.
On 5/9/06, Rotem Yaari <vmalloc@gmail.com> wrote:
Hello everyone!
We have been encountering several deadlocks in a threaded Python application which calls subprocess.Popen (i.e. fork()) in some of its threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks upon entering the execvp function, in which the import_lock is acquired due to the following line:
def _execvpe(file, args, env=None):
    from errno import ENOENT, ENOTDIR
    ...
It is known that when forking from a pthreaded application, acquisition attempts on locks which were already locked by other threads while fork() was called will deadlock.
Due to these oddities we were wondering if it would be better to extract the above import line from the execvpe call, to prevent lock acquisition attempts in such cases.
Another workaround could be re-assigning a new lock to import_lock (such a thing is done with the global interpreter lock) at PyOS_AfterFork or pthread_atfork.
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Nick Coghlan wrote:
Rotem took the time to explain the problem to me more fully in private email, and it appears that using import statements inside functions is currently fundamentally unsafe in a posix environment where both multiple threads and fork() are used. If any thread other than the one invoking fork() holds the import lock at the time of the fork then import statements in the spawned process will deadlock.
So I believe fixing this properly would indeed require assigning a new lock object in one of the two places Rotem suggested.
I still don't understand the problem fully. According to POSIX, fork does: "A process is created with a single thread. If a multi-threaded process calls fork(), the new process contains a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources."

So if the import lock was held during the fork call, the new thread will hold the import lock of the new process, and subsequent imports will block. However, then the problem is not with the execve implementation, but with the fact that the import lock was held when the process forked. Rotem should simply avoid calling fork() in the top-level code of a module.

This problem should be fixed in Python 2.5, whose PyOS_AfterFork now reads

void
PyOS_AfterFork(void)
{
#ifdef WITH_THREAD
    PyEval_ReInitThreads();
    main_thread = PyThread_get_thread_ident();
    main_pid = getpid();
    _PyImport_ReInitLock();
#endif
}

This was added in

------------------------------------------------------------------------
r39522 | gvanrossum | 2005-09-14 20:09:42 +0200 (Mi, 14 Sep 2005) | 4 lines

- Changes donated by Elemental Security to make it work on AIX 5.3 with IBM's 64-bit compiler (SF patch #1284289). This also closes SF bug #105470: test_pwd fails on 64bit system (Opteron).
------------------------------------------------------------------------

Unfortunately, that fix would apply to AIX only, though:

void
_PyImport_ReInitLock(void)
{
#ifdef _AIX
    if (import_lock != NULL)
        import_lock = PyThread_allocate_lock();
#endif
}

If we want to support fork while an import is going on, we should release the import lock if it is held. Alternatively, the code could throw away the old import lock on all systems; that seems like a waste of resources to me, though. One should also reset import_lock_thread and import_lock_level.

I'd be curious if Elemental Security added this code only to solve the very same problem, i.e. a fork that happened with the import lock held.

Regards,
Martin
On 5/16/06, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Rotem should simply avoid calling fork() in the top-level code of a module.
This is not what happened: we called subprocess, which itself called fork. As a subprocess bug this could easily be fixed by moving the import in "os._execve". We have made that fix locally and it fixes the problem.

Cheers
Martin v. Löwis wrote:
So if the import lock was held during the fork call, the new thread will hold the import lock of the new process, and subsequent imports will block.
However, then the problem is not with the execve implementation, but with the fact that the import lock was held when the process forked.
Rotem should simply avoid calling fork() in the top-level code of a module.
That's what I originally thought, but Rotem was able to clarify for me that the problem occurs even if the import lock is held by a *different* thread at the time fork() is called. So a deadlock can be triggered by perfectly legitimate code that has the misfortune to fork the process while another thread is doing a module import.
Unfortunately, that fix would apply to AIX only, though:
void _PyImport_ReInitLock(void) { #ifdef _AIX if (import_lock != NULL) import_lock = PyThread_allocate_lock(); #endif }
If we want to support fork while an import is going on, we should release the import lock if it is held. Alternatively, the code could throw away the old import lock on all systems; that seems like a waste of resources to me, though. One should also reset import_lock_thread and import_lock_level.
Broadening the ifdef to cover all posix systems rather than just AIX might be the right thing to do. However, the first thing would be to reproduce the deadlock reliably. A repeatable test case would look something like this:

class ForkingThread(Thread):
    def run(self):
        # create a subprocess using subprocess.Popen
        # and ensure it tries to do an import operation
        # (i.e. something similar to what Rotem & Yair's code is doing)
        pass

class ImportingThread(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.importing = Event()
        self.cleanup = Event()

    def run(self):
        imp.acquire_lock()
        try:
            self.importing.set()
            self.cleanup.wait()
        finally:
            imp.release_lock()

def test_forked_import_deadlock():
    import_thread = ImportingThread()
    fork_thread = ForkingThread()
    import_thread.start()
    try:
        import_thread.importing.wait()
        fork_thread.start()                # Fork while the import thread holds the lock
        fork_thread.join(10)               # Give the subprocess time to finish
        assert not fork_thread.isAlive()   # A timeout means it deadlocked
    finally:
        import_thread.cleanup.set()        # Release the import lock

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
Yair Chuchem wrote:
Rotem should simply avoid calling fork() in the top-level code of a module.
This is not what happened: we called subprocess, which itself called fork.
Ok - but are you calling subprocess in the context of code that is just being imported? Or is the caller of the code that calls subprocess just being imported? I.e. if you would trace back the stack at the point of fork, would there be an import statement on any frame of the stack (no matter how deeply nested)?
As a subprocess bug this could easily be fixed by moving the import in "os._execve". We have made that fix locally and it fixes the problem.
I can readily believe that the problem went away. However, you did not fix it - you worked around it. The import lock *ought* to be available inside the execve code, so the import *ought* to work. By moving the import outside execve, you have only hidden the problem, instead of solving it. Regards, Martin
Nick Coghlan wrote:
Broadening the ifdef to cover all posix systems rather than just AIX might be the right thing to do.
As I said: I doubt it is the right thing to do. Instead, the lock should get released in the child process if it is held at the point of the fork. I agree that we should find a repeatable test case first. Contributions are welcome. Regards, Martin
On 5/16/06, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Nick Coghlan wrote:
Broadening the ifdef to cover all posix systems rather than just AIX might be the right thing to do.
As I said: I doubt it is the right thing to do. Instead, the lock should get released in the child process if it is held at the point of the fork.
I agree that we should find a repeatable test case first.
I'm not sure if this bug can be reproduced now, but the comment in warnings.py points to: http://python.org/sf/683658

I didn't investigate it further. That might allow someone to create a reproducible test case.

n
Neal Norwitz wrote:
As I said: I doubt it is the right thing to do. Instead, the lock should get released in the child process if it is held at the point of the fork.
I'm not sure if this bug can be reproduced now, but the comment in warnings.py points to:
I didn't investigate it further. That might allow someone to create a reproducible test case.
That's a different problem, though. Here, the main thread waits for a child *thread* while holding the import lock. I don't quite understand the test case given, and I can't make it deadlock anymore (probably because of Just's fix). Regards, Martin
Martin v. Löwis wrote:
Neal Norwitz wrote:
As I said: I doubt it is the right thing to do. Instead, the lock should get released in the child process if it is held at the point of the fork.
I'm not sure if this bug can be reproduced now, but the comment in warnings.py points to:
I didn't investigate it further. That might allow someone to create a reproducible test case.
That's a different problem, though. Here, the main thread waits for a child *thread* while holding the import lock. I don't quite understand the test case given, and I can't make it deadlock anymore (probably because of Just's fix).
And if I understand it correctly, it falls under the category that waiting for another thread while holding the import lock is a *really* bad idea from a thread safety point of view. The thing with the import-after-fork deadlock is that you can trigger it without even doing anything that's known not to be thread-safe. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org
On 16-May-2006, at 23:20, Martin v. Löwis wrote:
Nick Coghlan wrote:
Broadening the ifdef to cover all posix systems rather than just AIX might be the right thing to do.
As I said: I doubt it is the right thing to do. Instead, the lock should get released in the child process if it is held at the point of the fork.
I agree that we should find a repeatable test case first. Contributions are welcome.
From what I understand, the problem is that when a thread other than the thread calling fork is holding the import lock, the child process ends up with an import lock that is locked by a no longer existing thread.

The attached script reproduces the problem for me. It creates a thread that imports a module and a thread that forks. Run 'forkimport.py' to demonstrate the problem; if you set 'gHang' at the start of the script to 0, the child process doesn't perform an import and terminates normally.

Wouldn't this specific problem be fixed if os.fork were to acquire the import lock before calling fork(2)?

Ronald
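Ronald's attachment is not reproduced in this archive; the script below is a reconstruction of the kind of reproduction he describes, not his actual code. The gHang flag comes from his description, while the helper module, file names and timings are guesses. On the Python versions discussed in this thread, running it with gHang = 1 hangs; with gHang = 0 the child skips the import and exits.

# forkimport.py-style reproduction (a reconstruction, not Ronald's script).
# One thread imports a deliberately slow module, so the import lock is held
# for a while; another thread forks during that window.
import os
import time
import threading

gHang = 1

# Create a helper module whose import takes long enough to overlap the fork.
f = open("slowmod.py", "w")
f.write("import time\ntime.sleep(2.0)\n")
f.close()

def importer():
    import slowmod               # holds the import lock for about 2 seconds

def forker():
    time.sleep(0.5)              # fork while the importer holds the lock
    pid = os.fork()
    if pid == 0:
        if gHang:
            import xml.dom.minidom   # deadlocks: lock was held when we forked
        os._exit(0)
    os.waitpid(pid, 0)
    print("child terminated normally")

t_imp = threading.Thread(target=importer)
t_fork = threading.Thread(target=forker)
t_imp.start(); t_fork.start()
t_imp.join(); t_fork.join()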
Nick Coghlan wrote:
And if I understand it correctly, it falls under the category that waiting for another thread while holding the import lock is a *really* bad idea from a thread safety point of view.
The thing with the import-after-fork deadlock is that you can trigger it without even doing anything that's known not to be thread-safe.
Right. With some googling, I found that one solution is pthread_atexit: a pthread_atexit handler is a triple (before, in_parent, in_child). You set it to (acquire, release, release). When somebody forks, the pthread library will first acquire the import lock in the thread that wants to fork. Then the fork occurs, and the import lock gets then released both in the parent and in the child. I would like to see this approach implemented, but I agree with you that a test case should be created first. Regards, Martin
Ronald Oussoren wrote:
Wouldn't this specific problem be fixed if os.fork were to acquire the import lock before calling fork(2)?
Right. Of course, you need to release the lock then both in the parent and the child. Also, all places that currently do PyOS_AfterFork should get modified (i.e. fork1, and forkpty also).

Regards,
Martin
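The shape of what Martin describes here, taking the import lock before fork() and releasing it on both sides, can be sketched in pure Python with the imp module's lock functions. This is only an illustration of the idea (it assumes the forking thread's identity survives the fork, which the eventual C-level fix does not have to assume); it is not the patch that was later committed.

# Sketch: hold the import lock across fork() and release it in both the
# parent and the child, mirroring the behaviour Martin asks for above.
import imp
import os

def fork_with_import_lock():
    imp.acquire_lock()           # wait out any import in progress
    try:
        pid = os.fork()
    finally:
        imp.release_lock()       # runs in the parent and in the child
    return pid

As Martin notes, os.forkpty() (and fork1() where it exists) would need the same treatment.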
On Thu, May 18, 2006 at 20:02, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Nick Coghlan wrote:
And if I understand it correctly, it falls under the category that waiting for another thread while holding the import lock is a *really* bad idea from a thread safety point of view.
The thing with the import-after-fork deadlock is that you can trigger it without even doing anything that's known not to be thread-safe.
Right. With some googling, I found that one solution is pthread_atexit: a pthread_atexit handler is a triple (before, in_parent, in_child). You set it to (acquire, release, release). When somebody forks, the pthread library will first acquire the import lock in the thread that wants to fork. Then the fork occurs, and the import lock gets then released both in the parent and in the child.
I would like to see this approach implemented, but I agree with you that a test case should be created first.
Picking up a rather old discussion... We encountered this bug at Google and I'm now "incentivized" to fix it.

For a short recap: Python has an import lock that prevents more than one thread from doing an import at any given time. However, unlike most of the locks we have lying around, we don't clear that lock in the child after an os.fork(). That means that doing an os.fork() during an import means the child process can't do any other imports. It also means that doing an os.fork() *while another thread is doing an import* means the child process can't do any other imports.

Since this three-year-old discussion we've added a couple of post-fork-cleanups to CPython (the TLS, the threading module's idea of active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do simply discard the memory for other locks held during fork (the GIL, see Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock, except when the platform is AIX. I don't see any particular reason why we aren't doing the same thing to the import lock that we do to the other locks, on all platforms. It's a quick fix for a real problem (see http://bugs.python.org/issue1590864 and http://bugs.python.org/issue1404925 for two bugreports that seem to be this very issue.)

It also seems to me, since we have two or three (depending on how you count) uses for it, that we should either add a way to free the old locks (to the various threading implementations) or add a way to use pthread_atfork portably -- or at least portably across systems with fork(). I don't think we should wait with fixing the big issue (threads and fork() being unnecessarily flaky) until we have a good fix for the small issue (a tiny bit of a memory leak after a fork() -- in the child process only.) Especially since the good fix for the small issue might require co-ordination between all the threading implementations we have and nobody knows about.

-- Thomas Wouters <thomas@python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
Thomas Wouters wrote:
Picking up a rather old discussion... We encountered this bug at Google and I'm now "incentivized" to fix it.
For a short recap: Python has an import lock that prevents more than one thread from doing an import at any given time. However, unlike most of the locks we have lying around, we don't clear that lock in the child after an os.fork(). That means that doing an os.fork() during an import means the child process can't do any other imports. It also means that doing an os.fork() *while another thread is doing an import* means the child process can't do any other imports.
Since this three-year-old discussion we've added a couple of post-fork-cleanups to CPython (the TLS, the threading module's idea of active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do simply discard the memory for other locks held during fork (the GIL, see Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock, except when the platform is AIX. I don't see any particular reason why we aren't doing the same thing to the import lock that we do to the other locks, on all platforms. It's a quick fix for a real problem (see http://bugs.python.org/issue1590864 and http://bugs.python.org/issue1404925 for two bugreports that seem to be this very issue.)
+1. Leaving deadlock issues around that people can run into without doing anything wrong in their application is unfriendly.
It also seems to me, since we have two or three (depending on how you count) uses for it, that we should either add a way to free the old locks (to the various threading implementations) or add a way to use pthread_atfork portably -- or at least portably across systems with fork(). I don't think we should wait with fixing the big issue (threads and fork() being unnecessarily flaky) until we have a good fix for the small issue (a tiny bit of a memory leak after a fork() -- in the child process only.) Especially since the good fix for the small issue might require co-ordination between all the threading implementations we have and nobody knows about.
One-off memory leaks for relatively rare operations like fork() don't bother me all that much, so I agree that having fork() leak another lock in the child process is preferable to it having a chance to mysteriously deadlock. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Thu, May 18, 2006 at 3:02 PM, "Martin v. Löwis"<martin@v.loewis.de> wrote:
Right. With some googling, I found that one solution is pthread_atexit: a pthread_atexit handler is a triple (before, in_parent, in_child). You set it to (acquire, release, release).
Did you mean pthread_atfork() ?
When somebody forks, the pthread library will first acquire the import lock in the thread that wants to fork. Then the fork occurs, and the import lock gets then released both in the parent and in the child.
I would like to see this approach implemented, but I agree with you that a test case should be created first.
Would some new C-API functions do the trick?

PyOS_AtForkPrepare()   # acquire import lock
pid = fork()
if (pid == 0)
    PyOS_AtForkChild()   # current PyOS_AfterFork() + release import lock
else
    PyOS_AtForkParent()  # release import lock

-- Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
On Thu, Jul 16, 2009 at 1:08 PM, Thomas Wouters <thomas@python.org> wrote:
Picking up a rather old discussion... We encountered this bug at Google and I'm now "incentivized" to fix it.
For a short recap: Python has an import lock that prevents more than one thread from doing an import at any given time. However, unlike most of the locks we have lying around, we don't clear that lock in the child after an os.fork(). That means that doing an os.fork() during an import means the child process can't do any other imports. It also means that doing an os.fork() *while another thread is doing an import* means the child process can't do any other imports.
Since this three-year-old discussion we've added a couple of post-fork-cleanups to CPython (the TLS, the threading module's idea of active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do simply discard the memory for other locks held during fork (the GIL, see Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock, except when the platform is AIX. I don't see any particular reason why we aren't doing the same thing to the import lock that we do to the other locks, on all platforms. It's a quick fix for a real problem (see http://bugs.python.org/issue1590864 and http://bugs.python.org/issue1404925 for two bugreports that seem to be this very issue.)
+1. We were also affected by this bug, getting sporadic deadlocks in a multi-threaded program that fork()s subprocesses to do processing. It took a while to figure out what was going on. -Mike
On Mon, Jul 20, 2009 at 11:26, Mike Klaas <mike.klaas@gmail.com> wrote:
On Thu, Jul 16, 2009 at 1:08 PM, Thomas Wouters <thomas@python.org> wrote:
Picking up a rather old discussion... We encountered this bug at Google and I'm now "incentivized" to fix it.
For a short recap: Python has an import lock that prevents more than one thread from doing an import at any given time. However, unlike most of the locks we have lying around, we don't clear that lock in the child after an os.fork(). That means that doing an os.fork() during an import means the child process can't do any other imports. It also means that doing an os.fork() *while another thread is doing an import* means the child process can't do any other imports.
Since this three-year-old discussion we've added a couple of post-fork-cleanups to CPython (the TLS, the threading module's idea of active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do simply discard the memory for other locks held during fork (the GIL, see Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock, except when the platform is AIX. I don't see any particular reason why we aren't doing the same thing to the import lock that we do to the other locks, on all platforms. It's a quick fix for a real problem (see http://bugs.python.org/issue1590864 and http://bugs.python.org/issue1404925 for two bugreports that seem to be this very issue.)
+1. We were also affected by this bug, getting sporadic deadlocks in a multi-threaded program that fork()s subprocesses to do processing. It took a while to figure out what was going on.
Actually, after careful consideration I have come to realize that simply resetting the lock is exactly the wrong thing to do. The import lock exists to prevent threads from getting a half-initialized module *while another thread is creating the module*. That is to say, if threads 'A' and 'B' both import 'mod', it is not only important that threads A and B don't *both* try to execute mod.py, but also that thread B doesn't get a half-initialized module object that thread A is busy populating.

If the import lock is held by the thread doing fork(), this is not a problem: the module is still being imported in the child process, and the fork doesn't affect it. If the import lock is held by another thread than the one doing fork(), any partially imported modules would remain in sys.modules in the child process, without ever getting finished.

So we need to actually acquire the import lock before forking. We can do that in os.fork() and os.forkpty(), but we can't fix third-party extension modules; we'll have to introduce a new set of API functions for getting and releasing the import lock. I would suggest we don't expose it as that, but instead call it a fork lock or such, so we can add extra lock acquire/release pairs as necessary.

-- Thomas Wouters <thomas@python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
So attached (and at http://codereview.appspot.com/96125/show ) is a preliminary fix, correcting the problem with os.fork(), os.forkpty() and os.fork1(). This doesn't expose a general API for C code to use, for two reasons: it's not easy, and I need this fix more than I need the API change :-) (I actually need this fix myself for Python 2.4, but it applies fairly cleanly.)

To fix the mutex-across-fork problem correctly we should really acquire three locks (the import lock, the GIL and the TLS lock, in that order.) The import lock is re-entrant, and the TLS lock is tightly confined to the thread-local-storage lookup functions, but the GIL is neither re-entrant nor inspectable. That means we can't use pthread_atfork (we can't tell whether we have the GIL already or not, right before the fork), nor can we provide a convenient API for extension modules to use, really. The best we can do is provide three functions, pthread_atfork-style: a 'prepare' function, an 'after-fork-in-child' function, and an 'after-fork-in-parent' function. The 'prepare' function would start by releasing the GIL, then acquire the import lock, the GIL and the TLS lock in that order. It would also record *somewhere* the thread_ident of the current thread. The 'in-parent' function would simply release the TLS lock and the import lock, and the 'in-child' would release those locks, call the threading._at_fork() function, and fix up the TLS data, using the recorded thread ident to do lookups. The 'in-child' function would replace the current PyOS_AfterFork() function (which blindly reinitializes the GIL and the TLS lock, and calls threading._at_fork().)

Alternatively we could do what we do now, which is to ignore the fact that thread idents may change by fork() and thus that thread-local data may disappear, in which case the 'in-child' function could do a little less work. I'm suitably scared and disgusted of the TLS implementation, the threading implementations we support (including the pthread one) and the way we blindly type-pun a pthread_t to a long, that I'm not sure I want to try and build something correct and reliable on top of it.

-- Thomas Wouters <thomas@python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
So attached (and at http://codereview.appspot.com/96125/show ) is a preliminary fix, correcting the problem with os.fork(), os.forkpty() and os.fork1(). This doesn't expose a general API for C code to use, for two reasons: it's not easy, and I need this fix more than I need the API change :-) (I actually need this fix myself for Python 2.4, but it applies fairly cleanly.)
This looks good to me.

Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.

Related to the above subprocess fork + gc bug.. It'd be nice for CPython to have code that does the fork+misc twiddling+exec all in one C call without needing to execute Python code in the child process prior to the exec(). Most of the inner body of subprocess's _execute_child() method could be done that way. (with the obvious exception of the preexec_fn)
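For readers unfamiliar with the gc detail Gregory mentions, the sketch below shows the general disable-gc-around-fork precaution (see http://bugs.python.org/issue1336). It is a simplified illustration, not the actual subprocess code, and it omits error handling.

# Simplified sketch of the "disable gc around fork()" precaution; not the
# real subprocess implementation.
import gc
import os

def fork_quietly():
    gc_was_enabled = gc.isenabled()
    gc.disable()                 # keep the collector out of the fork window
    pid = os.fork()
    if pid == 0:
        # Child: about to exec(), so the gc state here is short-lived anyway.
        return pid
    if gc_was_enabled:
        gc.enable()              # parent restores its previous gc state
    return pid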
To fix the mutex-across-fork problem correctly we should really acquire three locks (the import lock, the GIL and the TLS lock, in that order.) The import lock is re-entrant, and the TLS lock is tightly confined to the thread-local-storage lookup functions, but the GIL is neither re-entrant nor inspectable. That means we can't use pthread_atfork (we can't tell whether we have the GIL already or not, right before the fork), nor can we provide a convenient API for extension modules to use, really. The best we can do is provide three functions, pthread_atfork-style: a 'prepare' function, an 'after-fork-in-child' function, and an 'after-fork-in-parent' function. The 'prepare' function would start by releasing the GIL, then acquire the import lock, the GIL and the TLS lock in that order. It would also record *somewhere* the thread_ident of the current thread. The 'in-parent' function would simply release the TLS lock and the import lock, and the 'in-child' would release those locks, call the threading._at_fork() function, and fix up the TLS data, using the recorded thread ident to do lookups. The 'in-child' function would replace the current PyOS_AfterFork() function (which blindly reinitializes the GIL and the TLS lock, and calls threading._at_fork().)
Alternatively we could do what we do now, which is to ignore the fact that thread idents may change by fork() and thus that thread-local data may disappear, in which case the 'in-child' function could do a little less work. I'm suitably scared and disgusted of the TLS implementation, the threading implementations we support (including the pthread one) and the way we blindly type-pun a pthread_t to a long, that I'm not sure I want to try and build something correct and reliable on top of it.
-- Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Sat, Jul 25, 2009 at 19:25, Gregory P. Smith <greg@krypto.org> wrote:
On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
So attached (and at http://codereview.appspot.com/96125/show ) is a preliminary fix, correcting the problem with os.fork(), os.forkpty() and os.fork1(). This doesn't expose a general API for C code to use, for two reasons: it's not easy, and I need this fix more than I need the API change :-) (I actually need this fix myself for Python 2.4, but it applies fairly cleanly.)
This looks good to me.
Anyone else want to take a look at this before I check it in? I updated the patch (in Rietveld) to contain some documentation about the hazards of mixing fork and threads, which is the best we can do at the moment, at least without seriously overhauling the threading APIs (which, granted, is not that bad an idea, considering the mess they're in.)

I've now thoroughly tested the patch, and for most platforms it's strictly better. On AIX it *may* behave differently (possibly 'incorrectly' for specific cases) if something other than os.fork() calls the C fork() and calls PyOS_AfterFork(), since on AIX it used to nuke the thread lock. *I* think the new behaviour (not nuking the lock) is the correct thing to do, but since most places that release the import lock don't bother to check if the lock was even held, the old behaviour may have been successfully masking the bug on AIX systems.

Perhaps for the backport to 2.6 (which I think is in order, and also in accordance with the guidelines) I should leave the AIX workaround in? Anyone think it should not be removed from 3.x/2.7?
Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.
Unfortunately it's rather hard to make those functions work correctly with the current API -- we can't provide functions you can just use as arguments to pthread_atfork because the global interpreter lock is not re-entrant and we have no way of testing whether the current thread holds the GIL. I also get the creepy-crawlies when I look at the various thread_* implementations and see the horribly unsafe things they do (and also, for instance, the PendingCall stuff in ceval.c :S) Unfortunately there's no good way to fix these things without breaking API compatibility, let alone ABI compatibility.
Related to the above subprocess fork + gc bug.. It'd be nice for CPython to have code that does the fork+misc twiddling+exec all in one C call without needing to execute Python code in the child process prior to the exec(). Most of the inner body of subprocess's _execute_child() method could be done that way. (with the obvious exception of the preexec_fn)
To fix the mutex-across-fork problem correctly we should really acquire three locks (the import lock, the GIL and the TLS lock, in that order.) The import lock is re-entrant, and the TLS lock is tightly confined to the thread-local-storage lookup functions, but the GIL is neither re-entrant nor inspectable. That means we can't use pthread_atfork (we can't tell whether we have the GIL already or not, right before the fork), nor can we provide a convenient API for extension modules to use, really. The best we can do is provide three functions, pthread_atfork-style: a 'prepare' function, an 'after-fork-in-child' function, and an 'after-fork-in-parent' function. The 'prepare' function would start by releasing the GIL, then acquire the import lock, the GIL and the TLS lock in that order. It would also record *somewhere* the thread_ident of the current thread. The 'in-parent' function would simply release the TLS lock and the import lock, and the 'in-child' would release those locks, call the threading._at_fork() function, and fix up the TLS data, using the recorded thread ident to do lookups. The 'in-child' function would replace the current PyOS_AfterFork() function (which blindly reinitializes the GIL and the TLS lock, and calls threading._at_fork().)

Alternatively we could do what we do now, which is to ignore the fact that thread idents may change by fork() and thus that thread-local data may disappear, in which case the 'in-child' function could do a little less work. I'm suitably scared and disgusted of the TLS implementation, the threading implementations we support (including the pthread one) and the way we blindly type-pun a pthread_t to a long, that I'm not sure I want to try and build something correct and reliable on top of it.
-- Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
-- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Wed, Sep 9, 2009 at 9:07 AM, Thomas Wouters<thomas@python.org> wrote:
On Sat, Jul 25, 2009 at 19:25, Gregory P. Smith <greg@krypto.org> wrote:
On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
So attached (and at http://codereview.appspot.com/96125/show ) is a preliminary fix, correcting the problem with os.fork(), os.forkpty() and os.fork1(). This doesn't expose a general API for C code to use, for two reasons: it's not easy, and I need this fix more than I need the API change :-) (I actually need this fix myself for Python 2.4, but it applies fairly cleanly.)
This looks good to me.
Anyone else want to take a look at this before I check it in? I updated the patch (in Rietveld) to contain some documentation about the hazards of mixing fork and threads, which is the best we can do at the moment, at least without seriously overhauling the threading APIs (which, granted, is not that bad an idea, considering the mess they're in.) I've now thoroughly tested the patch, and for most platforms it's strictly better. On AIX it *may* behave differently (possibly 'incorrectly' for specific cases) if something other than os.fork() calls the C fork() and calls PyOS_AfterFork(), since on AIX it used to nuke the thread lock. *I* think the new behaviour (not nuking the lock) is the correct thing to do, but since most places that release the import lock don't bother to check if the lock was even held, the old behaviour may have been successfully masking the bug on AIX systems. Perhaps for the backport to 2.6 (which I think is in order, and also in accordance with the guidelines) I should leave the AIX workaround in? Anyone think it should not be removed from 3.x/2.7?
Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.
Unfortunately it's rather hard to make those functions work correctly with the current API -- we can't provide functions you can just use as arguments to pthread_atfork because the global interpreter lock is not re-entrant and we have no way of testing whether the current thread holds the GIL. I also get the creepy-crawlies when I look at the various thread_* implementations and see the horribly unsafe things they do (and also, for instance, the PendingCall stuff in ceval.c :S) Unfortunately there's no good way to fix these things without breaking API compatibility, let alone ABI compatibility.
Take a look at http://code.google.com/p/python-atfork/ which I created to address the general fork+threading with locks held causing deadlocks issue with many standard library modules. Currently it only patches the logging module but I intend to support others. I want to get an atfork mechanism into 2.7/3.2 along with every lock in the standard library making proper use of it.

See also http://bugs.python.org/issue6721

I make no attempt to deal with C-level locks, only those acquired from python. It doesn't use pthread_atfork but instead models its behavior after that and wraps os.fork and os.forkpty so that they call the registered atfork methods as appropriate.
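To illustrate the mechanism Gregory describes (without claiming to match python-atfork's actual API or implementation), a registry of pthread_atfork-style handlers wrapped around os.fork might look roughly like this; the function names are invented:

# Hypothetical atfork registry: prepare/parent/child handlers run around a
# wrapped os.fork.  Illustration only; not python-atfork's real interface.
import os

_prepare, _parent, _child = [], [], []

def atfork(prepare=None, parent=None, child=None):
    if prepare:
        _prepare.append(prepare)
    if parent:
        _parent.append(parent)
    if child:
        _child.append(child)

_real_fork = os.fork

def _wrapped_fork():
    for func in _prepare:        # e.g. acquire module-level locks
        func()
    pid = _real_fork()
    handlers = _child if pid == 0 else _parent
    for func in handlers:        # e.g. release or reset those locks
        func()
    return pid

os.fork = _wrapped_fork          # the same wrapping would apply to os.forkpty

A module like logging, which Gregory mentions patching, would then register something along the lines of atfork(logging._acquireLock, logging._releaseLock, logging._releaseLock) so its module lock is never left held in a forked child -- again, just a sketch of the hook's shape.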
Related to the above subprocess fork + gc bug.. It'd be nice for CPython to have code that does the fork+misc twiddling+exec all in one C call without needing to execute Python code in the child process prior to the exec(). Most of the inner body of subprocess's _execute_child() method could be done that way. (with the obvious exception of the preexec_fn)
To fix the mutex-across-fork problem correctly we should really acquire three locks (the import lock, the GIL and the TLS lock, in that order.) The import lock is re-entrant, and the TLS lock is tightly confined to the thread-local-storage lookup functions, but the GIL is neither re-entrant nor inspectable. That means we can't use pthread_atfork (we can't tell whether we have the GIL already or not, right before the fork), nor can we provide a convenient API for extension modules to use, really. The best we can do is provide three functions, pthread_atfork-style: a 'prepare' function, an 'after-fork-in-child' function, and an 'after-fork-in-parent' function. The 'prepare' function would start by releasing the GIL, then acquire the import lock, the GIL and the TLS lock in that order. It would also record *somewhere* the thread_ident of the current thread. The 'in-parent' function would simply release the TLS lock and the import lock, and the 'in-child' would release those locks, call the threading._at_fork() function, and fix up the TLS data, using the recorded thread ident to do lookups. The 'in-child' function would replace the current PyOS_AfterFork() function (which blindly reinitializes the GIL and the TLS lock, and calls threading._at_fork().)
Alternatively we could do what we do now, which is to ignore the fact that thread idents may change by fork() and thus that thread-local data may disappear, in which case the 'in-child' function could do a little less work. I'm suitably scared and disgusted of the TLS implementation, the threading implementations we support (including the pthread one) and the way we blindly type-pun a pthread_t to a long, that I'm not sure I want to try and build something correct and reliable on top of it.
-- Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
-- Thomas Wouters <thomas@python.org>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Wed, Sep 9, 2009 at 20:19, Gregory P. Smith <greg@krypto.org> wrote:
On Wed, Sep 9, 2009 at 9:07 AM, Thomas Wouters<thomas@python.org> wrote:
On Sat, Jul 25, 2009 at 19:25, Gregory P. Smith <greg@krypto.org> wrote:
On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
So attached (and at http://codereview.appspot.com/96125/show ) is a preliminary fix, correcting the problem with os.fork(), os.forkpty() and os.fork1(). This doesn't expose a general API for C code to use, for two reasons: it's not easy, and I need this fix more than I need the API change :-) (I actually need this fix myself for Python 2.4, but it applies fairly cleanly.)
This looks good to me.
Anyone else want to take a look at this before I check it in? I updated the patch (in Rietveld) to contain some documentation about the hazards of mixing fork and threads, which is the best we can do at the moment, at least without seriously overhauling the threading APIs (which, granted, is not that bad an idea, considering the mess they're in.) I've now thoroughly tested the patch, and for most platforms it's strictly better. On AIX it *may* behave differently (possibly 'incorrectly' for specific cases) if something other than os.fork() calls the C fork() and calls PyOS_AfterFork(), since on AIX it used to nuke the thread lock. *I* think the new behaviour (not nuking the lock) is the correct thing to do, but since most places that release the import lock don't bother to check if the lock was even held, the old behaviour may have been successfully masking the bug on AIX systems. Perhaps for the backport to 2.6 (which I think is in order, and also in accordance with the guidelines) I should leave the AIX workaround in? Anyone think it should not be removed from 3.x/2.7?
Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.
Unfortunately it's rather hard to make those functions work correctly with the current API -- we can't provide functions you can just use as arguments to pthread_atfork because the global interpreter lock is not re-entrant and we have no way of testing whether the current thread holds the GIL. I also get the creepy-crawlies when I look at the various thread_* implementations and see the horribly unsafe things they do (and also, for instance, the PendingCall stuff in ceval.c :S) Unfortunately there's no good way to fix these things without breaking API compatibility, let alone ABI compatibility.
Take a look at http://code.google.com/p/python-atfork/ which I created to address the general fork+threading with locks held causing deadlocks issue with many standard library modules. Currently it only patches the logging module but I intend to support others. I want to get an atfork mechanism into 2.7/3.2 along with every lock in the standard library making proper use of it.
See also http://bugs.python.org/issue6721
I make no attempt to deal with C-level locks, only those acquired from python. It doesn't use pthread_atfork but instead models its behavior after that and wraps os.fork and os.forkpty so that they call the registered atfork methods as appropriate.
Well, sure, the *Python code* side of the problem is fixable, thanks to Python being so awesome ;-P I'm strictly thinking of applications embedding Python (or even extending it and calling into code that doesn't consider Python.) Your python-atfork project looks like a terribly good idea, but it won't fix the embedding/extending issues (nor is it intended to, right?) -- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Wed, Sep 9, 2009 at 11:36 AM, Thomas Wouters<thomas@python.org> wrote:
On Wed, Sep 9, 2009 at 20:19, Gregory P. Smith <greg@krypto.org> wrote:
Take a look at http://code.google.com/p/python-atfork/ which I created to address the general fork+threading with locks held causing deadlocks issue with many standard library modules. Currently it only patches the logging module but I intend to support others. I want to get an atfork mechanism into 2.7/3.2 along with every lock in the standard library making proper use of it.
See also http://bugs.python.org/issue6721
I make no attempt to deal with C-level locks, only those acquired from python. It doesn't use pthread_atfork but instead models its behavior after that and wraps os.fork and os.forkpty so that they call the registered atfork methods as appropriate.
Well, sure, the *Python code* side of the problem is fixable, thanks to Python being so awesome ;-P I'm strictly thinking of applications embedding Python (or even extending it and calling into code that doesn't consider Python.) Your python-atfork project looks like a terribly good idea, but it won't fix the embedding/extending issues (nor is it intended to, right?)
Yeah, I decided to only fix what could obviously be fixed and was causing me pain at the moment. ;)
Thomas Wouters wrote:
Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.
Unfortunately it's rather hard to make those functions work correctly with the current API -- we can't provide functions you can just use as arguments to pthread_atfork because the global interpreter lock is not re-entrant and we have no way of testing whether the current thread holds the GIL.
I thought "make sure I have the GIL, either by already having it or waiting for it if I don't yet have it" was the entire point of the PyGILState_Ensure() API? [1] Cheers, Nick. [1] http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpret... -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On 25Jul2009 10:25, Gregory P. Smith <greg@krypto.org> wrote:
| On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
| > So attached (and at http://codereview.appspot.com/96125/show ) is a
| > preliminary fix, correcting the problem with os.fork(), os.forkpty() and
| > os.fork1(). This doesn't expose a general API for C code to use, for two
| > reasons: it's not easy, and I need this fix more than I need the API change
| > :-) (I actually need this fix myself for Python 2.4, but it applies fairly
| > cleanly.)
|
| This looks good to me. [...]

Where's this stand with respect to the upcoming Python 2.6.3?

--
Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

Life is pleasant. Death is peaceful. It's the transition that's troublesome. - Isaac Asimov
On Fri, Sep 11, 2009 at 08:06, Cameron Simpson <cs@zip.com.au> wrote:
On 25Jul2009 10:25, Gregory P. Smith <greg@krypto.org> wrote:
| On Thu, Jul 23, 2009 at 4:28 PM, Thomas Wouters<thomas@python.org> wrote:
| > So attached (and at http://codereview.appspot.com/96125/show ) is a
| > preliminary fix, correcting the problem with os.fork(), os.forkpty() and
| > os.fork1(). This doesn't expose a general API for C code to use, for two
| > reasons: it's not easy, and I need this fix more than I need the API change
| > :-) (I actually need this fix myself for Python 2.4, but it applies fairly
| > cleanly.)
|
| This looks good to me. [...]
Where's this stand with respect to the upcoming Python 2.6.3?
Unless anyone speaks up, I'll submit the fix (without the change in semantics on AIX) to release26-maint early next week, so it would be in 2.6.3. -- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Wed, Sep 9, 2009 at 23:56, Nick Coghlan <ncoghlan@gmail.com> wrote:
Thomas Wouters wrote:
Your idea of making this an API called a 'fork lock' or something sounds good (though I think it needs a better name. PyBeforeFork & PyAfterFork?). The subprocess module, for example, disables garbage collection before forking and restores it afterwards to avoid http://bugs.python.org/issue1336. That type of thing could also be done in such a function.
Unfortunately it's rather hard to make those functions work correctly with the current API -- we can't provide functions you can just use as arguments to pthread_atfork because the global interpreter lock is not re-entrant and we have no way of testing whether the current thread holds the GIL.
I thought "make sure I have the GIL, either by already having it or waiting for it if I don't yet have it" was the entire point of the PyGILState_Ensure() API? [1]
Hm, yeah. For some reason I was certain it was inappropriate, back when I was trying to create a pthread_atfork-friendly set of functions. At the time I was also hip-deep in the awfulness of Python/thread*.c and its unsafe punning and unwarranted assumptions, so I may have overreacted. I added this as a feature-request issue ( http://bugs.python.org/issue6923 ) and will look at it some more. In the mean time, I fixed the biggest source of issues by checking in the change to make at least calls to fork() made by Python be safe, also backported to 2.6. -- Thomas Wouters <thomas@python.org> Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
participants (12)
- "Martin v. Löwis"
- Cameron Simpson
- Gregory P. Smith
- Guido van Rossum
- Lisandro Dalcin
- Mike Klaas
- Neal Norwitz
- Nick Coghlan
- Ronald Oussoren
- Rotem Yaari
- Thomas Wouters
- Yair Chuchem