Threading hooks and disable gc per thread
Hello,

today I've spent several hours debugging a segfault in JCC [1]. JCC is a framework to wrap Java code for Python. It's most prominently used in PyLucene [2]. You can read more about my debugging in [3].

With JCC, every Python thread must be registered with the JVM through JCC. An unattached thread that accesses a wrapped Java object leads to errors and may even cause a segfault. Accessing also includes garbage collection. A code line like

    a = {}

or

    "a b c".split()

can segfault since the allocation of a dict or a bound method runs through _PyObject_GC_New(), which may trigger a cyclic garbage collection run. If the current thread isn't attached to the JVM but triggers a gc.collect() with some Java objects in a cycle, the interpreter crashes. It's quite complicated and hard to "fix" third-party tools so that they attach all threads they create.

The issue could be solved with a simple on_thread_start hook in the threading module. However, there is more to it. In order to free memory, threads must also be detached from the JVM when they end. A second on_thread_stop hook isn't enough, since bound methods may also lead to a gc.collect() run after the thread is detached.

I propose three changes to Python in order to fix the issue:

on thread start hook
--------------------

Similar to the atexit module, third-party modules can register a callable with *args and **kwargs. The functions are called inside the newly created thread just before the target is called. The best place for the hook list is threading.Thread._bootstrap_inner(), right before the try: self.run() except: block. Exceptions are ignored during the call but reported to the user at the end (same as atexit's atexit_callfunc()).

on thread end hook
------------------

Same as the on-thread-start hook, but the callables are called inside the dying thread after self.run().

gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
--------------------------------------------------------------

Right now almost any code can trigger a gc.collect() run non-deterministically. Some applications like JCC want to control whether gc.collect() may run on a per-thread level. This could be solved with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled by default. When the flag is false, _PyObject_GC_Malloc() doesn't start a gc.collect() run for that thread. The collection is delayed until another thread or the main thread triggers it.

The three functions should also have a C equivalent so that C code can prevent gc in a thread.

Thoughts?

Christian

[1] http://lucene.apache.org/pylucene/jcc/index.html
[2] http://lucene.apache.org/pylucene/
[3] http://mail-archives.apache.org/mod_mbox/lucene-pylucene-dev/201105.mbox/bro...
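To see the effect without JCC, here is a small self-contained sketch. It relies on gc.callbacks, which was added to CPython after this thread (in 3.3); the callback reports which thread a cyclic collection runs in, and plain dict allocations in a worker thread are enough to trigger one:

    import gc
    import threading

    def report(phase, info):
        # gc.callbacks handlers run synchronously in whichever thread
        # triggered the collection.
        if phase == "start":
            print("cyclic GC pass starting in", threading.current_thread().name)

    gc.callbacks.append(report)
    gc.set_threshold(100)            # make gen0 collections frequent for the demo

    def worker():
        junk = []
        for _ in range(1000):
            junk.append({})          # each dict allocation goes through the GC allocator

    t = threading.Thread(target=worker, name="unattached-worker")
    t.start()
    t.join()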
On Wed, May 11, 2011 at 4:58 PM, Christian Heimes <lists@cheimes.de> wrote:
Hello,
today I've spent several hours debugging a segfault in JCC [1]. JCC is a framework to wrap Java code for Python. It's most prominently used in PyLucene [2]. You can read more about my debugging in [3].
With JCC, every Python thread must be registered with the JVM through JCC. An unattached thread that accesses a wrapped Java object leads to errors and may even cause a segfault. Accessing also includes garbage collection. A code line like
a = {}
or "a b c".split()
can segfault since the allocation of a dict or a bound method runs through _PyObject_GC_New(), which may trigger a cyclic garbage collection run. If the current thread isn't attached to the JVM but triggers a gc.collect() with some Java objects in a cycle, the interpreter crashes. It's quite complicated and hard to "fix" third-party tools so that they attach all threads they create.
The issue could be solved with a simple on_thread_start hook in the threading module. However, there is more to it. In order to free memory, threads must also be detached from the JVM when they end. A second on_thread_stop hook isn't enough, since bound methods may also lead to a gc.collect() run after the thread is detached.
I propose three changes to Python in order to fix the issue:
on thread start hook
--------------------
Similar to the atexit module, third-party modules can register a callable with *args and **kwargs. The functions are called inside the newly created thread just before the target is called. The best place for the hook list is threading.Thread._bootstrap_inner(), right before the try: self.run() except: block. Exceptions are ignored during the call but reported to the user at the end (same as atexit's atexit_callfunc()).
on thread end hook
------------------
Same as on thread start hook but the callables are called inside the dying thread after self.run().
Makes sense to me. Something that needs clarifying: when the process dies (the main Python thread has exited and all remaining Python threads are daemon threads), the on-thread-end hook will _not_ be called. +1

This is really two separate feature requests: the thread hooks above and the gc hooks below.
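Roughly, the semantics of the two proposed hooks can be emulated today with a Thread subclass (which, as noted later in the thread, is not sufficient for threads created by third-party code). A minimal sketch; the module-level hook lists are illustrative only and not part of any existing API:

    import threading

    _start_hooks = []   # list of (func, args, kwargs); illustrative registry
    _end_hooks = []

    class HookedThread(threading.Thread):
        def run(self):
            for func, args, kwargs in _start_hooks:
                try:
                    func(*args, **kwargs)      # e.g. attach this thread to the JVM
                except Exception:
                    pass                       # proposal: ignore, report at the end
            try:
                super().run()
            finally:
                for func, args, kwargs in _end_hooks:
                    try:
                        func(*args, **kwargs)  # e.g. detach this thread from the JVM
                    except Exception:
                        pass

    _start_hooks.append((print, ("start hook",), {}))
    _end_hooks.append((print, ("end hook",), {}))
    HookedThread(target=print, args=("running target",)).start()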
gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
--------------------------------------------------------------
Right now almost any code can trigger a gc.collect() run non-deterministically. Some applications like JCC want to control whether gc.collect() may run on a per-thread level. This could be solved with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled by default. When the flag is false, _PyObject_GC_Malloc() doesn't start a gc.collect() run for that thread. The collection is delayed until another thread or the main thread triggers it.
The three functions should also have a C equivalent so C code can prevent gc in a thread.
This also sounds useful since we are a long, long way from concurrent gc. (And whenever we gain that, we'd need a way to control when it can or can't happen, or to register the gc threads with anything that needs to know about 'em, JCC, etc.)

+1

-gps
On 14.05.2011 21:21, Gregory P. Smith wrote:
Makes sense to me.
Something that needs clarifying: when the process dies (main python thread has exited and all remaining python threads are daemon threads) the on thread end hook will _not_ be called.
Good catch! This gotcha should be mentioned in the docs. A daemon thread can end at any point in its life cycle. It's not an issue for my use case. For JCC the hook just frees some resources that are freed anyway when the process ends. Other use cases may need a more deterministic cleanup, but that's out of scope for my proposal. Users can get around the issue with an atexit hook, though.
This also sounds useful since we are a long long way from concurrent gc. (and whenever we gain that, we'd need a way to control when it can or can't happen or to register the gc threads with the anything that needs to know about 'em, JCC, etc..)
I thought of a concurrent GC, too. A dedicated GC thread could improve the response time of a GUI or web application if we could separate the cyclic garbage detection into two steps. Even on a fast machine, a full GC sweep with millions of objects in gen2 can take a long time, up to a second, during which the interpreter is locked. I assume that scanning a million objects takes most of the time. If it were possible to do the scan without the GIL held and then remove the objects in a second step with the GIL acquired, responsiveness could improve. However, that would require a major redesign of the traverse and visit slots.

Back to my proposal. My initial proposal was missing one feature: it should be possible to alter the default setting for PyThreadState->gc_enabled, too. JCC could use the additional API to make sure non-attached threads don't run the GC.

Example of how JCC could use the feature:

lucene.initVM() initializes the Java VM and attaches the current thread. This is usually done in the main thread before any other thread is started. The function would call PyThread_set_gc_enabled(0) to set the default value for new thread states and to prevent any new thread from starting a cyclic GC collection.

lucene.getVM().attachCurrentThread() creates some thread-local objects in a TLS and registers the current thread with the Java VM. This would run PyObject_GC_set_thread_enabled(1) to allow GC collection in the current thread.

lucene.getVMEnv().detachCurrentThread() cleans up the TLS and unregisters the thread, so a PyObject_GC_set_thread_enabled(0) call is required.

The implementation is rather simple:

- a new static int variable for the default setting and a new flag in the PyThreadState struct
- a check of PyThreadState_Get()->gc_enabled in _PyObject_GC_Malloc()
- four small functions to set and get the default and per-thread settings
- three Python functions in the gc module to enable, disable and get the flag from the current PyThreadState
- a function to get the global flag; I'm not sure if we should expose the global switch to Python code

The attached patch already has all the C functionality. If I hear more +1s, then I'll write two small PEPs for both feature requests.

Christian
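As a rough illustration of the pause mentioned above, a quick and unscientific sketch that times a full collection over a million tracked objects (the numbers vary a lot by machine and Python version):

    import gc
    import time

    data = [[i] for i in range(1_000_000)]   # a million GC-tracked lists

    gc.collect()                   # push everything into the oldest generation
    start = time.perf_counter()
    gc.collect()                   # full pass: every tracked object is scanned
    print("full collection took %.3f seconds over %d objects"
          % (time.perf_counter() - start, len(data)))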
On Thu, May 12, 2011 at 9:58 AM, Christian Heimes <lists@cheimes.de> wrote:
on thread start hook
--------------------
Similar to the atexit module, third party modules can register a callable with *args and **kwargs. The functions are called inside the newly created thread just before the target is called. The best place for the hook list is threading.Thread._bootstrap_inner() right before the try: self.run() except: block. Exceptions are ignored during the call but reported to the user at the end (same as atexit's atexit_callfunc())
on thread end hook
------------------
Same as on thread start hook but the callables are called inside the dying thread after self.run().
So the plan is to have threading.Thread support the hooks, while _thread.start_new_thread and creation of thread states at the C level (including via PyGILState_Ensure) will bypass them?

That actually sounds reasonable to me (+0), but the PEP should at least discuss the rationale for the choice of level for the new feature. I also suggest storing the associated hook lists at the threading.Thread class object level rather than at the threading module level (supporting such modularity of state being a major advantage of only providing this feature at the higher level).

The PEP should also go into detail as to why having these hooks in a custom Thread subclass isn't sufficient (e.g. needing to support threads created by third party libraries, but note that such a rationale has a problem due to the _thread.start_new_thread loophole).

Composability through inheritance should also be discussed - the hook invocation should probably walk the MRO so it is easy to create Thread subclasses that include class specific hooks without inadvertently skipping the hooks installed on threading.Thread.

The possibility of passing exception information to thread_end hooks (a la __exit__ methods) should be considered, along with the general relationship between the threading hooks and the context management protocol.
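One possible shape for that last point, sketched as a Thread subclass (the end_hooks attribute is illustrative, not an existing API): end hooks receive the same (exc_type, exc_value, traceback) triple a context manager's __exit__ would see.

    import sys
    import threading

    class Thread(threading.Thread):
        end_hooks = []       # illustrative: callables taking (exc_type, exc_value, tb)

        def run(self):
            exc_info = (None, None, None)
            try:
                super().run()
            except BaseException:
                exc_info = sys.exc_info()
                raise                    # let threading's normal reporting happen
            finally:
                for hook in self.end_hooks:
                    hook(*exc_info)

    Thread.end_hooks.append(lambda et, ev, tb: print("thread finished, exception:", ev))
    Thread(target=lambda: 1 / 0).start()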
gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
--------------------------------------------------------------
Right now almost any code can trigger a gc.collect() run non-deterministically. Some applications like JCC want to control whether gc.collect() may run on a per-thread level. This could be solved with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled by default. When the flag is false, _PyObject_GC_Malloc() doesn't start a gc.collect() run for that thread. The collection is delayed until another thread or the main thread triggers it.
The three functions should also have a C equivalent so C code can prevent gc in a thread.
The default setting for this should go in the interpreter state object rather than in a static variable (subinterpreters can then inherit the state of their parent interpreter when they are first created).

Otherwise sounds reasonable. (+0)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 15.05.2011 13:13, Nick Coghlan wrote:

(Sorry for the delay, I was swamped with work again.)
So the plan is to have threading.Thread support the hooks, while _thread.start_new_thread and creation of thread states at the C level (including via PyGILState_Ensure) will bypass them?
That actually sounds reasonable to me (+0), but the PEP should at least discuss the rationale for the choice of level for the new feature. I also suggest storing the associated hook lists at the threading.Thread class object level rather than at the threading module level (supporting such modularity of state being a major advantage of only providing this feature at the higher level).
I've considered both places, too. _thread.start_new_thread() as well as PyGILState_Ensure() would require a considerable amount of C coding for a feature that won't affect performance in a noticeable way. That is my argument against a C implementation in _thread.start_new_thread(): it's far too much work for a feature that can be implemented in Python easily. An implementation in the pure-Python threading module will work on PyPy, IronPython and Jython instantly. I consider any library that bypasses the threading module broken, too.

PyGILState_Ensure() or PyThreadState_New() are a different beast. I concur, it would be the best place for the hooks if I could think of a way to implement the on-thread-stop hook. I don't see a way to execute some code at the end of a thread without cooperation from the calling code.
The PEP should also go into detail as to why having these hooks in a custom Thread subclass isn't sufficient (e.g. needing to support threads created by third party libraries, but note that such a rationale has a problem due to the _thread.start_new_thread loophole).
Understood.
Composability through inheritance should also be discussed - the hook invocation should probably walk the MRO so it is easy to create Thread subclasses that include class specific hooks without inadvertently skipping the hooks installed on threading.Thread.
Good idea! Do you think it's sufficient to have hook methods like

    class Thread:
        _start_hooks = []

        def on_thread_starting(self):
            for hook, args, kwargs in self._start_hooks:
                hook(*args, **kwargs)

? Subclasses of threading.Thread can easily overwrite the hook method and call their parent's on_thread_starting().
The possibility of passing exception information to thread_end hooks (ala __exit__ methods) should be considered, along with the general relationship between the threading hooks and the context management protocol.
That's an interesting idea! I'll consider it.
gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
--------------------------------------------------------------
The default setting for this should go in the interpreter state object rather than in a static variable (subinterpreters can then inherit the state of their parent interpreter when they are first created).
Otherwise sounds reasonable. (+0)
A subinterpreter flag isn't enough. All subinterpreters share a common GC list. A gc.collect() run inside a subinterpreter affects the entire interpreter and not just the one subinterpreter. I have to think about the issue of subinterpreters ...

If I understand the code correctly, gc.get_objects() punches a hole in the subinterpreter isolation. It returns all tracked objects of the current process -- from all subinterpreters. Is this a design issue? The fact isn't mentioned in http://docs.python.org/c-api/init.html#bugs-and-caveats.

Christian
On Mon, May 23, 2011 at 11:40 PM, Christian Heimes <lists@cheimes.de> wrote:
On 15.05.2011 13:13, Nick Coghlan wrote:
Composability through inheritance should also be discussed - the hook invocation should probably walk the MRO so it is easy to create Thread subclasses that include class specific hooks without inadvertently skipping the hooks installed on threading.Thread.
Good idea!
Do you think it's sufficient to have hook methods like

    class Thread:
        _start_hooks = []

        def on_thread_starting(self):
            for hook, args, kwargs in self._start_hooks:
                hook(*args, **kwargs)

? Subclasses of threading.Thread can easily overwrite the hook method and call their parent's on_thread_starting().
I was actually thinking of making life even easier for subclasses:

    class Thread:
        start_hooks = []

        @classmethod
        def _on_thread_starting(cls):
            # Hooks in parent classes are called before hooks in child classes
            hook_sources = reversed(cls.__mro__)
            for hook_source in hook_sources:
                # Arguable design decision here: only look at Thread
                # subclasses, not any mixins
                if not issubclass(hook_source, Thread):
                    continue
                hooks = hook_source.__dict__.get("start_hooks", ())
                for hook, args, kwargs in hooks:
                    hook(*args, **kwargs)

With the parent method explicitly walking the whole MRO in reverse, any subclass hooks will naturally be invoked after any parent hooks without any particular effort on the part of the subclass implementor - they just need to provide and populate a "start_hooks" attribute. The alternative would mean that overriding "_start_hooks" in a subclass would block ready access to the main hooks in Thread.
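With that scheme, a subclass would then only need to provide its own list; a small usage sketch (attach_jvm is a stand-in, and Thread refers to the class defined above):

    def attach_jvm():
        print("attaching this thread to the JVM")

    class JVMThread(Thread):
        # Subclass-specific hooks; hooks defined on Thread itself (or any
        # other ancestor that is a Thread subclass) run first, because the
        # MRO is walked in reverse.
        start_hooks = [(attach_jvm, (), {})]

    JVMThread._on_thread_starting()   # runs Thread's hooks, then JVMThread's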
gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
--------------------------------------------------------------
The default setting for this should go in the interpreter state object rather than in a static variable (subinterpreters can then inherit the state of their parent interpreter when they are first created).
Otherwise sounds reasonable. (+0)
A subinterpreter flag isn't enough. All subinterpreters share a common GC list. A gc.collect() run inside a subinterpreter affects the entire interpreter and not just the one subinterpreter. I have to think about the issue of subinterpreters ...
If I understand the code correctly, gc.get_objects() punches a hole in the subinterpreter isolation. It returns all tracked objects of the current process -- from all subinterpreters. Is this a design issue? The fact isn't mentioned in http://docs.python.org/c-api/init.html#bugs-and-caveats.
It's quite possible - there's a reason that heavy use of subinterpreters has a "this may fail in unexpected ways" rider attached. Still, this is the kind of thing a PEP will hopefully do a reasonable job of flushing out and resolving.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 12.05.2011 01:58, Christian Heimes wrote:
Hello,
today I've spent several hours debugging a segfault in JCC [1]. JCC is a framework to wrap Java code for Python. It's most prominently used in PyLucene [2]. You can read more about my debugging in [3].
With JCC, every Python thread must be registered with the JVM through JCC. An unattached thread that accesses a wrapped Java object leads to errors and may even cause a segfault. Accessing also includes garbage collection. A code line like
a = {}
or "a b c".split()
can segfault since the allocation of a dict or a bound method runs through _PyObject_GC_New(), which may trigger a cyclic garbage collection run. If the current thread isn't attached to the JVM but triggers a gc.collect() with some Java objects in a cycle, the interpreter crashes. It's quite complicated and hard to "fix" third-party tools so that they attach all threads they create.
I have a somewhat similar problem and just noticed this thread. In our software, we have multiple threads, and we use a lot of COM objects. COM objects also have the requirement that they must only be used in the same thread (in the same apartment, to be exact) that created them. This also applies to cleaning up with the garbage collector.

OK, when the COM object is part of some Python structures that include reference cycles, then the cyclic gc tries to clean up the ref cycle and cleans up the COM object. This can happen in ANY thread, and in some cases the program crashes or the thread hangs.

Here is my idea to fix this from within Python: The COM objects, when created, keep the name of the currently executing thread. In the __del__ method, where the cleanup of the COM object happens by calling the COM .Release() method, a check is made whether the current thread is the allowed one or not. If it is the wrong thread, the COM object is kept alive by appending it to some list. The list is stored in a global dictionary indexed by the thread name.

The remaining goal is to clear the lists in the dict inside the valid thread - which is done on every creation of a COM object, on every destruction of a COM object, and in the CoUninitialize function that every thread using COM must call before it ends. At least that's my plan.

Maybe you can use a similar approach?

Thomas
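A stripped-down sketch of that deferred-release scheme, with a plain Python stand-in for the COM wrapper (no actual COM involved; Release() just prints):

    import threading

    _deferred = {}                  # owning thread name -> objects awaiting release
    _deferred_lock = threading.Lock()

    class ComProxy:
        def __init__(self):
            self._home = threading.current_thread().name   # the creating "apartment"
            self._released = False

        def Release(self):          # stands in for the real COM Release()
            if not self._released:
                self._released = True
                print("released on", threading.current_thread().name)

        def __del__(self):
            if threading.current_thread().name == self._home:
                self.Release()
            else:
                # Wrong thread: keep the object alive and park it for its home thread.
                with _deferred_lock:
                    _deferred.setdefault(self._home, []).append(self)

    def drain_deferred():
        """Call from the owning thread (on creation/destruction of a COM object,
        and before CoUninitialize) to release anything parked for this thread."""
        with _deferred_lock:
            parked = _deferred.pop(threading.current_thread().name, [])
        for obj in parked:
            obj.Release()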
On Fri, May 27, 2011 at 6:04 AM, Thomas Heller <theller@ctypes.org> wrote:
Here is my idea to fix this from within Python: The COM objects, when created, keep the name of the currently executing thread. In the __del__ method, where the cleanup of the COM object happens by calling the COM .Release() method, a check is made if the current thread is the allowed one or not. If it is the wrong thread, the COM object is kept alive by appending it to some list. The list is stored in a global dictionary indexed by the thread name.
Of course, this means that multiple COM objects in the same cycle become uncollectable, which again argues for the __close__ idiom. (Just like __del__ except that it can be run more than once, and if there are multiples in a cycle, they are run in arbitrary order instead of deferred.) Alternatively, you might get away with some wonky proxy objects as part of the COM wrapping.

-jJ
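For comparison, the explicit-close idiom might look roughly like this (the wrapped interface and its Release() are stand-ins): close() is safe to call repeatedly and in any order across a cycle.

    class ComProxy:
        def __init__(self, interface):
            self._interface = interface     # stand-in for a real COM interface

        def close(self):
            interface, self._interface = self._interface, None
            if interface is not None:
                interface.Release()         # can only ever run once per proxy

        __del__ = close                     # best-effort fallback; close() is the real API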