[Python-ideas] Threading hooks and disable gc per thread

Gregory P. Smith greg at krypto.org
Sat May 14 21:21:09 CEST 2011


On Wed, May 11, 2011 at 4:58 PM, Christian Heimes <lists at cheimes.de> wrote:
> Hello,
>
> today I've spent several hours debugging a segfault in JCC [1]. JCC is a
> framework to wrap Java code for Python. It's most prominently used in
> PyLucene [2]. You can read more about my debugging in [3]
>
> With JCC every Python thread must be registered at the JVM through JCC.
> An unattached thread that accesses a wrapped Java object leads to
> errors and may even cause a segfault. Accessing also includes garbage
> collection. A code line like
>
>   a = {}
>
> or
>   "a b c".split()
>
> can segfault since the allocation of a dict or a bound method runs
> through _PyObject_GC_New(), which may trigger a cyclic garbage
> collection run. If the current thread isn't attached to the JVM but
> triggers a gc.collect() with some Java objects in a cycle, the
> interpreter crashes. It's quite complicated and hard to "fix" third
> party tools to attach all threads created in the third party library.
>
> The issue could be solved with a simple on_thread_start hook in the
> threading module. However there is more to it. In order to free memory
> threads must also be detached from the JVM, when a thread has ended. A
> second on_thread_stop hook isn't enough since the bound methods may also
> lead to a gc.collect() run after the thread is detached.
>
> I propose three changes to Python in order to fix the issue:
>
> on thread start hook
> --------------------
>
> Similar to the atexit module, third party modules can register a
> callable with *args and **kwargs. The functions are called inside the
> newly created thread just before the target is called. The best place
> for the hook list is threading.Thread._bootstrap_inner() right before
> the try: self.run() except: block. Exceptions are ignored during the
> call but reported to the user at the end (same as atexit's
> atexit_callfunc())
>
>
> on thread end hook
> ------------------
>
> Same as on thread start hook but the callables are called inside the
> dying thread after self.run().
>

Makes sense to me.

Something that needs clarifying: when the process dies (the main Python
thread has exited and all remaining Python threads are daemon threads)
the on thread end hook will _not_ be called.
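
To make the semantics concrete, here is a rough sketch of how the two
hooks might behave, done as a Thread subclass rather than a patch to
threading.Thread._bootstrap_inner().  The names on_thread_start,
on_thread_stop and HookedThread are made up for illustration; nothing
like them exists in the threading module today.

```python
import threading
import traceback

# Hypothetical registries, modelled loosely on atexit's callback list.
_start_hooks = []
_stop_hooks = []


def on_thread_start(func, *args, **kwargs):
    """Register func(*args, **kwargs) to run inside each new thread."""
    _start_hooks.append((func, args, kwargs))
    return func


def on_thread_stop(func, *args, **kwargs):
    """Register func(*args, **kwargs) to run inside each dying thread."""
    _stop_hooks.append((func, args, kwargs))
    return func


def _run_hooks(hooks):
    # A failing hook is reported but must not stop the remaining hooks.
    for func, args, kwargs in hooks:
        try:
            func(*args, **kwargs)
        except Exception:
            traceback.print_exc()


class HookedThread(threading.Thread):
    """Illustrative stand-in for a Thread with start/stop hooks."""

    def run(self):
        _run_hooks(_start_hooks)
        try:
            super().run()
        finally:
            # Runs in the dying thread, after the target has returned
            # or raised.
            _run_hooks(_stop_hooks)
```

Note that a daemon thread still running at interpreter shutdown never
reaches the finally block, which is the case described above.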

+1

This is really two separate feature requests: the thread hooks above
and the gc hooks below.

> gc.disable_thread(), gc.enable_thread(), gc.isenabled_thread()
> --------------------------------------------------------------
>
> Right now almost any code can trigger a gc.collect() run
> non-deterministically. Some applications like JCC want to control
> whether gc.collect() is wanted on a thread level. This could be solved
> with a new flag in PyThreadState. PyThreadState->gc_enabled is enabled
> by default. When the flag is false, _PyObject_GC_Malloc() doesn't start
> a gc.collect() run for that thread. The collection is delayed until
> another thread or the main thread triggers it.
>
> The three functions should also have a C equivalent so C code can
> prevent gc in a thread.

This also sounds useful since we are a long, long way from concurrent
gc.  (And whenever we gain that, we'd need a way to control when it
can or can't happen, or to register the gc threads with anything
that needs to know about 'em: JCC, etc.)
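
A per-thread flag can't be emulated from pure Python, but as a point of
comparison here is the closest approximation I can think of with the
existing gc module: turn automatic collection off for the whole process
and collect explicitly from one designated thread.  This is a different
technique from the proposed PyThreadState flag, since gc.disable() is
process-wide.  The names start_collector and collector_loop, and the
attach callback (standing in for whatever JCC needs to attach a thread
to the JVM), are made up for illustration.

```python
import gc
import threading


def collector_loop(stop_event, interval, attach):
    """Run explicit collections in this thread until stop_event is set."""
    if attach is not None:
        attach()  # hypothetical: register this thread with the JVM
    # Event.wait() returns False on timeout, True once the event is set.
    while not stop_event.wait(interval):
        gc.collect()


def start_collector(interval=1.0, attach=None):
    """Disable automatic gc and start one thread that collects instead."""
    gc.disable()  # process-wide: no thread triggers automatic collection
    stop_event = threading.Event()
    thread = threading.Thread(
        target=collector_loop,
        args=(stop_event, interval, attach),
        daemon=True,
    )
    thread.start()
    return stop_event, thread
```

Reference counting still frees most objects immediately; only cycles
wait for the collector thread.  The obvious downside is that every
thread loses automatic collection, not just the unattached ones, which
is why a real per-thread switch would be nicer.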

+1

-gps