Extension modules, Threading, and the GIL

Thu Jan 2 16:19:48 EST 2003

Greg Chapman <glc at well.com> writes:

> It seems to me that threadstate could be made global, or at least
> thread local, within Python, thus freeing client code from ever
> having to explicitly create a threadstate.  For this to work, Python
> would have to have the equivalent of thread local storage on all
> supported platforms.

This is a knock-out criterion. Python currently supports 11 threading
libraries, and a process to remove support for some of them will need
several releases. There currently is no support for TLS, *except
through the thread state itself*.

> Looking over the thread-sig archives, Greg Stein suggested that TLS
> could be emulated on platforms which don't offer it natively using a
> Python dict and a lock (at any rate, it should be possible with some
> sort of synchronized data structure).

This points to another sore spot: that would require reliable thread
identification. Currently, thread identification is broken, as Python
assumes that an int is sufficient to encapsulate a thread id. There
are platforms where this assumption is invalid.
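
To make the dict-and-lock idea concrete, here is a minimal sketch in
Python rather than C. The class name EmulatedTLS and its methods are
made up for illustration and are not an existing API. Note that it
keys everything on threading.get_ident() (the modern spelling of
thread.get_ident()), so it depends on exactly the reliable thread
identification discussed above, and it assumes each thread cleans up
its slot before exiting, since ids may be reused.

```python
import threading


class EmulatedTLS:
    """Hypothetical thread-local storage built from a dict and a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}  # thread id -> {key: value}

    def set(self, key, value):
        tid = threading.get_ident()
        with self._lock:
            self._slots.setdefault(tid, {})[key] = value

    def get(self, key, default=None):
        tid = threading.get_ident()
        with self._lock:
            return self._slots.get(tid, {}).get(key, default)

    def clear_current_thread(self):
        # Assumption: every thread calls this before it exits, because
        # the platform may hand the same id to a later thread.
        with self._lock:
            self._slots.pop(threading.get_ident(), None)


def demo():
    """Show that each thread only sees the values it stored itself."""
    tls = EmulatedTLS()
    tls.set("threadstate", "main")
    seen = {}

    def worker():
        seen["before"] = tls.get("threadstate")  # nothing stored yet here
        tls.set("threadstate", "worker")
        seen["after"] = tls.get("threadstate")
        tls.clear_current_thread()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen, tls.get("threadstate")
```

Every get and set takes the lock, which is part of why such an
emulation is slower than native TLS.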

> With only one threadstate per thread, a thread could easily
> determine whether it has the GIL (the threadstate could have some
> sort of active flag which gets set when it obtains the GIL); this
> might solve David Abrahams' problem (not sure).

That is, of course, expensive: every lock/unlock call needs to find
the TLS as well.

> (It could also allow a thread to call AcquireThread multiple times
> without deadlock; since there would be only one threadstate per
> thread, that state could preserve a lockcount to handle recursive
> calls.)

This is actually what David Abrahams says his problem is: he wants a
recursive lock. Of course, for efficiency, it might be better to use
platform recursive locks where available.
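
As a rough illustration of the lock-count idea, a recursive lock can
be layered on a plain lock by recording the owning thread and a
count. This is only a sketch in Python: the class name
CountingRecursiveLock is hypothetical, it again relies on
threading.get_ident() to identify the caller, and it is not how the
GIL itself is implemented.

```python
import threading


class CountingRecursiveLock:
    """Hypothetical recursive lock: a plain lock plus owner and count."""

    def __init__(self):
        self._block = threading.Lock()  # the underlying non-recursive lock
        self._owner = None              # id of the thread holding the lock
        self._count = 0                 # nesting depth for the owner

    def acquire(self):
        me = threading.get_ident()
        if self._owner == me:
            # Re-entry by the owner: bump the count instead of deadlocking.
            self._count += 1
            return
        self._block.acquire()
        self._owner = me
        self._count = 1

    def release(self):
        if self._owner != threading.get_ident():
            raise RuntimeError("lock is not held by the calling thread")
        self._count -= 1
        if self._count == 0:
            # Outermost release: give the underlying lock back.
            self._owner = None
            self._block.release()
```

Python's own threading.RLock already provides this behaviour; the
sketch only shows where the count and the owner lookup come in.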

> Thinking further about this, for this to work cleanly I think Python
> would have to allow only one interpreter per process.

If you are going for TLS, this is not strictly necessary: every
interpreter could maintain its own TLS key. Of course, in cases where
you want to acquire a thread, this would not be helpful, as you then
often don't have an interpreter, either, so you could not find out
what the TLS key is.
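
A small Python sketch of the per-interpreter key idea, under the
assumption that each interpreter object owns its own thread-local
slot. The Interpreter class here is a made-up stand-in, not the real
PyInterpreterState; threading.local() plays the role of the TLS key.

```python
import threading


class Interpreter:
    """Hypothetical stand-in for an interpreter owning its own TLS key."""

    def __init__(self, name):
        self.name = name
        self._tls = threading.local()  # acts as this interpreter's TLS key

    def set_threadstate(self, state):
        # Stored per (interpreter, thread) pair.
        self._tls.state = state

    def get_threadstate(self):
        # None means the calling thread has no state in this interpreter.
        return getattr(self._tls, "state", None)
```

The lookup is a method on the interpreter, which shows the catch:
a thread that has no interpreter in hand has nothing to ask.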

> I never use multiple interpreters, so I'm not quite sure what
> they're used for, 

People think they can use them to have several independent execution
environments. This is not true, though, as extension modules don't
keep per-interpreter state (they use plain global variables), and
several other global tables exist.

I believe multiple interpreters were added to silence the recurring
request to have multiple interpreters, either without knowing or
while deliberately ignoring that people would not get what they think
they would get.

> but I wonder if the need for them could be eliminated by providing a
> new built-in type (sort of like RExec without the security overhead)
> which would initialize itself by doing the stuff that
> Py_NewInterpreter does to get a new copy of the global data space
> and which would provide methods for executing code in that copy of
> the data space.

This would share the quality of RExec, though: it sort-of works, but
if you dig long enough, you'll poke holes into it easily. Unlike
RExec, those holes can't be mended in the Python core itself - you'd
have to patch loads of third-party extension modules as well. Of
course, it wouldn't be worse than Py_NewInterpreter, except that it
would have to make claims that it couldn't fulfill.

Regards,
Martin