[Python-Dev] Extension modules, Threading, and the GIL

Mark Hammond mhammond@skippinet.com.au
Wed, 8 Jan 2003 23:50:38 +1100


David:
> MarkH:
> > I fear the only way to approach this is with a PEP.  We
> > need to clearly state our requirements, and clearly show

> I'm also willing to lend a hand with a PEP, if it's worth anything.  I
> don't know as much about the problems in this domain as you do; I've
> only seen this one example that bit me.  I'm prepared to spend a few
> brain cycles on it and help with the writing, though.

Cool :)  And thanks to Anthony too.  I will keep python-dev CC'd for just a
little longer though, just to see what is controversial.  Something tells me
it will be the "goal" <wink>, so let's see how we go there.

My goal:
For a multi-threaded application (generally this will be a larger app
embedding Python, but that is irrelevant), make it reasonably easy to
accomplish 2 things:

1) Allow "arbitrary" threads (that is, threads never before seen by Python)
to acquire the resources necessary to call the Python C API.

2) Allow Python extensions to be written which support (1) above.

Currently (2) is covered by Py_BEGIN_ALLOW_THREADS, except that it is kinda
like only having a hammer in your toolbox <wink>.  I assert that (2) could
actually be split into two discrete goals:

2.1) Extension functions that expect to take a lot of time, but generally
have no thread-state considerations.  This includes sleep(), all IO
functions, and many others.  This is exactly what Py_BEGIN_ALLOW_THREADS was
designed for.

2.2) Extensions that *may* take a little time, but more to the point, may
directly and synchronously trigger callbacks.  That is, it is not expected
that much time will be spent outside of Python, but rather that Python will
be re-entered.  I can concede that functions that may trigger asynchronous
callbacks need no special handling here, as the normal Python thread switch
mechanism will ensure their correct dispatch.
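
To make case 2.1 concrete, here is a rough pure-Python model of the
"release the lock around slow work, then take it back" pattern.  It is only a
sketch: toy_lock is an ordinary threading.Lock standing in for the interpreter
lock, and allow_threads(), slow_extension_call() and run_demo() are
hypothetical names invented for this illustration, not part of any Python API.

```python
import threading
import time
from contextlib import contextmanager

# Stand-in for the interpreter lock; this is NOT the real GIL.
toy_lock = threading.Lock()


@contextmanager
def allow_threads():
    """Release toy_lock for the duration of the block, then re-acquire it."""
    toy_lock.release()
    try:
        yield
    finally:
        toy_lock.acquire()


def slow_extension_call(seconds):
    """Case 2.1: long-running work that touches no interpreter state."""
    with allow_threads():
        time.sleep(seconds)


def run_demo():
    progress = []

    def other_thread():
        # Can only run while the main thread is not holding toy_lock.
        with toy_lock:
            progress.append("other thread ran")

    toy_lock.acquire()            # the calling thread holds the lock on entry
    worker = threading.Thread(target=other_thread)
    worker.start()
    slow_extension_call(0.05)     # lock is released while we sleep
    held_after_call = toy_lock.locked()
    toy_lock.release()
    worker.join()
    return progress, held_after_call
```

The point of the model is that the caller holds the lock again when
slow_extension_call() returns, while another thread gets a chance to run in
the meantime.  Case 2.2 differs in that the "slow work" is replaced by a
callback that immediately wants the lock back.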

Currently 2.1 and 2.2 are handled the same way, but this need not be the
case.  Currently 2.2 is only supported by *always* giving up the lock, and
at each entry point *always* re-acquiring it.  This is obviously wasteful if
indeed the same thread immediately re-enters - hence we are here with a
request for "how do I tell if I have the lock?".  Combine this with the
easily stated but tricky to implement (1) and no one understands it at all
<frown>.
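
As a sketch of what "how do I tell if I have the lock?" could look like, here
is a toy lock in pure Python that remembers which thread holds it.  Everything
here is an assumption made for illustration: OwnerTrackingLock, ensure(),
callback() and demo() are hypothetical names, and the lock is a plain
threading.Lock rather than the interpreter's own.

```python
import threading


class OwnerTrackingLock:
    """Toy lock that remembers which thread holds it (not the real GIL)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None        # thread ident of the holder, or None
        self.acquire_count = 0    # number of real acquisitions performed

    def held_by_current_thread(self):
        return self._owner == threading.get_ident()

    def acquire(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        self.acquire_count += 1

    def release(self):
        self._owner = None
        self._lock.release()

    def ensure(self):
        """Acquire only if needed; return True if this call took the lock."""
        if self.held_by_current_thread():
            return False
        self.acquire()
        return True


def callback(lock, log):
    """Entry point that may be reached with or without the lock held."""
    took_it = lock.ensure()
    try:
        log.append("callback ran")
    finally:
        if took_it:
            lock.release()


def demo():
    lock = OwnerTrackingLock()
    log = []

    lock.acquire()
    callback(lock, log)           # same thread re-enters: no extra acquire
    lock.release()

    # A thread the lock has never seen before: ensure() really acquires.
    t = threading.Thread(target=callback, args=(lock, log))
    t.start()
    t.join()
    return log, lock.acquire_count
```

In this model the synchronous re-entry of case 2.2 costs nothing extra, while
an "arbitrary" thread as in (1) goes through the same entry point and simply
pays for one real acquisition.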

I also propose that we restrict this to applications that intend to use a
single "PyInterpreterState" - if you truly want multiple threads running in
multiple interpreters (and good luck to you - I'm not aware that anyone has
ever actually done it <wink>) then you are on your own.

Are these goals a reasonable starting point?  This describes all my
venturing into this area.

If-only-it-got-easier-each-time <wink> ly,

Mark.