Extension modules, Threading, and the GIL

Bengt Richter bokr at oz.net
Tue Dec 31 12:11:17 EST 2002


On Mon, 30 Dec 2002 11:40:58 -0500, David Abrahams <dave at boost-consulting.com> wrote:

>
>David Abrahams <dave at boost-consulting.com> writes:
>
>> Aahz <aahz at pythoncraft.com> writes:
>>> I think this thread might be better handled on c.l.py, at least
>>> until it's understood well enough to be clear whether something does
>>> need to change in Python.
>>
>> I'm pretty certain I understand what's going on.  If it would be
>> better for you to take this to c.l.py, I'm happy to do so, but AFAICT
>> there _is_ a Python core issue here: there's no way to find out
>> whether you've already got the GIL**, so if you _might_ have to acquire
>> it, you must always acquire it.
>>
>> **If I'm wrong about that, please let me know.  It isn't obvious from
>>   the documentation.
>
>This is the continuation of a thread originated on python-dev; I was
>asked to re-raise it here on python-list until the issue is better
>understood.
>
>The original posting was here:
>
>  http://aspn.activestate.com/ASPN/Mail/Message/python-dev/1482879
>
>The essence of the problem is that there's no way to write code
>that uses the Python 'C' API and which has no knowledge of whether it
>is running on Python's main thread when it is entered.
Perhaps this discussion needs to distinguish clearly between OS
threads and Python threads? E.g., if you start out in C main, there
is no Python thread, and you have to do what python22.exe or the Linux
equivalent does from its C main entry point.

E.g., what if the C code spawns some threads that call Q through a
C API and block in some non-Python event loop, and then the main ( ;-)
thread starts an interpreter instance and calls PyRun_SimpleString to
start a Python app? In that case, there will (presumably?) be a GIL
and a first thread state automatically established. Now there are
several threads with different machine stacks but sharing process space,
and running asynchronously. (BTW, what if the first thread had spawned
a child in C main and let that init the Python interpreter, and used
the main thread to run the Q business? Signals would work differently,
for one thing, IWT.)

Either way, as machine threads running C code, IWT that they could synchronize
and exchange data in various ways, totally unbeknownst to the Python
interpreter, so long as the interpreter per se was not used reentrantly by
different threads. (OK, maybe data would magically come from some class method,
but it should be transparent whether it came from another thread or was
synthesized local test data.)

Now what if the Python main thread calls a Q C API and leaves some call-back
references in Q data space that a non-Python thread can see? What use can it
make of them? In order to use the interpreter, it would have to have interpreter
state, which so far only the main thread has in this example. Unless the
act of acquiring the GIL when there is no existing interpreter state automatically
creates one for the Pythonless thread, I don't see a way within the same process.
But, if the right magic happens, a new Python thread has effectively been started
by transformation of a secondary machine thread. But what does the call-back
information passed via a back door mean in this Python thread context? What kinds
of data might be invalid, if any? It would depend on how thread state is preserved,
and what the rules are. I guess it could be pretty much ok.

So when the non-Python thread was finished with its acquired thread state, would
that be entirely disposed of? What if references into that state were passed to
the main thread via the callback? I imagine you have to consider whether to keep
the new python thread state from then on, and re-use it if the same secondary thread
reacquires the GIL.
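
To make the keep-and-reuse idea above concrete, here is a toy model in plain
Python. It is only a sketch of the concept, not a description of how CPython
actually manages thread states; the names ToyInterpreter, ensure, release and
demo are all made up for this illustration. A thread that shows up without a
state gets one created on its first acquisition, and finds the same one again
when it comes back.

```python
import threading


class ToyInterpreter:
    """Toy model only: one global lock plus a lazily created state per OS thread."""

    def __init__(self):
        self._gil = threading.Lock()   # stands in for the GIL
        self._states = {}              # thread ident -> that thread's toy state

    def ensure(self):
        """Take the lock; create this thread's state on first use, else reuse it."""
        self._gil.acquire()
        ident = threading.get_ident()
        state = self._states.get(ident)   # only touched while the lock is held
        if state is None:
            state = {"owner": ident, "entries": 0}
            self._states[ident] = state
        state["entries"] += 1
        return state

    def release(self):
        """Give the lock up; the state is kept for the next ensure()."""
        self._gil.release()


def demo():
    interp = ToyInterpreter()
    seen = []

    def secondary():
        # A thread the toy interpreter has never heard of enters twice.
        for _ in range(2):
            state = interp.ensure()
            try:
                seen.append(state)
            finally:
                interp.release()

    worker = threading.Thread(target=secondary)
    worker.start()
    worker.join()

    main_state = interp.ensure()       # the main thread gets its own state
    interp.release()
    return seen, main_state
```

Note that the lock in this sketch is not reentrant: a thread that calls
ensure() a second time without an intervening release() simply blocks forever.
That is the toy version of the problem quoted above, that code which cannot
tell whether it already holds the GIL has no safe way to decide whether to
acquire it.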

But bottom line, if the GIL is given up, the main thread ought just to unblock and
continue, without "realizing" that it had made use of the GIL. I.e., with no explicit
code to give it up.

How a secondary thread could get away with not acquiring its own python thread state
before calling the interpreter I don't know.

Please don't take any of the above as specific Python implementation info. I am mostly
speculating based on hunches, but I thought even if wrong, some relevant concepts would
be set out and might help someone rearrange them to state clearly what really is happening
and/or what Dave's concerns are ;-)

>
>The two respondents were left with some questions; you can read those,
>and my responses, in the thread at the bottom of the page referenced
>above.
>
I read the stuff at the URL above (not saying I totally digested it)
and got the impression that Dave is saying a "callback" may be called
**without** the running and calling thread having acquired the GIL.

This sounds to me like ignoring a busy flag and calling a non-reentrant
subroutine as if its internal state was somehow going to be ok.

Isn't that essentially what the GIL is for? To guarantee that non-reentrant
python-thread-specific interpreter state has been safely encapsulated in
python-thread-specific storage, so another thread can swap in a reference to
its analogous python-thread-specific state and thus continue with its last
thread-specific state (combined also with shared interpreter state)?
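
The swap-in picture can be sketched in a few lines of plain Python. Again this
is a toy under my own assumptions, not CPython internals: gil is an ordinary
threading.Lock playing the busy flag, interp["current"] plays the reference to
the running thread's state, and the function name demo is invented for the
example.

```python
import threading


def demo(n_threads=4, steps=1000):
    gil = threading.Lock()                    # toy stand-in for the GIL
    interp = {"current": None, "ticks": 0}    # toy shared interpreter state
    states = [{"name": "t%d" % i, "steps": 0} for i in range(n_threads)]

    def run(my_state):
        for _ in range(steps):
            with gil:                             # honor the busy flag
                interp["current"] = my_state      # swap in this thread's state
                interp["current"]["steps"] += 1   # work on thread-specific state
                interp["ticks"] += 1              # and on the shared state
                interp["current"] = None          # swap out before letting go

    threads = [threading.Thread(target=run, args=(s,)) for s in states]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return states, interp["ticks"]
```

Because every thread takes the lock before touching interp["current"], each
one only ever updates its own state, and the shared tick count comes out equal
to the number of threads times the number of steps. A thread that wrote to
interp["current"] without holding the lock would have no such guarantee.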

ISTM ignoring the GIL from another thread in the same process would be
guaranteed trouble at some point, though with some luck re blocking and
leftover memory state, maybe something could seductively *seem* to work.

Anyway, hope this helps stir things up ;-)

Regards,
Bengt Richter


