Extension modules, Threading, and the GIL
Recently an issue has come up on the C++-sig which I think merits a little attention here. To boil it down, the situation looks like this:

Shared library Q uses threading but not Python. It supplies an interface by which users can supply callback functions. Some of these callbacks will be invoked directly in response to external calls into Q; others will be invoked on threads started by calls into Q.

Python extension module A calls shared library Q, but doesn't use its callback interface. It works fine by itself.

Python extension module B calls shared library Q and uses Q's callback interface. Because some of the callbacks need to use the Python API, and *might* be invoked by threads, they must all acquire the GIL. Because they also might be invoked by direct calls into Q, B must always release the GIL before calling anything in Q.

Problem: using B while A is loaded breaks A: because B has installed callbacks in Q that acquire the GIL, A must also release the GIL before calling into Q. Notice that the author of A may have had no reason to believe anyone would install Python callbacks in Q!

It seems to me that for situations like these, where a function may or may not be called on Python's main thread, it would be helpful if Python supplied a "recursive mutex" GIL acquisition/release pair, for which acquisition and release on the main thread would simply bump a counter. Is this something that was considered and rejected?

TIA,
Dave

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
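To make the setup concrete, here is a rough C sketch of the pattern described for module B. The names q_do_work, py_handler and b_do_work are invented stand-ins for Q's API, the stored Python callable and B's wrapper; only the Python/C API calls are real, and this is a sketch of the general technique, not anyone's actual code.

    /* Sketch of module B's situation; all non-Python names are invented. */
    #include <Python.h>

    void q_do_work(void);                    /* hypothetical Q API; may invoke the callback */

    static PyInterpreterState *interp;       /* stashed at module init time */
    static PyObject *py_handler;             /* Python callable to run */

    /* Q may run this on one of its own threads (which holds no GIL), or
     * synchronously inside q_do_work() on a thread that just released it,
     * so the callback always has to acquire the GIL itself. */
    static void callback_from_q(void)
    {
        PyThreadState *tstate = PyThreadState_New(interp);
        PyObject *r;

        PyEval_AcquireThread(tstate);        /* take the GIL */
        r = PyObject_CallObject(py_handler, NULL);
        if (r == NULL)
            PyErr_Print();                   /* nowhere else to report it */
        else
            Py_DECREF(r);
        PyThreadState_Clear(tstate);
        PyEval_ReleaseThread(tstate);        /* give the GIL back */
        PyThreadState_Delete(tstate);
    }

    /* Because callback_from_q() always acquires the GIL, every call into Q
     * that might reach it has to release the GIL first -- including, as it
     * turns out, calls made by an unrelated module A. */
    static PyObject *b_do_work(PyObject *self, PyObject *args)
    {
        Py_BEGIN_ALLOW_THREADS
        q_do_work();
        Py_END_ALLOW_THREADS
        Py_INCREF(Py_None);
        return Py_None;
    }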
David Abrahams wrote:
Python extension module B calls shared library Q and uses Q's callback interface. Because some of the callbacks need to use the Python API, and *might* be invoked by threads, they must all acquire the GIL.
Wrong. If the code in B that calls Q does not allow threads, the callbacks don't need to reacquire the GIL.
Problem: using B while A is loaded breaks A: because B has installed callbacks in Q that acquire the GIL, A must also release the GIL before calling into Q.
Can you please explain what a callback is? Can the callbacks occur in the context of a different thread, i.e. different from the one that has installed the callback? If it is a true callback (i.e. Q will call B back while being called from B), then this won't interfere at all with A. Regards, Martin
"Martin v. Löwis" <martin@v.loewis.de> writes:
David Abrahams wrote:
Python extension module B calls shared library Q and uses Q's callback interface. Because some of the callbacks need to use the Python API, and *might* be invoked by threads, they must all acquire the GIL.
Wrong. If the code in B that calls Q does not allow threads, the callbacks don't need to reacquire the GIL.
I think you must be misunderstanding me. These callbacks might be invoked on threads that are not the Python main thread.

http://www.python.org/doc/current/api/threads.html says:

    "the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions"

I am taking that statement at face value and assuming it means what it says. Note that these threads were not started by Python's thread API.
Problem: using B while A is loaded breaks A: because B has installed callbacks in Q that acquire the GIL, A must also release the GIL before calling into Q.
Can you please explain what a callback is?
I'm trying to leave out irrelevant details, but a callback happens to be a virtual function in a C++ class instance in this case. These callbacks implement behaviors of base classes supplied by the library, Qt. For the purposes of this discussion, it might just as well be a 'C' language pointer-to-function, though: void (*)()
Can the callbacks occur in the context of a different thread, i.e. different from the one that has installed the callback?
I think the answer is yes, but I'm trying to leave out the irrelevant, and I'm pretty sure that it doesn't matter one whit which thread installs the callback. What matters, AFAICT, is that the callback might be invoked on a thread that's not Python's main thread, thus must acquire the GIL, so if Python's main thread wants to call something in Q that might invoke one of these callbacks, it must release the GIL.
If it is a true callback (i.e. Q will call B back while being called from B), then this won't interfere at all with A.
Maybe we have different ideas of what "callback" means. In your terms, it is not a "true callback". -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
Wrong. If the code in B that calls Q does not allow threads, the callbacks don't need to reacquire the GIL.
I think you must be misunderstanding me. These callbacks might be invoked on threads that are not the Python main thread.
Which thread is the main thread is not relevant at all. What matters is whether the callback is invoked in the context of the thread that installed it (i.e. as a result of calling a function in B).
"the rule exists that only the thread that has acquired the global interpreter lock may operate on Python objects or call Python/C API functions"
I am taking that statement at face value and assuming it means what it says.
And rightly so. Notice that it does not use the term "main thread".
Can you please explain what a callback is?
I'm trying to leave out irrelevant details, but a callback happens to be a virtual function in a C++ class instance in this case. These callbacks implement behaviors of base classes supplied by the library, Qt.
Ok, now I know how a callback is installed (by redefining a virtual method). The other question is: How is it invoked? I.e. who invokes it, and why? I suppose the immediate answer is "the library Q". However, that library does not invoke it without a trigger: What is that trigger?
What matters, AFAICT, is that the callback might be invoked on a thread that's not Python's main thread, thus must acquire the GIL, so if Python's main thread wants to call something in Q that might invoke one of these callbacks, it must release the GIL.
Which thread is the main thread is completely irrelevant. If you have something like

    class MyB(B.Base):
        def overridden(self):
            print "This overrides a virtual function"

    def doit():
        b = MyB()
        B.do_the_real_work(b)   # will call Q, which will call the callback,
                                # which will call overridden

then there is no need for the C++ code in B to release the GIL, nor is there a need for B to reacquire the GIL in the callback. This is independent of whether doit is called in the main thread, or in the context of any other thread: at the point where do_the_real_work is called, the current thread holds the GIL. If it keeps the GIL, then the GIL will be held at the point when the callback occurs, in the context of the thread where the callback occurs.
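A C-side sketch of the same point, with invented names (q_do_the_real_work, py_handler, b_doit): if B keeps the GIL across the call into Q and the callback arrives synchronously on the calling thread, the callback can use the Python API directly, and any exception it raises flows back to the caller through the one thread state.

    /* Assumes Q calls back synchronously, on the calling thread, before
     * q_do_the_real_work() returns; all non-Python names are invented. */
    #include <Python.h>

    void q_do_the_real_work(void (*cb)(void));   /* hypothetical Q API */

    static PyObject *py_handler;                 /* Python callable to run */

    static void sync_callback(void)
    {
        /* The GIL is still held here, because b_doit() never released it;
         * an error raised by the handler stays set in this thread state. */
        PyObject *r = PyObject_CallObject(py_handler, NULL);
        Py_XDECREF(r);
    }

    static PyObject *b_doit(PyObject *self, PyObject *args)
    {
        q_do_the_real_work(sync_callback);   /* no Py_BEGIN_ALLOW_THREADS needed */
        if (PyErr_Occurred())
            return NULL;                     /* propagate the callback's exception */
        Py_INCREF(Py_None);
        return Py_None;
    }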
Maybe we have different ideas of what "callback" means. In your terms, it is not a "true callback".
Then I'm still curious as to what triggers invocation of that virtual function. Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
Wrong. If the code in B that calls Q does not allow threads, the callbacks don't need to reacquire the GIL.
I think you must be misunderstanding me. These callbacks might be invoked on threads that are not the Python main thread.
Which thread is the main thread is not relevant at all.
Sorry, I got "main" confused with "current". My point is that the callback may be invoked by threads that don't hold the GIL.
What matters is whether the callback is invoked in the context of the thread that installed it (i.e. as a result of calling a function in B).
I still don't see how the thread that installed it has any bearing. Imagine there's a global function pointer variable somewhere. Some thread comes along and makes it point to some function f ("installs the callback"). Now there are various ways that callback can be called. Some other thread may pick up the global variable at any time and invoke the function it points to. Why does it matter which thread set the variable?
Can you please explain what a callback is?
I'm trying to leave out irrelevant details, but a callback happens to be a virtual function in a C++ class instance in this case. These callbacks implement behaviors of base classes supplied by the library, Qt.
Ok, now I know how a callback is installed (by redefining a virtual method).
Technically, redefining a virtual function by itself doesn't do anything. You have to make an instance of the class which redefines that function available to the library somehow. But you knew that.
The other question is: How is it invoked? I.e. who invokes it, and why?
I suppose the immediate answer is "the library Q". However, that library does not invoke it without a trigger: What is that trigger?
There are several ways, IIUC. It may be invoked in response to direct calls into Q's API (which may be made from a Python extension module). It may also be invoked by some thread that Q has launched.
What matters, AFAICT, is that the callback might be invoked on a thread that's not Python's main thread, thus must acquire the GIL, so if Python's main thread wants to call something in Q that might invoke one of these callbacks, it must release the GIL.
Which thread is the main thread is completely irrelevant. If you have something like
    class MyB(B.Base):
        def overridden(self):
            print "This overrides a virtual function"

    def doit():
        b = MyB()
        B.do_the_real_work(b)   # will call Q, which will call the callback,
                                # which will call overridden
then there is no need for the C++ code in B to release the GIL, nor is there a need for B to reacquire the GIL in the callback. This is independent of whether doit is called in the main thread, or in the context of any other thread: at the point where do_the_real_work is called, the current thread holds the GIL.
Yes, we understand that problem. I had this exact discussion with the designer of B. He explained that the problem is that Q might also invoke the virtual function on a thread that is not holding the GIL.
If it keeps the GIL, then the GIL will be held at the point when the callback occurs, in the context of the thread where the callback occurs.
Yes.
Maybe we have different ideas of what "callback" means. In your terms, it is not a "true callback".
Then I'm still curious as to what triggers invocation of that virtual function.
Q, either directly via its API, or in some thread it started. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
What matters is whether the callback is invoked in the context of the thread that installed it (i.e. as a result of calling a function in B).
I still don't see how the thread that installed it has any bearing.
If the callbacks would only occur while Q is being called from B, and in the context of the thread which also hosts that call, then B should not release the lock. Then the callback would not need to reacquire it.
There are several ways, IIUC. It may be invoked in response to direct calls into Q's API (which may be made from a Python extension module). It may also be invoked by some thread that Q has launched.
I find this surprising. Is this *any* API of Q that can trigger the callbacks, or a dispatch_one_event call that can do so?
Yes, we understand that problem. I had this exact discussion with the designer of B. He explained that the problem is that Q might also invoke the virtual function on a thread that is not holding the GIL.
I would more deeply question such a statement. It sounds like this library uses the callback for event processing of some kind, in which case the callbacks are only invoked if events are processed. It should be possible to describe more precisely under what specific conditions event processing occurs.
Then I'm still curious as to what triggers invocation of that virtual function.
Q, either directly via its API, or in some thread it started.
Then calling this API must release the GIL. Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
What matters is whether the callback is invoked in the context of the thread that installed it (i.e. as a result of calling a function in B).
I still don't see how the thread that installed it has any bearing.
If the callbacks would only occur while Q is being called from B, and in the context of the thread which also hosts that call, then B should not release the lock. Then the callback would not need to reacquire it.
AFAICT, that still has nothing to do with which thread installs the callback. It has everything to do with which threads may invoke the callback, and whether they already hold the GIL.
There are several ways, IIUC. It may be invoked in response to direct calls into Q's API (which may be made from a Python extension module). It may also be invoked by some thread that Q has launched.
I find this surprising. Is this *any* API of Q that can trigger the callbacks, or a dispatch_one_event call that can do so?
I don't know; I'll have to defer to Phil for this answer.
Yes, we understand that problem. I had this exact discussion with the designer of B. He explained that the problem is that Q might also invoke the virtual function on a thread that is not holding the GIL.
I would more deeply question such a statement. It sounds like this library uses the callback for event processing of some kind, in which case the callbacks are only invoked if events are processed. It should be possible to describe more precisely under what specific conditions event processing occurs.
Probably. I can't do it, though.
Then I'm still curious as to what triggers invocation of that virtual function.
Q, either directly via its API, or in some thread it started.
Then calling this API must release the GIL.
Yes, that is clearly the case under present conditions. The point is that this is merely by virtue of the fact that some other extension module may install a GIL-acquiring callback in Q. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
On Sun, Dec 29, 2002, David Abrahams wrote:
Python extension module B calls shared library Q and uses Q's callback interface. Because some of the callbacks need to use the Python API, and *might* be invoked by threads, they must all acquire the GIL. Because they also might be invoked by direct calls into Q, B must always release the GIL before calling anything in Q.
So you're saying that the callback functions in B acquire the GIL?
Problem: using B while A is loaded breaks A: because B has installed callbacks in Q that acquire the GIL, A must also release the GIL before calling into Q.
Why? The callbacks in B will simply hang until they acquire the GIL. I think this thread might be better handled on c.l.py, at least until it's understood well enough to be clear whether something does need to change in Python. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "There are three kinds of lies: Lies, Damn Lies, and Statistics." --Disraeli
Aahz <aahz@pythoncraft.com> writes:
On Sun, Dec 29, 2002, David Abrahams wrote:
Python extension module B calls shared library Q and uses Q's callback interface. Because some of the callbacks need to use the Python API, and *might* be invoked by threads, they must all acquire the GIL. Because they also might be invoked by direct calls into Q, B must always release the GIL before calling anything in Q.
So you're saying that the callback functions in B acquire the GIL?
Yes.
Problem: using B while A is loaded breaks A: because B has installed callbacks in Q that acquire the GIL, A must also release the GIL before calling into Q.
Why? The callbacks in B will simply hang until they acquire the GIL.
If A doesn't release the GIL, one of its direct calls into Q from Python may invoke a callback in B, which tries to acquire the lock when it is already held. This is a no-no.

I realize that the docs for PyEval_AcquireLock() say: "If this thread already has the lock, a deadlock ensues", but the behavior we're seeing is consistent with a scenario where trying to acquire an already-held lock is a no-op and releasing it is unconditional. Eventually the GIL release in B's callback takes effect, and when A returns to Python there is no thread state.
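For concreteness, the failure described here looks roughly like this on A's side (q_do_work is the same invented stand-in for Q's API used earlier):

    /* Module A, written with no knowledge that anyone installs Python
     * callbacks in Q, and therefore holding the GIL across the call. */
    #include <Python.h>

    void q_do_work(void);   /* hypothetical Q API */

    static PyObject *a_do_work(PyObject *self, PyObject *args)
    {
        q_do_work();
        /*  -> Q invokes B's callback on this same thread
         *  -> the callback tries to acquire the GIL, which this thread
         *     already holds
         *  -> per the documentation "a deadlock ensues"; the stranger
         *     behaviour described above is what was actually observed. */
        Py_INCREF(Py_None);
        return Py_None;
    }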
I think this thread might be better handled on c.l.py, at least until it's understood well enough to be clear whether something does need to change in Python.
I'm pretty certain I understand what's going on. If it would be better for you to take this to c.l.py, I'm happy to do so, but AFAICT there _is_ a Python core issue here: there's no way to find out whether you've already got the GIL**, so if you _might_ have to acquire it, you must always acquire it.

**If I'm wrong about that, please let me know. It isn't obvious from the documentation.

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
[David Abrahams]
... but AFAICT there _is_ a Python core issue here: there's no way to find out whether you've already got the GIL**, so if you _might_ have to acquire it, you must always acquire it.
**If I'm wrong about that, please let me know. It isn't obvious from the documentation.
It's true -- you can't know whether you have the GIL, unless you code up another layer of your own machinery to keep track of who has the GIL. Mark Hammond faces this issue (in all its multifaceted glories) in the Win32 extensions, and built some C++ classes there to help him out. It's difficult at best, and last I looked (several years ago) I wasn't convinced Mark's approach was 109% reliable.

The worst it gets in the Python core is in posixmodule.c's _PyPclose, which needs to build a new interpreter state from scratch, in order to build a new thread state from scratch, in order to acquire the GIL, in order to call some Py API functions. This isn't as hard as it *can* get, because in that function we know the thread executing does not hold the GIL.

The now-defunct Thread-SIG used to have bouts of angst over all this. I think we eventually figured out a better approach there, but it required real work to implement, and nobody had time for that. The only thing that's changed since then is that nobody remembers the better approach anymore <0.9 wink>.

happy-new-year-ly y'rs - tim
[Tim]
[David Abrahams]
... but AFAICT there _is_ a Python core issue here: there's no way to find out whether you've already got the GIL**, so if you _might_ have to acquire it, you must always acquire it.
**If I'm wrong about that, please let me know. It isn't obvious from the documentation.
It's true -- you can't know whether you have the GIL, unless you code up another layer of your own machinery to keep track of who has the GIL. Mark Hammond faces this issue (in all its multifaceted glories) in the Win32 extensions, and built some C++ classes there to help him out. It's difficult at best, and last I looked (several years ago) I wasn't convinced Mark's approach was 109% reliable.
We do have a real problem here, and I keep stumbling across it. So far, this issue has hit me in the win32 extensions, in Mozilla's PyXPCOM, and even in Gordon's "installer". IMO, the reality is that the Python external thread-state API sucks. I can boldly make that assertion as I have heard many other luminaries say it before me. As Tim suggests, time is the issue.

I fear the only way to approach this is with a PEP. We need to clearly state our requirements, and clearly show scenarios where interpreter states, thread states, the GIL etc all need to cooperate. Eg, InterpreterState's seem YAGNI, but manage to complicate using ThreadStates, which are certainly YNI. The ability to "unconditionally grab the lock" may be useful, as may a construct meaning "I'm calling out to/in from an external API" discrete from the current singular "release/acquire the GIL" construct available today.

I'm willing to help out with this, but not take it on myself. I have a fair bit to gain - if I can avoid toggling locks every time I call out to each and every function there would be some nice perf gains to be had, and horrible code to remove. Once I clear the mail from my break I will try and find the thread-sig conclusions...

Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
We do have a real problem here, and I keep stumbling across it. So far, this issue has hit me in the win32 extensions, in Mozilla's PyXPCOM, and even in Gordon's "installer". IMO, the reality is that the Python external thread-state API sucks. I can boldly make that assertion as I have heard many other luminaries say it before me. As Tim suggests, time is the issue.
I fear the only way to approach this is with a PEP. We need to clearly state our requirements, and clearly show scenarios where interpreter states, thread states, the GIL etc all need to cooperate. Eg, InterpreterState's seem YAGNI, but manage to complicate using ThreadStates, which are certainly YNI. The ability to "unconditionally grab the lock" may be useful, as may a construct meaning "I'm calling out to/in from an external API" discrete from the current singular "release/acquire the GIL" construct available today.
I'm willing to help out with this, but not take it on myself. I have a fair bit to gain - if I can avoid toggling locks every time I call out to each and every function there would be some nice perf gains to be had, and horrible code to remove.
I'm also willing to lend a hand with a PEP, if it's worth anything. I don't know as much about the problems in this domain as you do; I've only seen this one example that bit me. I'm prepared to spend a few brain cycles on it and help with the writing, though. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David:
MarkH:
I fear the only way to approach this is with a PEP. We need to clearly state our requirements, and clearly show
I'm also willing to lend a hand with a PEP, if it's worth anything. I don't know as much about the problems in this domain as you do; I've only seen this one example that bit me. I'm prepared to spend a few brain cycles on it and help with the writing, though.
Cool :) And thanks to Anthony too. I will keep python-dev CC'd for just a little longer though, just to see what is controversial. Something tells me it will be the "goal" <wink>, so let's see how we go there.

My goal: For a multi-threaded application (generally this will be a larger app embedding Python, but that is irrelevant), make it reasonably easy to accomplish 2 things:

1) Allow "arbitrary" threads (that is, threads never before seen by Python) to acquire the resources necessary to call the Python C API.

2) Allow Python extensions to be written which support (1) above.

Currently (2) is covered by Py_BEGIN_ALLOW_THREADS, except that it is kinda like only having a hammer in your toolbox <wink>. I assert that (2) could actually be split into discrete goals:

2.1) Extension functions that expect to take a lot of time, but generally have no thread-state considerations. This includes sleep(), all IO functions, and many others. This is exactly what Py_BEGIN_ALLOW_THREADS was designed for.

2.2) Extensions that *may* take a little time, but more to the point, may directly and synchronously trigger callbacks. That is, it is not expected that much time will be spent outside of Python, but rather that Python will be re-entered. I can concede that functions that may trigger asynch callbacks need no special handling here, as the normal Python thread switch mechanism will ensure their correct dispatch.

Currently 2.1 and 2.2 are handled the same way, but this need not be the case. Currently 2.2 is only supported by *always* giving up the lock, and at each entry point *always* re-acquiring it. This is obviously wasteful if indeed the same thread immediately re-enters - hence we are here with a request for "how do I tell if I have the lock?". Combine this with the easily stated but tricky to implement (1) and no one understands it at all <frown>.

I also propose that we restrict this to applications that intend to use a single "PyInterpreterState" - if you truly want multiple threads running in multiple interpreters (and good luck to you - I'm not aware anyone has ever actually done it <wink>) then you are on your own.

Are these goals a reasonable starting point? This describes all my venturing into this area.

If-only-it-got-easier-each-time <wink> ly,

Mark.
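Case 2.1 above is the textbook use of the existing macros. A minimal sketch (the function and its argument are invented; only the Python/C and POSIX calls are real):

    #include <Python.h>
    #include <unistd.h>

    static PyObject *ext_read_fd(PyObject *self, PyObject *args)
    {
        char buf[4096];
        long n;
        int fd;

        if (!PyArg_ParseTuple(args, "i", &fd))
            return NULL;
        Py_BEGIN_ALLOW_THREADS          /* other Python threads may run ... */
        n = read(fd, buf, sizeof buf);
        Py_END_ALLOW_THREADS            /* ... until the GIL is taken back here */
        if (n < 0)
            return PyErr_SetFromErrno(PyExc_OSError);
        return PyString_FromStringAndSize(buf, (int)n);
    }

Case 2.2 is the one these macros handle only wastefully: the lock is dropped at a point like the one above and then immediately re-acquired by a callback running on the very same thread.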
Mark Hammond wrote:
1) Allow "arbitrary" threads (that is, threads never before seen by Python) to acquire the resources necessary to call the Python C API.
This is possible today; all you need is a pointer to an interpreter state. If you have that, you can use PyThreadState_New and PyEval_AcquireThread, after which you have the resources necessary to call the Python API. In many cases, extensions can safely assume that there is exactly one interpreter state all the time, so they can save the interpreter pointer in their init function.

Regards,
Martin
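A rough sketch of what is described here, under the single-interpreter assumption; the module and function names are invented, only the Python/C API calls are real:

    #include <Python.h>

    static PyInterpreterState *main_interp;

    static PyMethodDef myext_methods[] = {
        {NULL, NULL, 0, NULL}
    };

    void initmyext(void)
    {
        Py_InitModule("myext", myext_methods);
        /* The init function runs with the GIL held, so the current thread
         * state is valid: save its interpreter pointer for later use. */
        main_interp = PyThreadState_Get()->interp;
    }

    /* Later, on a thread Python has never seen: */
    void call_python_from_foreign_thread(void)
    {
        PyThreadState *tstate = PyThreadState_New(main_interp);

        PyEval_AcquireThread(tstate);    /* GIL held, tstate is current */
        PyRun_SimpleString("import sys; print len(sys.modules)");
        PyThreadState_Clear(tstate);
        PyEval_ReleaseThread(tstate);    /* GIL released, tstate no longer current */
        PyThreadState_Delete(tstate);
    }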
Mark Hammond wrote:
1) Allow "arbitrary" threads (that is, threads never before seen by Python) to acquire the resources necessary to call the Python C API.
This is possible today, all you need is a pointer to an interpreter state. If you have that, you can use PyThreadState_New,
But what if in some cases, this callback is as a result of Python code on the same thread - ie, there already exists a Python thread-state higher up the stack? Mark.
Mark Hammond wrote:
But what if in some cases, this callback is as a result of Python code on the same thread - ie, there already exists a Python thread-state higher up the stack?
Then you get a deadlock. However, it was not your (stated) goal to support this case. You mentioned threads that Python had never seen before - there can't be a thread state higher up in such a thread. Regards, Martin
Then you get a deadlock. However, it was not your (stated) goal to support this case. You mentioned threads that Python had never seen before - there can't be a thread state higher up in such a thread.
My mistake - I used "i.e." in place of "e.g.". However, "arbitrary" is fairly clear. Mark.
Mark Hammond wrote:
My mistake - I used "i.e." in place of "e.g.". However, "arbitrary" is fairly clear.
I feel this is still underspecified. I have successfully used multiple threads, and callbacks from arbitrary threads. For this to work, I have to allow threads in all calls to the library if the library can call back before returning. Regards, Martin
[Martin]
I feel this is still underspecified. I have successfully used multiple threads, and callbacks from arbitrary threads. For this to work, I have to allow threads in all calls to the library if the library can call back before returning.
It can be done, yes. I am not looking for a change in semantics, just a simple way to do it (and maybe even a fast way to do it, but that is secondary). If such a way already exists, please enlighten us. If not, but it is sufficiently simple to describe, then please describe it. Otherwise, I do not understand your point. Mark.
Mark Hammond wrote:
It can be done, yes. I am not looking for a change in semantics, just a simple way to do it (and maybe even a fast way to do it, but that is secondary). If such a way already exists, please enlighten us. If not, but it is sufficiently simple to describe, then please describe it. Otherwise, I do not understand your point.
There is a very simple strategy to support multiple threads in an extension module:

1. In all callbacks, create a thread state and acquire the current thread (this requires a singleton interpreter state).

2. In all API calls that may invoke callbacks, use Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around the API call.

If this strategy is followed, all code always holds the Python resources it needs, and no deadlocks result.

Regards,
Martin
"Martin v. Löwis" <martin@v.loewis.de> writes:
Mark Hammond wrote:
It can be done, yes. I am not looking for a change in semantics, just a simple way to do it (and maybe even a fast way to do it, but that is secondary). If such a way already exists, please enlighten us. If not, but it is sufficiently simple to describe, then please describe it. Otherwise, I do not understand your point.
There is a very simple strategy to support multiple threads in an extension module.
1. In all callbacks, create a thread state and acquire the current thread (this requires a singleton interpreter state).
2. In all API calls that may invoke callbacks, use Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS around the API call.
If this strategy is followed, all code always holds the Python resources it needs, and no deadlocks result.
IIUC, that strategy doesn't get Mark what he wants in this case:

    2.2) Extensions that *may* take a little time, but more to the point, may directly and synchronously trigger callbacks. That is, it is not expected that much time will be spent outside of Python, but rather that Python will be re-entered.

Which is to be able to avoid releasing the GIL in the case where the extension isn't going to do much other than invoke the callback function which re-acquires it.

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
David Abrahams wrote:
IIUC, that strategy doesn't get Mark what he wants in this case:
2.2) Extensions that *may* take a little time, but more to the point, may directly and synchronously trigger callbacks. That is, it is not expected that much time will be spent outside of Python, but rather that Python will be re-entered.
Which is to be able to avoid releasing the GIL in the case where the extension isn't going to do much other than invoke the callback function which re-acquires it.
I think you are incorrectly interpreting Mark's priorities:

    I am not looking for a change in semantics, just a simple way to do it (and maybe even a fast way to do it, but that is secondary).

So performance is not his primary goal. The goal is that it is easy to use, and I think my strategy is fairly easy to follow: If in doubt, release the lock.

Regards,
Martin
"Martin v. Löwis" <martin@v.loewis.de> writes:
David Abrahams wrote:
IIUC, that strategy doesn't get Mark what he wants in this case: 2.2) Extensions that *may* take a little time, but more to the point, may directly and synchronously trigger callbacks. That is, it is not expected that much time will be spent outside of Python, but rather that Python will be re-entered. Which is to be able to avoid releasing the GIL in the case where the extension isn't going to do much other than invoke the callback function which re-acquires it.
I think you are incorrectly interpreting Mark's priorities:
I am not looking for a change in semantics, just a simple way to do it (and maybe even a fast way to do it, but that is secondary).
So performance is not his primary goal. The goal is that it is easy to use, and I think my strategy is fairly easy to follow: If in doubt, release the lock.
OK. I guess there's one more point worth mentioning: APIs are not always scrupulously documented. In particular, documentation might give you no reason to think any callbacks will be invoked for a given call, when in fact they will be. Furthermore, problems with not releasing the GIL don't show up with arbitrary callbacks in the API, only when someone finally installs one which uses Python's API. The Windows API is a prime example of this, but I'm sure there are many others.

If we could make "creating a thread state and acquiring the current thread" immune to the condition where the current thread is already acquired, we'd be making it much easier to write bulletproof extensions.

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution
David Abrahams wrote:
OK. I guess there's one more point worth mentioning: APIs are not always scrupulously documented. In particular, documentation might give you no reason to think any callbacks will be invoked for a given call, when in fact they will be. [...] The Windows API is a prime example of this
Are you sure about this? I would expect that the documentation of the Win32 API is very clear about when and how user code is invoked. More precisely, no API function except DispatchEvent will ever invoke user code. Maybe you meant "Windows API" in a more general sense? If you include COM, then yes, any invocation of a COM object may do many things, so you should always release the GIL when invoking a COM method. Regards, Martin
"Martin v. Löwis" <martin@v.loewis.de> writes:
David Abrahams wrote:
OK. I guess there's one more point worth mentioning: APIs are not always scrupulously documented. In particular, documentation might give you no reason to think any callbacks will be invoked for a given call, when in fact it will be. [...] The Windows API is a prime example of this
Are you sure about this? I would expect that the documentation of the Win32 API is very clear about when and how user code is invoked. More precisely, no API function except DispatchEvent will ever invoke user code.
Maybe you meant "Windows API" in a more general sense? If you include COM, then yes, any invocation of a COM object may do many things, so you should always release the GIL when invoking a COM method.
No, in fact there are several places where the API docs are less-than-scrupulous about letting you know that your own event dispatching hook may be re-entered during the call. It's been a long time since I've had the pleasure, but IIRC one of them happens in the APIs for printing. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams wrote:
No, in fact there are several places where the API docs are less-than-scrupulous about letting you know that your own event dispatching hook may be re-entered during the call. It's been a long time since I've had the pleasure, but IIRC one of them happens in the APIs for printing.
It's unclear what you are talking about here. If you mean PrintDlgEx, then it very well documents that PRINTDLGEX.lpCallback can be invoked. In any case, it would be a bug in the wrapper to not release the GIL around calling PrintDlgEx. Bugs happen and they can be fixed. Regards, Martin
"Martin v. Löwis" <martin@v.loewis.de> writes:
David Abrahams wrote:
No, in fact there are several places where the API docs are less-than-scrupulous about letting you know that your own event dispatching hook may be re-entered during the call. It's been a long time since I've had the pleasure, but IIRC one of them happens in the APIs for printing.
It's unclear what you are talking about here. If you mean PrintDlgEx, then it very well documents that PRINTDLGEX.lpCallback can be invoked.
Well, as I said, it's been a long time, so I don't remember the details. However, let's assume it was PrintDlgEx for the time being. If the caller of PrintDlgEx controls the contents of the PRINTDLGEX structure, he can determine whether its lpCallback points to a function that calls back into Python. If it doesn't call back into Python, he might reasonably presume that there's no need to release the GIL. He would be wrong. Lots of events can be dispatched to the application before PrintDlgEx returns, so he needs to release the GIL if anything in the application event handler can invoke Python. AFAICT, this is typical for any Windows API function which the Windows engineers thought might take a long time to return, and it's typically not documented.
In any case, it would be a bug in the wrapper to not release the GIL around calling PrintDlgEx.
You say tomato, I say documentation bug.
Bugs happen and they can be fixed.
Yes. This is an example of a kind of bug which is not uncommon, and very hard to detect under some reasonable usage/development scenarios. It might make sense to make Python immune to this kind of bug. I think I'm done arguing about this. If Mark isn't discouraged by now, I'm still ready to help with the PEP. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
[David]
I think I'm done arguing about this. If Mark isn't discouraged by now, I'm still ready to help with the PEP.
No, I saw this coming a mile off :) A little like clockwork really. Martin seems to be trying to make 3 points:

1) There is no problem. All Windows API functions that could indirectly send a Windows message are clearly documented as generating messages, and if they aren't then it is all MS' fault anyway. Substitute "Windows" and "MS" for the particular problem you are having, and we have a nice answer to every potential problem :)

2) Even if such a problem did exist, then creating a brand new thread-state for each and every invocation is acceptable.

3) Mark stated performance was secondary to correctness. Therefore, as soon as we have correctness we can ignore performance as it is not a primary requirement.

(1) is clearly bogus. As with David, I am not interested in discussing this issue. David, Anthony and I all have this problem today. Tim Peters can see the problem and can see it exists (even if he believes my current implementation is incorrect). All due respect Martin, but stick to where your expertise lies.

As for (2): My understanding (due to the name of the object) is that a single thread should use a single thread-state. You are suggesting that the same thread could have any number of different thread-states, depending on how often the Python interpreter was recursively entered. While I concede that this is likely to work in the general case, I am not sure it is "correct". If no threading semantics will be broken by having one thread use multiple thread-states, then I must ask what purpose thread-states (as opposed to the GIL) have.

Mark.
Mark Hammond wrote:
While I concede that this is likely to work in the general case, I am not sure it is "correct". If no threading semantics will be broken by having one thread use multiple thread-states, then I must ask what purpose thread-states (as opposed to the GIL) have.
That is easy to answer (even though it is out of my area of expertise): it carries the Python stack, in particular for exceptions.

Now, if you have multiple thread states in a single thread, the question is how a Python exception should propagate through the C stack. With multiple thread states, the exception "drops off" in the callback, which usually has no meaningful way to deal with it except to print it (in my application, the callback was always CORBA-initiated, so it was straightforward to propagate it across the wire to the remote caller).

The only meaningful alternative would be to assume that there is a single thread state. In that case, the exception would be stored in the thread state, and come out in the original caller. Now, it is very questionable that you could unwind the C stack between the entrance to the library and the callback: If, as David says, you don't even know that the API may invoke a callback, there is surely no way to indicate that an exception came out of it. As a result, when returning to the bottom of the C stack, the extension suddenly finds an exception in its thread state. The extension probably doesn't expect that exception, so it is simply lost (when the next exception is set). Potentially, strange things happen as somebody might invoke PyErr_Occurred().

I question whether this is better than printing the exception, in the case of multiple thread states.

Regards,
Martin
[Martin]
Mark Hammond wrote:
While I concede that this is likely to work in the general case, I am not sure it is "correct". If no threading semantics will be broken by having one thread use multiple thread-states, then I must ask what purpose thread-states (as opposed to the GIL) have.
That is easy to answer (even though it is out of my area of expertise): it carries the Python stack, in particular for exceptions.
It also carries the profiler and debugger hooks, a general purpose "thread state" dictionary and other misc. details such as the tick count, recursion depth protection etc.
Now, if you have multiple thread states in a single thread, the question is how a Python exception should propagate through the C stack.
Actually, I think the question is still "why would a single thread have multiple thread-states?". (Or maybe "should a thread-state be renamed to, say, an "invocation frame"?)
With multiple thread states, the exception "drops off" in the callback, which usually
"usually" is the key word here. Python isn't designed only to handle what programs "usually" do. A strategy I have seen recently here, which is to argue that any other requirements are self-evidently broken, is not helpful. We could possibly argue that exceptions are OK to handle this way. Similar amounts of text could also possibly convince that the profiler, debugger and thread-switch items also will not be too badly broken by having multiple thread states per thread, or that such breakage is "desirable" (ie, can be justified). You will have more trouble convincing me that future items stored in a Python thread state will not be broken, but I am past arguing about it. Please correct me if I am wrong, but it seems your solution to this is: * Every function which *may* trigger such callbacks *must* switch out the current thread state (thereby dropping the GIL etc) * Every entry-point which needs to call Python must *always* allocate and switch to a new thread-state. * Anything broken by having multiple thread-states per thread be either (a) fixed, or (b) justified in terms of a specific CORBA binding implementation. * Anyone wanting anything more remains out on their own, just as now. If so, I am afraid I was hoping for just a little more <wink>. Mark.
[Mark Hammond]
... David, Anthony and I all have this problem today. Tim Peters can see the problem and can see it exists (even if he believes my current implementation is incorrect).
I haven't looked at this in at least 2 years. Back when I did, I thought there *may* be rare races in how the Win32 classes initialized themselves. That may not have been the case in reality.

I'd like to intensify the problem, though: you're in a thread and you want to call a Python API function safely. Period. You don't know anything else. You don't even know whether Python has been initialized yet, let alone whether there's already a thread state, and/or an interpreter state, sitting around available for you to use. You don't even know whether you're a thread created by Python or via some other means. I believe that, in order to end this pain forever <heh>, even this case must be made tractable. It doesn't preclude that a thread knowing more than nothing may be able to do something cheaper and simpler than a thread that knows nothing at all.

I'd also like to postulate that proposed solutions can rely on a new Python C API supplying a portable spelling of thread-local storage. We can implement that easily on pthreads and Windows boxes, it seems to me to cut to the heart of several problems, and I'm willing to say that Python threading doesn't work anymore on other boxes until platform wizards volunteer code to implement this API there too. Since the start, Python threading has been constrained by the near-empty intersection of what even the feeblest platform thread implementations supply, and that creates problems without real payback. Let 'em eat Stackless <wink>.
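For what it's worth, such a "portable spelling of thread-local storage" could be little more than a veneer over what the platforms already provide. A sketch of the pthreads half, with invented PyThread_tls_* names (only the pthread_* calls are real API):

    #include <pthread.h>

    typedef pthread_key_t PyThread_tls_key;            /* invented portable alias */

    static int PyThread_tls_create(PyThread_tls_key *key)
    {
        return pthread_key_create(key, NULL);           /* no destructor */
    }

    static int PyThread_tls_set(PyThread_tls_key key, void *value)
    {
        return pthread_setspecific(key, value);
    }

    static void *PyThread_tls_get(PyThread_tls_key key)
    {
        return pthread_getspecific(key);
    }

The Windows half would be the same three functions over TlsAlloc, TlsSetValue and TlsGetValue. With such a slot available, prologue code could stash the PyThreadState it created for a thread and find it again on the next call in, instead of building one from scratch every time.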
[Tim]
I'd like to intensify the problem, though:
Good! I was just taking small steps to get to the same endpoint.
you're in a thread and you want to call a Python API function safely. Period. You don't know anything else. You don't even know whether Python has been initialized yet, let alone whether there's already a thread state, and/or an interpreter state, sitting around available for you to use.
Agreed 100%. In some ways, I believe this is just the conclusion from my 2 points taken together if we can ignore the current world order. I split them to try and differentiate the requirements from the current API, but if we progress to Tim's description, my points become:

1) Becomes exactly as Tim stated.

2.1) Stays the same - release the GIL.

2.2) Goes away - if (1) requires no knowledge of Python's state, there is no need for extensions to take special action just to enable this.
I'd also like to postulate that proposed solutions can rely on a new Python C API supplying a portable spelling of thread-local storage. We can implement that easily on pthreads and Windows boxes, it seems to me to cut to the heart of several problems, and I'm willing to say that Python threading doesn't work anymore on other boxes until platform wizards volunteer code to implement this API there too.
This sounds good to me. After you have done the Win98 version, I volunteer to port it to Win2k <wink>.

I believe we have a reasonable starting point. Our PEP could have:

* All the usual PEP fluff <wink>

* Define the goal, basically as stated by Tim.

* Define a new C API specifically for this purpose, probably as an "optional extension" to the existing thread state APIs.

* Define a TLS interface that all ports must implement *iff* this new API is to be available.

This sounds reasonable to me unless we can see a number of other uses for TLS - in which case the TLS interface would probably get its own PEP, with this PEP relying on it. However, I don't see too much need for TLS - once we have our hands on a Python thread-state, we have a thread-specific dictionary available today, and a TLS dictionary from inside your Python code is trivial. From what I can see, we just need platform TLS to get hold of our thread-state, from which point we can (and do) manage our own thread specific data.

How does this sound?

Mark.
Tim Peters <tim.one@comcast.net> writes:
I'd like to intensify the problem, though: you're in a thread and you want to call a Python API function safely. Period.
Are there semantic requirements to the Python API in this context, with respect to the state of global things? E.g. when I run the simple string "import sys;print sys.modules", would I need to get the same output that I get elsewhere? If yes, is it possible to characterize "elsewhere" any better? Regards, Martin
[Martin]
Tim Peters <tim.one@comcast.net> writes:
I'd like to intensify the problem, though: you're in a thread and you want to call a Python API function safely. Period.
Are there semantic requirements to the Python API in this context, with respect to the state of global things? E.g. when I run the simple string "import sys;print sys.modules", would I need to get the same output that I get elsewhere? If yes, is it possible to characterize "elsewhere" any better?
Yes, good catch. A PyInterpreterState must be known, and as you stated previously, it is trivial to get one of these and stash it away globally. The PyThreadState is the problem child. Mark.
Mark Hammond wrote:
Yes, good catch. A PyInterpreterState must be known, and as you stated previously, it is trivial to get one of these and stash it away globally. The PyThreadState is the problem child.
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's. Regards, Martin
Mark Hammond wrote:
Yes, good catch. A PyInterpreterState must be known, and as you stated previously, it is trivial to get one of these and stash it away globally. The PyThreadState is the problem child.
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's.
Sheesh - lucky this is mildly entertaining <wink>. You are free to infer what you like, but I believe it is clear and would prefer to see a single other person with a problem rather than continue pointless semantic games. Tiredly, Mark.
Mark Hammond wrote:
Sheesh - lucky this is mildly entertaining <wink>. You are free to infer what you like, but I believe it is clear and would prefer to see a single other person with a problem rather than continue pointless semantic games.
Feel free to ignore me if you think you have the requirements specified, and proceed right away to presenting the solution. Regards, Martin
--- Mark Hammond <mhammond@skippinet.com.au> wrote:
[...] I believe it is clear and would prefer to see a single other person with a problem [...]
I've been reading this thread with interest since we've recently fought (and lost) this battle at my company. Here is our use case:

We use Python in primarily two ways. The first and obvious use is as a scripting language with a small group of us creating extensions to talk to existing libraries. There are no relevant problems here.

The second use is as a data structures library used from C++. We created a very easy to use C++ class that has a bazillion operator overloads and handles all the reference counting and what not for the user. It used to handle threading too, but that proved to be very difficult. Think of this C++ class as something similar to what boost::python::{object, dict, list, tuple, long, numeric} provides, but intended for users who don't really like or want to know C++.

Most of our users write small C++ processes that communicate amongst themselves via an assortment of IPC mechanisms. Occasionally these C++ processes are threaded, and we wanted to handle that. Our model was that C++ code would never hold the GIL, and that before we entered the Python API we would use pthread_getspecific (thread local storage) to see if there was a valid PyThreadState to use. If there wasn't a thread state, we would create one. Since C++ code never held the GIL, we'd always acquire it. This strategy allows all Python threads to take turns running, and allows any C++ threads to enter into Python when needed. Performance lagged a little this way, but not so much that we cared.

The problem came when our users started to write generic libraries to be used from C++ and also wanted these libraries as Python extensions. In one case, their library would be used in a standalone C++ process (where the GIL was not held), and in another they would use boost to try and export their library as an extension to Python (where the GIL was held). The same C++ library couldn't know in advance if the GIL was held. The way boost templatizes on your functions and classes, it is not at all clear when you can safely release the GIL for the benefit of the C++ library being wrapped up that expects the GIL is not held.

Since being able to support writing generic libraries easily is more important to us than supporting multithreaded C++ processes (using Python as a data structure library), we changed our strategy and made it so that in C++ the GIL was held by default. Since for these types of processes "most" of our time is spent in C++, no Python threads ever get a chance to run without additional work from the C++ author. It also requires additional work to have multiple C++ threads use Python. This was pretty unsatisfying to those of us who like to work with threads.

It's too late to make this long story short, but what would have made our situation much easier would be something like:

    void *what_happened = Py_AcquireTheGilIfIDontAlreadyHaveIt();
    // Can safely call Python API functions here, no matter what the
    // context is...
    Py_ReleaseTheGilIfImSupposedTo(what_happened);

I hope seeing another side of this is of some use.

Cheers,
-Scott
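Under Scott's original model -- the C++ code never touches the GIL except through a pair of helpers like these, and the calls nest strictly -- the wished-for pair can be sketched on top of plain pthread TLS. All names other than the pthread_* and Python/C API calls are invented, and the one case this cannot detect (a thread that already holds the GIL by some other means, e.g. because it is running Python code that called into the C++ library) is exactly the hole discussed in this thread:

    #include <Python.h>
    #include <pthread.h>

    static PyInterpreterState *the_interp;   /* stashed once, in gil_helpers_init() */
    static pthread_key_t tstate_key;         /* non-NULL value => inside the pair */

    typedef struct { PyThreadState *tstate; int created; } gil_token;

    static void gil_helpers_init(void)       /* call once, with the GIL held */
    {
        pthread_key_create(&tstate_key, NULL);
        the_interp = PyThreadState_Get()->interp;
    }

    static gil_token ensure_gil(void)
    {
        gil_token tok;

        tok.tstate = (PyThreadState *)pthread_getspecific(tstate_key);
        tok.created = 0;
        if (tok.tstate == NULL) {
            /* First (outermost) entry on this thread: make a state, take the GIL. */
            tok.tstate = PyThreadState_New(the_interp);
            tok.created = 1;
            PyEval_AcquireThread(tok.tstate);
            pthread_setspecific(tstate_key, tok.tstate);
        }
        /* else: an enclosing ensure_gil() already holds the GIL -- do nothing. */
        return tok;
    }

    static void release_gil(gil_token tok)
    {
        if (tok.created) {
            pthread_setspecific(tstate_key, NULL);
            PyThreadState_Clear(tok.tstate);
            PyEval_ReleaseThread(tok.tstate);
            PyThreadState_Delete(tok.tstate);
        }
        /* else: leave the GIL exactly as it was found. */
    }

A callback would then be bracketed with gil_token t = ensure_gil(); ... release_gil(t);, and nested uses on the same thread cost only the TLS lookup.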
[Martin v. Lowis]
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's.
If so, I doubt they'll stay that way <wink>. I don't want Mark to *have* to know whether there's an interpreter state available, so the all-purpose prologue code will need to have a way to know that without Mark's help. I do want Mark to be *able* to use a leaner prologue dance if he happens to know that an interpreter state is available. I'd also like for that leaner prologue dance to be able to assert that an interpreter state is indeed available. "The leaner prologue dance" may be identical to the "all-purpose prologue code"; whether or not it can be is an implementation detail, which should become clear later.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
Mark Hammond wrote:
Yes, good catch. A PyInterpreterState must be known, and as you stated previously, it is trivial to get one of these and stash it away globally. The PyThreadState is the problem child.
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's.
Sheesh - lucky this is mildly entertaining <wink>. You are free to infer what you like, but I believe it is clear and would prefer to see a single other person with a problem rather than continue pointless semantic games.
In this instance, it looks to me like Martin makes a good point. If I'm missing something, I'd appreciate an explanation. Thanks, -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's.
Sheesh - lucky this is mildly entertaining <wink>. You are free to infer what you like, but I believe it is clear and would prefer to see a single other person with a problem rather than continue pointless semantic games.
In this instance, it looks to me like Martin makes a good point. If I'm missing something, I'd appreciate an explanation.
There was no requirement that identical code be used in all cases. Checking if Python is initialized is currently trivial, and requires no special inference skills. It is clear that some consideration will need to be given to the PyInterpreterState used for all this, but that is certainly tractable - every single person who has spoken up with this requirement to date has indicated that their application does not need multiple interpreter states - so explicitly ignoring that case seems fine. Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
Then of course you know more than Tim would grant you: you do have an interpreter state, and hence you can infer that Python has been initialized. So I infer that your requirements are different from Tim's.
Sheesh - lucky this is mildly entertaining <wink>. You are free to infer what you like, but I believe it is clear and would prefer to see a single other person with a problem rather than continue pointless semantic games.
In this instance, it looks to me like Martin makes a good point. If I'm missing something, I'd appreciate an explanation.
There was no requirement that identical code be used in all cases. Checking if Python is initialized is currently trivial, and requires no special inference skills. It is clear that some consideration will need to be given to the PyInterpreterState used for all this, but that is certainly tractable - every single person who has spoken up with this requirement to date has indicated that their application does not need multiple interpreter states - so explicitly ignoring that case seems fine.
I understand now, thanks. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
[Tim]
I'd like to intensify the problem, though: you're in a thread and you want to call a Python API function safely. Period.
[martin@v.loewis.de]
Are there semantic requirements to the Python API in this context, with respect to the state of global things?
No more than self-consistency, I expect, same as for a proper call now run from a thread.
E.g. when I run the simple string "import sys;print sys.modules", would I need to get the same output that I get elsewhere? If yes, is it possible to characterize "elsewhere" any better?
I don't know what "elsewhere" means at all. Let's make some assumptions first: Python either has already been initialized successfully, or does initialize successfully as a result of whatever prologue dance is required before you're allowed to "run the simple string". "sys" resolves to the builtin sys. You "run the simple string" at time T1, and it returns at time T2. Nobody finalizes Python "by surprise", or kills it, during this either. Then I expect the strongest that can be said is that the output you get corresponds to the actual state of sys.modules at some time T, with T1 <= T <= T2 (and because you're executing Python code, there's nothing to stop the interpreter from letting some other thread(s) run before and after each of the interpreted string statements, and they can do anything to sys.modules). I don't expect we can say anything stronger than that today either.
Tim Peters wrote:
No more than self-consistency, I expect, same as for a proper call now run from a thread.
That covers the case that proper calls are possible from this thread (although there are still ambiguities: a single thread may have multiple thread states, which are associated with multiple interpreters, which may have different contents of sys.modules). However, that answer does not work for the case of a thread that has never seen a Python call: proper calls could not run on this thread.
Then I expect the strongest that can be said is that the output you get corresponds to the actual state of sys.modules at some time T
The issue is that there may be multiple sys.modules in the process, at T. Then the question is whether to use one of the existing ones (if so: which one), or create a new one.
I don't expect we can say anything stronger than that today either.
Currently, the extension author must somehow come up with an interpreter state. I'm uncertain whether you are proposing to leave it at that, or whether you require that any solution to "the problem" also provides a way to obtain the "right" interpreter state somehow. Regards, Martin
[Martin v. Lowis]
That covers the case that proper calls are possible from this thread (although there are still ambiguities: a single thread may have multiple thread states, which are associated with multiple interpreters, which may have different contents of sys.modules).
However, that answer does not work for the case of a thread that has never seen a Python call: proper calls could not run on this thread.
Then I expect the strongest that can be said is that the output you get corresponds to the actual state of sys.modules at some time T
The issue is that there may be multiple sys.modules in the process, at T. Then the question is whether to use one of the existing ones (if so: which one), or create a new one.
All right, you're worried about multiple interpreter states. I'm not -- I've never used them, the Python distribution never uses them, there are no tests of that feature, and they don't seem particularly well-defined in end cases regardless. I'm happy to leave them as an afterthought. If someone wants to champion them, they better ensure their interests are met. As far as I'm concerned, if a user does the all-purpose prologue dance (the "I don't know anything, but I want to use the Python API anyway" one), then the interpreter state in effect isn't defined. It may use an existing interpreter state, or an interpreter state created solely for use by this call, or take one out of a pool of interpreter states reused for such cases, or whatever. Regardless, it's a little tail that I don't want wagging the dog here.
I don't expect we can say anything stronger than that today either.
Currently, the extension author must somehow come up with an interpreter state. I'm uncertain whether you are proposing to leave it at that, or whether you require that any solution to "the problem" also provides a way to obtain the "right" interpreter state somehow.
I define "right" as "undefined" in this case. Someone who cares about multiple interpreter states should feel free to define and propose a stronger requirement. However, the people pushing for change here have explicitly disavowed interest in multiple interpreter states, and I'm happy to press on leaving them for afterthoughts.
On 9 Jan 2003 at 14:23, Tim Peters wrote:
Someone who cares about multiple interpreter states should feel free to define and propose a stronger requirement. However, the people pushing for change here have explicitly disavowed interest in multiple interpreter states, and I'm happy to press on leaving them for afterthoughts.
I have used multiple interpreter states, but not because I wanted to. Consider in-process COM servers implemented in Python. When an application asks for the COM server, the COM support code will do the *loading* on a thread spawned by the COM support code. When the application *uses* the COM server, it may do so on its own thread (it will certainly do so on a different thread than was used to load the server). Installer freezes in-process COM servers. It does so by using a generic shim dll which gets renamed for each component. Basically, this dll will forward most of the calls on to Mark's PythoncomXX.dll. But it wants to install import hooks. If it is the only Python in the process, everything is fine. But if the application is Python, or the user wants to load more than one Python-based COM server, then there already is an interpreter state. Unfortunately, the shim dll can't get to it, so can't install its import hooks into the right one. At least, that's my recollection of the problems I was having before I gave up. My understanding of COM is relatively superficial. I understand the Python part better, but certainly not completely (there were things I thought *should* work that I couldn't get working). There may even be a reason to want multiple interpreter states. My understanding of COM and threading is not deep enough for me to make a coherent statement of requirements. But pretty clearly Python's thread API doesn't let me get anywhere close to handling multiple Pythons in one process. -- Gordon http://www.mcmillan-inc.com/
"Gordon McMillan" <gmcm@hypernet.com> writes:
On 9 Jan 2003 at 14:23, Tim Peters wrote:
Someone who cares about multiple interpreter states should feel free to define and propose a stronger requirement. However, the people pushing for change here have explicitly disavowed interest in multiple interpreter states, and I'm happy to press on leaving them for afterthoughts.
I have used multiple interpreter states, but not because I wanted to.
Consider in-process COM servers implemented in Python. When an application asks for the COM server, the COM support code will do the *loading* on a thread spawned by the COM support code. When the application *uses* the COM server, it may do so on its own thread (it will certainly do so on a different thread than was used to load the server).
Installer freezes in-process COM servers. It does so by using a generic shim dll which gets renamed for each component. Basically, this dll will forward most of the calls on to Mark's PythoncomXX.dll. But it wants to install import hooks.
If it is the only Python in the process, everything is fine. But if the application is Python, or the user wants to load more than one Python based COM server, then there already is an interpreter state. Unfortunately, the shim dll can't get to it, ...
I cannot really believe this. Isn't it the same as for normal, unfrozen inprocess COM servers? The shim dll could do the same as pythoncom22.dll does, or even rely on it to do the right thing. Unfrozen inproc COM works whether the main process is Python or not.
... so can't install its import hooks into the right one.
IMO, it's the frozen DLL rendering the Python environment unusable for everything else (the main process, for example). I hope using the frozen module mechanism instead of import hooks will make this more tolerant. All this may of course be off-topic for this thread. Thomas
On 10 Jan 2003 at 15:00, Thomas Heller wrote:
"Gordon McMillan" <gmcm@hypernet.com> writes: [...]
Installer freezes in-process COM servers. It does so by using a generic shim dll which gets renamed for each component. Basically, this dll will forward most of the calls on to Mark's PythoncomXX.dll. But it wants to install import hooks. [...] I cannot really believe this. Isn't it the same as for normal, unfrozen inprocess COM servers?
No. COM always loads pythoncom22 in that case. Note that a Python22 app can load a frozen Python21 COM server just fine.
The shim dll could do the same as pythoncom22.dll does, or even rely on it to do the right thing.
That's what it tries to do. It loads pythoncomXX.dll and forwards all the calls it can.
Unfrozen inproc COM works whether the main process is Python or not.
Yes, pythoncom doesn't install import hooks.
... so can't install its import hooks into the right one.
IMO, it's the frozen DLL rendering the Python environment unusable for everything else (the main process, for example).
I don't understand that statement at all. Working with a (same version) Python app is actually a secondary worry. I'm more bothered that, for example, Excel can't load 2 frozen servers which use the same Python.
I hope using the frozen module mechanism instead of import hooks will make this more tolerant.
But where are those modules frozen? How do they get installed in the already running Python? What if multiple sets of frozen modules (with dups) want to install themselves?
All this may of course be off-topic for this thread.
It ties into Martin's earlier comments about threading models. It may be that the solution lies in using COM's apartment threading, instead of free threading. That way, the COM server could have its own interpreter state, and the calls would end up in the right interpreter. Maybe. But I don't understand the COM part well enough, and Mark's stuff supports free threading, not apartment threading. I really brought all this up to try to widen the scope from extension modules which can easily grab an interpreter state and hold onto it. -- Gordon http://www.mcmillan-inc.com/
"Gordon McMillan" <gmcm@hypernet.com> writes:
Working with a (same version) Python app is actually a secondary worry. I'm more bothered that, for example, Excel can't load 2 frozen servers which use the same Python.
A COM component is useless IMO if it restricts which other components you can use, or which client you use, and that's why I didn't allow inproc COM servers in py2exe up to now. But, since this problem doesn't occur with nonfrozen servers, it seems the import hooks are the problem.
I hope using the frozen module mechanism instead of import hooks will make this more tolerant.
But where are those modules frozen? How do they get installed in the already running Python? What if multiple sets of frozen modules (with dups) want to install themselves?
I hope one could extend the FrozenModule table in an already running Python by adding more stuff to it. Isn't there already code in cvs which allows this?
It ties into Martin's earlier comments about threading models. It may be that the solution lies in using COM's apartment threading, instead of free threading. That way, the COM server could have its own interpreter state, and the calls would end up in the right interpreter. Maybe.
But I don't understand the COM part well enough, and Mark's stuff supports free threading, not apartment threading.
Last I checked, win32all registers the components as ThreadingModel = both. As I understand this, both STA and MTA are supported (STA = Single Threaded Apartment, MTA = MultiThreaded Apartment). So marking them as STA should be safe if it is needed. Maybe Mark can clear the confusion?
I really brought all this up to try to widen the scope from extension modules which can easily grab an interpreter state and hold onto it.
Thomas
Thomas Heller wrote:
I hope one could extend the FrozenModule table in an already running Python by adding more stuff to it. Isn't there already code in cvs which allows this?
There was code from me that would let you do that from Python, but I ripped it out as PEP 302 makes it unnecessary. However, it's possible from C (and always has been, it's the trick that Anthony Tuininga used in his freeze-like tool). Normally, PyImport_FrozenModules points to a static array, but there's nothing against setting it to a heap array. The fact that it's not a Python object and that the array elements aren't Python objects makes it a little messy, though (another reason why I backed out the Python interface to it). Just
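A small editorial sketch of the C-level trick Just describes: since PyImport_FrozenModules is just a pointer to a sentinel-terminated array of struct _frozen records, a heap-allocated copy with extra entries appended can be swapped in at runtime. The function name is invented, error handling is minimal, and any previously installed heap table is simply leaked.

    #include <Python.h>
    #include <stdlib.h>
    #include <string.h>

    /* Append `extra` (terminated by an entry with a NULL name) to the
       table that frozen imports are looked up in. Returns 0 on success. */
    static int
    append_frozen_modules(const struct _frozen *extra)
    {
        size_t n_old = 0, n_new = 0;
        const struct _frozen *p;
        struct _frozen *table;

        for (p = PyImport_FrozenModules; p && p->name; p++)
            n_old++;
        for (p = extra; p && p->name; p++)
            n_new++;

        table = malloc((n_old + n_new + 1) * sizeof(struct _frozen));
        if (table == NULL)
            return -1;
        if (n_old)
            memcpy(table, PyImport_FrozenModules, n_old * sizeof(struct _frozen));
        memcpy(table + n_old, extra, n_new * sizeof(struct _frozen));
        memset(table + n_old + n_new, 0, sizeof(struct _frozen)); /* sentinel */

        PyImport_FrozenModules = table;  /* a little messy, as Just says, but legal */
        return 0;
    }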
Tim Peters wrote:
[...] I'd also like to postulate that proposed solutions can rely on a new Python C API supplying a portable spelling of thread-local storage. We can implement that easily on pthreads and Windows boxes, it seems to me to cut to the heart of several problems, and I'm willing to say that Python threading doesn't work anymore on other boxes until platform wizards volunteer code to implement this API there too.
FWIW, I am pretty confident that this can be done (read: copied) as Douglas Schmidt has implemented it (on more platforms than Python supports <wink>) in the Adaptive Communication Environment (ACE): http://doc.ece.uci.edu/Doxygen/Beta/html/ace/classACE__TSS.html As usual, Douglas Schmidt also presents a detailed platform analysis and research about his implementation: http://www.cs.wustl.edu/~schmidt/PDF/TSS-pattern.pdf "This paper describes the Thread-Specific Storage pattern, which alleviates several problems with multi-threading performance and programming complexity. The Thread-Specific Storage pattern improves performance and simplifies multithreaded applications by allowing multiple threads to use one logically global access point to retrieve thread-specific data without incurring locking overhead for each access." regards, holger
holger krekel <pyth@devel.trillke.net> writes:
Tim Peters wrote:
[...] I'd also like to postulate that proposed solutions can rely on a new Python C API supplying a portable spelling of thread-local storage. We can implement that easily on pthreads and Windows boxes, it seems to me to cut to the heart of several problems, and I'm willing to say that Python threading doesn't work anymore on other boxes until platform wizards volunteer code to implement this API there too.
FWIW, I am pretty confident that this can be done (read: copied) as Douglas Schmidt has implemented it (on more platforms than Python supports <wink>) in the Adaptive Communication Environment (ACE):
http://doc.ece.uci.edu/Doxygen/Beta/html/ace/classACE__TSS.html
We also have a TSS implementation in the Boost.Threads library. I haven't looked at the ACE code myself, but I've heard that every component depends on many others, so it might be easier to extract useful information from the Boost implementation. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
We also have a TSS implementation in the Boost.Threads library. I haven't looked at the ACE code myself, but I've heard that every component depends on many others, so it might be easier to extract useful information from the Boost implementation.
Without looking at either Boost or ACE, I would guess that neither will help much: We would be looking for TLS support for AtheOS, BeOS, cthreads, lwp, OS/2, GNU pth, Solaris threads, SGI threads, and Win CE. I somewhat doubt that either Boost or ACE aim for such a wide coverage. Regards, Martin
[Martin]
David Abrahams <dave@boost-consulting.com> writes:
We also have a TSS implementation in the Boost.Threads library. I haven't looked at the ACE code myself, but I've heard that every component depends on many others, so it might be easier to extract useful information from the Boost implementation.
Without looking at either Boost or ACE, I would guess that neither will help much: We would be looking for TLS support for AtheOS, BeOS, cthreads, lwp, OS/2, GNU pth, Solaris threads, SGI threads, and Win CE. I somewhat doubt that either Boost or ACE aim for such a wide coverage.
We could simply have a "pluggable TLS" design. It seems that everyone who has this requirement is interfacing to a complex library (com, xpcom, Boost, ACE), and in general, these libraries also require TLS. So consider an API such as:

    PyTSM_HANDLE Py_InitThreadStateManager(void (*funcTLSAlloc)(...), ... PyInterpreterState = NULL);
    void Py_EnsureReadyToRock(PyTSM_HANDLE);
    void Py_DoneRocking(PyTSM_HANDLE);
    ...

Obviously the spelling is drastically different, but the point is that we can lean on the extension module itself, rather than the platform, to provide the TLS. In the case of Windows and a number of other OS's, you could fall back to a platform implementation if necessary, but in the case of xpcom, for example, you know that xpcom also defines its own TLS API, so anywhere we need the extension module, TLS comes for "free", even if no one has ported the platform TLS API to the Python TLS API. Our TLS requirements are very simple, and could be "spelt" in a small number of function pointers. Such a design also handles any PyInterpreterState issues - we simply assert if the passed pointer is non-NULL, and leave it to someone who cares to fix <wink>. Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
[Martin]
David Abrahams <dave@boost-consulting.com> writes:
We also have a TSS implementation in the Boost.Threads library. I haven't looked at the ACE code myself, but I've heard that every component depends on many others, so it might be easier to extract useful information from the Boost implementation.
Without looking at either Boost or ACE, I would guess that neither will help much: We would be looking for TLS support for AtheOS, BeOS, cthreads, lwp, OS/2, GNU pth, Solaris threads, SGI threads, and Win CE. I somewhat doubt that either Boost or ACE aim for such a wide coverage.
We could simply have a "pluggable TLS" design.
It seems that everyone who has this requirement is interfacing to a complex library (com, xpcom, Boost, ACE), and in general, these libraries also require TLS.
Boost isn't in that category. Boost provides a threading library to establish a platform-independent C++ interface for threading, but to date none of the other Boost libraries depend on the use of Boost.Threads. In other words, Boost doesn't require TLS, but it can provide TLS ;-)
So consider an API such as:
PyTSM_HANDLE Py_InitThreadStateManager( void (*funcTLSAlloc)(...), ... PyInterpreterState = NULL);
void Py_EnsureReadyToRock(PyTSM_HANDLE); void Py_DoneRocking(PyTSM_HANDLE); ...
Obviously the spelling is drastically different, but the point is that we can lean on the extension module itself, rather than the platform, to provide the TLS. In the case of Windows and a number of other OS's, you could fallback to a platform implementation if necessary, but in the case of xpcom, for example, you know that xpcom also defines its own TLS API, so anywhere we need the extension module, TLS comes for "free", even if no one has ported the platform TLS API to the Python TLS API. Our TLS requirements are very simple, and could be "spelt" in a small number of function pointers.
I take it you are planning to provide a way to get the necessary TLS from Python's API (in case it isn't lying about elsewhere), but not necessarily port it to every platform? If so, that sounds like a fine approach. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
[David]
"Mark Hammond" <mhammond@skippinet.com.au> writes:
We could simply have a "pluggable TLS" design.
It seems that everyone who has this requirement is interfacing to a complex library (com, xpcom, Boost, ACE), and in general, these libraries also require TLS.
Boost isn't in that category. Boost provides a threading library to establish a platform-independent C++ interface for threading, but to date none of the other Boost libraries depend on the use of Boost.Threads. In other words, Boost doesn't require TLS, but it can provide TLS ;-)
Yes, this is exactly what I meant. Mozilla is another good example. Mozilla does not require TLS, but indeed builds its own API for it - ie, xpcom does not require it, but does provide it. While Mozilla therefore has TLS ports for many many platforms, this doesn't help us directly, as we can't just lift their code (MPL, etc). But I believe we could simply lean on them for their implementations at runtime.
I take it you are planning to provide a way to get the necessary TLS from Python's API (in case it isn't lying about elsewhere), but not necessarily port it to every platform?
I am not sure what you mean by "get the necessary TLS from Python's API". I don't see a need for Python to expose any TLS functionality. If TLS is required *only* for this thread-state magic, then Python just consumes TLS, never exposes it. It obviously does expose an API which internally uses TLS, but it will not expose TLS itself. I foresee a "bootstrap prelude dance" which an extension library must perform, setting up these pointers exactly once. The obvious question from this approach is how to deal with *multiple* libraries in one app. For example, what happens when a single Python application wishes to use Boost *and* xpcom, and both attempt their bootstrap prelude, each providing a TLS implementation? Off the top of my head, a "first in wins" strategy may be fine - we don't care *who* provides TLS, so long as we have it. We don't really have a way to unload an extension module, so lifetime issues may not get in our way. Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
The obvious question from this approach is how to deal with *multiple* libraries in one app. For example, what happens when a single Python application wishes to use Boost *and* xpcom, and both attempt their bootstrap prelude, each providing a TLS implementation?
I would advise to follow Tim's strategy: Make TLS part of the thread_* files, accept that on some threading configuration, there won't be TLS until somebody implements it, and make TLS usage part of the core instead of part of the extension module. I doubt any of the potential TLS providers supports more than Win32 or pthreads. Regards, Martin
I would advise to follow Tim's strategy: Make TLS part of the thread_* files, accept that on some threading configuration, there won't be TLS until somebody implements it, and make TLS usage part of the core instead of part of the extension module.
I doubt any of the potential TLS providers supports more than Win32 or pthreads.
Yeah, I'm not crazy on the idea myself - but I think it has merit. I'm thinking mainly of xpcom, which has pretty reasonable support beyond pthreads and win32 - but I am more than happy to stick it in the YAGNI basket. Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
[David]
"Mark Hammond" <mhammond@skippinet.com.au> writes:
We could simply have a "pluggable TLS" design.
It seems that everyone who has this requirement is interfacing to a complex library (com, xpcom, Boost, ACE), and in general, these libraries also require TLS.
Boost isn't in that category. Boost provides a threading library to establish a platform-independent C++ interface for threading, but to date none of the other Boost libraries depend on the use of Boost.Threads. In other words, Boost doesn't require TLS, but it can provide TLS ;-)
Yes, this is exactly what I meant. Mozilla is another good example. Mozilla does not require TLS, but indeed builds its own API for it - ie, xpcom does not require it, but does provide it.
While Mozilla therefore has TLS ports for many many platforms, this doesn't help us directly, as we can't just lift their code (MPL, etc). But I believe we could simply lean on them for their implementations at runtime.
Ah, so that void (*funcTLSAlloc)(...) was supposed to be something supplied by the extension writer? Hmm, the Boost interface doesn't work that way, and AFAICT wouldn't be easily adapted to it. It basically works like this: the user declares a special C++ TSS object which internally holds a pointer. That pointer has a different value in each thread, and if you want more storage, you can allocate it and stick it in the pointer. The user can declare any number of these TSS objects, up to some implementation-specified limit. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
etc). But I believe we could simply lean on them for their implementations at runtime.
Ah, so that void (*funcTLSAlloc)(...) was supposed to be something supplied by the extension writer?
Hmm, the Boost interface doesn't work that way, and AFAICT wouldn't be easily adapted to it.
Windows and Mozilla work as you describe too, but I don't see the problem. For both of these, we would just provide a 3 <wink> line stub function, which uses the platform TLS API to return a "void *" we previously stashed. This local function is passed in. But yeah, as I said before, happy to YAGNI it. Mark.
"Mark Hammond" <mhammond@skippinet.com.au> writes:
etc). But I believe we could simply lean on them for their implementations at runtime.
Ah, so that void (*funcTLSAlloc)(...) was supposed to be something supplied by the extension writer?
Hmm, the Boost interface doesn't work that way, and AFAICT wouldn't be easily adapted to it.
Windows and Mozilla work as you describe too, but I don't see the problem. For both of these, we would just provide a 3 <wink> line stub function, which uses the platform TLS API to return a "void *" we previously stashed. This local function is passed in.
I can't really imagine what you're suggesting here. Code samples help.
But yeah, as I said before, happy to YAGNI it.
Not sure what "it" is supposed to be here, either. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
[David Abrahams]
I can't really imagine what you're suggesting here. Code samples help.
OK :) Here is some code demonstrating my "pluggable" idea, but working backwards <wink>. (Please don't pick on the names, or where I will need to pass a param, or where I forgot to cast, etc <wink>) First, let's assume we come up with a high-level API similar to:

    /*** Python "auto-thread-state" API ***/
    typedef void *PyATS_HANDLE;

    /* Gets a "cookie" for use in all subsequent auto-thread-state calls.
       Generally called once per application/extension. Not strictly
       necessary, but a vehicle to abstract PyInterpreterState should
       people care enough. */
    PyATS_HANDLE PyAutoThreadState_Init();

    /* Ensure we have Python ready to rock. This is the "slow" version
       that assumes nothing about Python's state, other than the handle
       is valid. */
    int PyAutoThreadState_Ensure(PyATS_HANDLE);

    /* Notify the auto-thread-state mechanism that we are done - there
       should be one Release() per Ensure(). Again, maybe not necessary
       if we are super clever, but for the sake of argument ...<wink> */
    void PyAutoThreadState_Release(PyATS_HANDLE);
    /* end of definitions */

This is almost the "holy grail" for me. Your module/app init code does:

    PyATS_HANDLE myhandle = PyAutoThreadState_Init();

And your C function entry points do a PyAutoThreadState_Ensure()/Release() pair. That is it! Your Python extension functions generally need take no special action, including releasing the lock, as PyAutoThreadState_Ensure() is capable of coping with the fact the lock is already held by this thread. So, to my mind, that sketches out the high-level API we are discussing. Underneath the covers, Python will need TLS to implement this. We have 2 choices for the TLS:

* Implement it inside Python as part of the platform threading API. This works fine in most scenarios, but may potentially let down e.g. some Mozilla xpcom users - users where Python is ported, but this TLS API is not. Those platforms could not use this new AutoThreadState API, even though the application has a functional TLS implementation provided by xpcom.

* Allow the extension author to provide "pluggable" TLS. This would expand the API like so:

    /* Back in the "auto thread state" header */
    typedef struct {
        /* Save a PyThreadState pointer in TLS */
        int (*pfnSaveThreadState)(PyThreadState *p);
        /* Release the pointer for the thread (as the thread dies) */
        void (*pfnReleaseThreadState)();
        /* Get the saved pointer for this thread */
        PyThreadState *(*pfnGetThreadState)();
    } PyTLS_FUNCS;

    /* For the Win32 extensions, I would provide the following code in
       my extension */
    DWORD dwTlsIndex = 0;

    // The TLS functions we "export" back to Python.
    int MyTLS_SaveThreadState(PyThreadState *ts)
    {
        // allocate space for the pointer in the platform TLS
        PyThreadState **p = (PyThreadState **)malloc(sizeof(PyThreadState *));
        if (!p) return -1;
        *p = ts;
        TlsSetValue(dwTlsIndex, p);
        return 0;
    }

    void MyTLS_DropThreadState()
    {
        PyThreadState **p = (PyThreadState **)TlsGetValue(dwTlsIndex);
        if (!p) return;
        TlsSetValue(dwTlsIndex, NULL);
        free(p);
    }

    PyThreadState *MyTLS_FetchThreadState()
    {
        PyThreadState **p = (PyThreadState **)TlsGetValue(dwTlsIndex);
        return p ? *p : NULL;
    }

    // A structure of function pointers defined by Python.
    PyTLS_FUNCS myfuncs = {
        MyTLS_SaveThreadState,
        MyTLS_DropThreadState,
        MyTLS_FetchThreadState
    };
    /* End of Win32 code */

The XPCOM code would look almost identical, except spelt PR_GetThreadPrivate, PR_SetThreadPrivate etc. I assume pthreads can also fit into this scheme.
But yeah, as I said before, happy to YAGNI it.
Not sure what "it" is supposed to be here, either.
I'm happy to YAGNI the pluggable TLS idea. I see that the number of users who would actually benefit is tiny. Keeping the TLS api completely inside Python is fine with me. Mark.
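For comparison, here is an editorial sketch of how the three slots in Mark's Win32 example might be filled in on top of pthreads thread-specific data; the key creation is assumed to happen once in the extension's bootstrap, and unlike the Win32 version no extra malloc'd cell is needed, because pthread_getspecific hands back the stored pointer directly.

    #include <pthread.h>
    #include <Python.h>

    /* Assumed to be created exactly once, e.g. with
       pthread_key_create(&tls_key, NULL) in the extension's bootstrap
       code, before any of these functions are used. */
    static pthread_key_t tls_key;

    static int MyTLS_SaveThreadState(PyThreadState *ts)
    {
        return pthread_setspecific(tls_key, ts) == 0 ? 0 : -1;
    }

    static void MyTLS_DropThreadState(void)
    {
        pthread_setspecific(tls_key, NULL);
    }

    static PyThreadState *MyTLS_FetchThreadState(void)
    {
        return (PyThreadState *)pthread_getspecific(tls_key);
    }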
On 10 Jan 2003 at 10:39, Martin v. Löwis wrote:
Without looking at either Boost or ACE, I would guess that neither will help much: We would be looking for TLS support for AtheOS, BeOS, cthreads, lwp, OS/2, GNU pth, Solaris threads, SGI threads, and Win CE. I somewhat doubt that either Boost or ACE aim for such a wide coverage.
I've heard the claim that ACE runs on more platforms than Java. -- Gordon http://www.mcmillan-inc.com/
Gordon McMillan wrote:
I've heard the claim that ACE runs on more platforms than Java.
See http://www.cs.wustl.edu/~schmidt/ACE-overview.html That claim may come from the support for RTOSs (such as pSOS), and they may also have counted older versions of systems which Java hasn't been ported to (such as HPUX 9). However, the minority thread libraries that Python supports (AtheOS, OS/2) appear to be unsupported in ACE. I agree with Tim's statement that there is no real problem in breaking support for these systems - if somebody cares about them, somebody will fix it, else we can rip it out. I was just responding to the claim that looking elsewhere may help. Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
We also have a TSS implementation in the Boost.Threads library. I haven't looked at the ACE code myself, but I've heard that every component depends on many others, so it might be easier to extract useful information from the Boost implementation.
Without looking at either Boost or ACE, I would guess that neither will help much: We would be looking for TLS support for AtheOS, BeOS, cthreads, lwp, OS/2, GNU pth, Solaris threads, SGI threads, and Win CE. I somewhat doubt that either Boost or ACE aim for such a wide coverage.
Boost covers only pthreads and Win32 at the moment. I thought I understood Tim to be saying that all of the other ones should be considered broken in Python anyway until proven otherwise, which is why I bothered to mention it. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams wrote:
Boost covers only pthreads and Win32 at the moment. I thought I understood Tim to be saying that all of the other ones should be considered broken in Python anyway until proven otherwise, which is why I bothered to mention it.
Right. My response was really addressing Holger's suggestion that support for those platforms can be copied from ACE, and your suggestion that this support is better copied from Boost. Neither will help for the platforms for which Tim is willing to say that they become broken. Regards, Martin
"Mark Hammond" <mhammond@skippinet.com.au> writes:
Mark Hammond wrote:
1) Allow "arbitrary" threads (that is, threads never before seen by Python) to acquire the resources necessary to call the Python C API.
This is possible today: all you need is a pointer to an interpreter state. If you have that, you can use PyThreadState_New,
But what if in some cases, this callback is as a result of Python code on the same thread - ie, there already exists a Python thread-state higher up the stack?
I believe that's the case which bit me. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
"Mark Hammond" <mhammond@skippinet.com.au> writes:
My goal:
For a multi-threaded application (generally this will be a larger app embedding Python, but that is irrelevant), make it reasonably easy to accomplish 2 things:
1) Allow "arbitrary" threads (that is, threads never before seen by Python) to acquire the resources necessary to call the Python C API.
2) Allow Python extensions to be written which support (1) above.
3) Allow "arbitrary" threads to acquire the resources necessary to call the Python C API, even if they already have those resources, and to later release them if they did not have those resources.
Currently (2) is covered by Py_BEGIN_ALLOW_THREADS, except that it is kinda like only having a hammer in your toolbox <wink>. I assert that 2) could actually be split into discrete goals:
I'm going to ask some questions just to make sure your terminology is clear to me:
2.1) Extension functions that expect to take a lot of time, but generally have no thread-state considerations. This includes sleep(), all IO functions, and many others. This is exactly what Py_BEGIN_ALLOW_THREADS was designed for.
In other words, functions which will not call back into the Python API?
2.2) Extensions that *may* take a little time, but more to the point, may directly and synchronously trigger callbacks.
By "callbacks", do you mean "functions which (may) use the Python C API?"
That is, it is not expected that much time will be spent outside of Python, but rather that Python will be re-entered. I can concede that functions that may trigger asynch callbacks need no special handling here, as the normal Python thread switch mechanism will ensure their correct dispatch.
By "trigger asynch callbacks" do you mean, "cause a callback to occur on a different thread?"
Currently 2.1 and 2.2 are handled the same way, but this need not be the case. Currently 2.2 is only supported by *always* giving up the lock, and at each entry point *always* re-acquiring it. This is obviously wasteful if indeed the same thread immediately re-enters - hence we are here with a request for "how do I tell if I have the lock?".
Yep, that pinpoints my problem.
Combine this with the easily stated but tricky-to-implement (1), and no one understands it at all <frown>
I also propose that we restrict this to applications that intend to use a single "PyInterpreterState" - if you truly want multiple threads running in multiple interpreters (and good luck to you - I'm not aware anyone has ever actually done it <wink>) then you are on your own.
Fine with me ;-). I think eventually we'll need to come up with a more precise definition of exactly when "you're on your own", but for now that'll do.
Are these goals a reasonable starting point? This describes all my venturing into this area.
Sounds about right to me. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
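As a reference point for case 2.1 above, a minimal editorial sketch of the pattern Py_BEGIN_ALLOW_THREADS was designed for; do_blocking_io and the function names are made-up stand-ins for whatever long, Python-free call the extension makes.

    #include <Python.h>

    /* Hypothetical blocking call that never touches the Python API. */
    extern int do_blocking_io(void);

    static PyObject *
    mymodule_wait(PyObject *self, PyObject *args)
    {
        int status;

        if (!PyArg_ParseTuple(args, ""))
            return NULL;

        Py_BEGIN_ALLOW_THREADS      /* drop the GIL: no Python API below */
        status = do_blocking_io();
        Py_END_ALLOW_THREADS        /* take the GIL back before returning */

        return Py_BuildValue("i", status);
    }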
We do have a real problem here, and I keep stumbling across it. So far, this issue has hit me in the win32 extensions, in Mozilla's PyXPCOM, and even in Gordon's "installer". IMO, the reality is that the Python external thread-state API sucks. I can boldly make that assertion as I have heard many other luminaries say it before me. As Tim suggests, time is the issue.
I fear the only way to approach this is with a PEP. We need to clearly state our requirements, and clearly show scenarios where interpreter states, thread states, the GIL etc all need to cooperate. Eg, InterpreterStates seem YAGNI, but manage to complicate using ThreadStates, which are certainly YNI. The ability to "unconditionally grab the lock" may be useful, as may a construct meaning "I'm calling out to/in from an external API" distinct from the current singular "release/acquire the GIL" construct available today.
I'm willing to help out with this, but not take it on myself. I have a fair bit to gain - if I can avoid toggling locks every time I call out to each and every function there would be some nice perf gains to be had, and horrible code to remove.
I welcome a PEP on this! It's above my own level of expertise, mostly because I'm never in a position to write code that runs into this... --Guido van Rossum (home page: http://www.python.org/~guido/)
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
So you're saying that the callback functions in B acquire the GIL?
Yes.
Ok. What would break if they wouldn't?
All of the places where Q invokes the callback on arbitrary threads it has started, but which don't hold the GIL. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
Ok. What would break if they wouldn't?
All of the places where Q invokes the callback on arbitrary threads it has started, but which don't hold the GIL.
So Q creates new threads which perform callbacks? But Q also performs the callbacks when invoked from A? Sounds like a bug in Q to me... Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
Ok. What would break if they wouldn't?
All of the places where Q invokes the callback on arbitrary threads it has started, but which don't hold the GIL.
So Q creates new threads which perform callbacks? But Q also performs the callbacks when invoked from A? Sounds like a bug in Q to me...
Why do you say that? Q doesn't know anything about Python or its constraints. Why should it be prohibited from invoking these callbacks in whatever way it deems appropriate for its problem domain? I know, I know, all library authors should design with Python in mind <wink>, but seriously, Q == Qt, a library that's used extensively and successfully by thousands. It's conceivable that this is a serious design flaw in Qt, but I'm inclined to disbelieve that. I think rather that this is a library design which doesn't interoperate well with Python's constraints on GIL manipulation. In this case, as Python is intended to be good for general interoperability, it seems like Python ought to budge if possible. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
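To make the constraint concrete, here is an editorial sketch, assuming the straightforward use of the documented thread-state calls (the real PyQt/SIP code may well differ), of what B and its callback end up doing; q_do_something, the callback registration, and the variable names are all invented for illustration.

    #include <Python.h>

    /* Hypothetical Q entry point; the callback below is assumed to have
       been registered with Q elsewhere. */
    extern void q_do_something(void);

    static PyInterpreterState *interp;   /* stashed when B is initialized */
    static PyObject *py_callback;        /* the Python callable B registered */

    /* B's extension function: it must drop the GIL around the call into Q,
       because Q may re-enter B synchronously through the callback below. */
    static PyObject *
    b_call_into_q(PyObject *self, PyObject *args)
    {
        Py_BEGIN_ALLOW_THREADS
        q_do_something();
        Py_END_ALLOW_THREADS
        Py_INCREF(Py_None);
        return Py_None;
    }

    /* The callback Q invokes, possibly on a thread Python has never seen,
       possibly on the very thread that called q_do_something() above. */
    static void
    b_callback_from_q(void)
    {
        PyObject *result;
        PyThreadState *tstate = PyThreadState_New(interp);
        PyEval_AcquireThread(tstate);

        result = PyObject_CallObject(py_callback, NULL);
        Py_XDECREF(result);

        PyThreadState_Clear(tstate);
        PyEval_ReleaseThread(tstate);
        PyThreadState_Delete(tstate);
    }

The callback blindly creates a fresh thread state because it has no way to ask whether the calling thread already has one; and once a callback like this is installed, any call into Q made while still holding the GIL - A's calls, for instance - runs into exactly the trouble being described here.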
David Abrahams <dave@boost-consulting.com> writes:
So Q creates new threads which perform callbacks? But Q also performs the callbacks when invoked from A? Sounds like a bug in Q to me...
Why do you say that? Q doesn't know anything about Python or its constraints. Why should it be prohibited from invoking these callbacks in whatever way it deems appropriate for its problem domain?
It is moderately evil for a library to create threads "under the hoods", IMO; in some domains, that might be a reasonable thing to do, provided there is a way for the application author to manage the threads on a higher level (e.g. by limiting the total number of threads that the library can create simultaneously). If a library creates new threads and invokes application code in these threads, the threading should follow a threading model. That threading model has to be described, so that every application author can rely on certain features. The threading model is part of the library interface, just like the API. It appears that Q has no threading model. That is truly evil. In some cases, combining libraries with different threading models just won't work. For example, I recently found that Tcl's apartment threading model isn't really compatible with Python's GIL. It is possible to achieve interworking to some extent, but there are limitations that just can't be overcome (e.g. you just cannot invoke event dispatching in a thread that didn't originally create the Tcl interpreter). If the threading model of Q is unknown or undefined, you cannot expect any kind of interworking.
I think rather that this is a library design which doesn't interoperate well with Python's constraints on GIL manipulation.
It seems to me that there is no design in the library, and this is the cause for the interoperability problem (or, perhaps, you just haven't presented the design).
In this case, as Python is intended to be good for general interoperability, it seems like Python ought to budge if possible.
As Tim explains, this might not be possible. Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
So Q creates new threads which perform callbacks? But Q also performs the callbacks when invoked from A? Sounds like a bug in Q to me...
Why do you say that? Q doesn't know anything about Python or its constraints. Why should it be prohibited from invoking these callbacks in whatever way it deems appropriate for its problem domain?
It is moderately evil for a library to create threads "under the hoods", IMO; in some domains, that might be a reasonable thing to do, provided there is a way for the application author to manage the threads on a higher level (e.g. by limiting the total number of threads that the library can create simultaneously).
I am not intimately familiar with Qt; these threads may not in fact be created "under the hood". Whether they are or not is IMO irrelevant to the problem we're having, because it's not how the thread is started that matters.
If a library is creating new threads and invokes application code in these threads, threading should follow a threading model. That threading model has to be described, so that every application author can rely on certain features. The threading model is part of the library interface, just like the API.
It appears that Q has no threading model. That is truly evil.
Though I am not intimately familiar with Qt, I can assure you that it *does* have a threading model.
In some cases, combining libraries with different threading models just won't work. For example, I recently found that Tcl's apartment threading model isn't really compatible with Python's GIL. It is possible to achieve interworking to some extent, but there are limitations that just can't be overcome (e.g. you just cannot invoke event dispatching in a thread that didn't originally create the Tcl interpreter).
If the threading model of Q is unknown or undefined, you cannot expect any kind of interworking.
I think rather that this is a library design which doesn't interoperate well with Python's constraints on GIL manipulation.
It seems to me that there is no design in the library
I think that judgement is at best premature.
and this is the cause for the interoperability problem (or, perhaps, you just haven't presented the design).
No, I haven't. I'm not very familiar with it myself. I'm just relating information I've got from the author of PyQt, who is very familiar with it.
In this case, as Python is intended to be good for general interoperability, it seems like Python ought to budge if possible.
As Tim explains, this might not be possible.
It sounded to me like Tim explained that it is possible but unimplemented. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
I am not intimately familiar with Qt
Ah, so Q is Qt :-) Care to reveal what A and B are, and which the callback is that caused a deadlock?
It sounded to me like Tim explained that it is possible but unimplemented.
Trying to channel Tim: All experts for this stuff have tried and failed; Mark Hammond has a sort-of solution which Tim believes to be strictly-speaking incorrect. While people may have thought they have a solution, it is troubling that nobody can remember what the solution is. Maybe the email signature just had not enough space to write it down :-) Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
I am not intimately familiar with Qt
Ah, so Q is Qt :-)
I actually revealed that several messages back.
Care to reveal what A and B are
A is some extension module written by one of my users, that calls Qt but doesn't install any callbacks. B is some extension module which uses PyQt, thus installs callbacks which invoke Python.
and which the callback is that caused a deadlock?
There was no deadlock, as I've said. The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards. I speculate that the callback releases the GIL when it is finished, so that when A returns to Python there is no current thread. By that time, the callback has completed, so it's hard to know for sure which one it was.
It sounded to me like Tim explained that it is possible but unimplemented.
Trying to channel Tim: All experts for this stuff have tried and failed; Mark Hammond has a sort-of solution which Tim believes to be strictly-speaking incorrect. While people may have thought they have a solution, it is troubling that nobody can remember what the solution is. Maybe the email signature just had not enough space to write it down :-)
Hmm, it seems as though a mutex-protected record of which thread is currently holding the GIL should be enough to handle it. There must be subtle details which complicate the problem, e.g. that there's no portable/reliable way to identify the current thread (?) -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
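For what it's worth, an editorial sketch (assuming pthreads; all names are invented) of the kind of mutex-protected record David is floating here. The catch, as the exchange below makes explicit, is that the first two helpers would have to be wired into every GIL acquire and release in the core, adding a mutex operation and a thread-identity call to each.

    #include <pthread.h>

    /* Invented bookkeeping: which thread, if any, currently holds the GIL. */
    static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_t gil_owner;
    static int gil_held = 0;

    /* Would have to be called by the core just after acquiring the GIL ... */
    static void note_gil_acquired(void)
    {
        pthread_mutex_lock(&owner_lock);
        gil_owner = pthread_self();
        gil_held = 1;
        pthread_mutex_unlock(&owner_lock);
    }

    /* ... and just before releasing it. */
    static void note_gil_released(void)
    {
        pthread_mutex_lock(&owner_lock);
        gil_held = 0;
        pthread_mutex_unlock(&owner_lock);
    }

    /* The query extension authors keep asking for: "do I hold the GIL?" */
    static int i_hold_the_gil(void)
    {
        int mine;
        pthread_mutex_lock(&owner_lock);
        mine = gil_held && pthread_equal(gil_owner, pthread_self());
        pthread_mutex_unlock(&owner_lock);
        return mine;
    }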
David Abrahams <dave@boost-consulting.com> writes:
There was no deadlock, as I've said.
Yes, you only said this was a no-no, in http://mail.python.org/pipermail/python-dev/2002-December/031449.html I inferred from that (apparently incorrectly) that the no-no is a deadlock.
The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards.
Now I'm confused. In http://mail.python.org/pipermail/python-dev/2002-December/031424.html you said "A must also release the GIL" and then "the author of A may have had no reason to believe anyone would install Python callbacks in Q". From that I inferred that A does *not* release the GIL (as the author had no reason to). Now you are saying that A releases the GIL. Which one is it?
I speculate that the callback releases the GIL when it is finished, so that when A returns to Python there is no current thread.
That would be a bug in the callback. If there was a thread state when it was called, there should be a thread state when it returns.
Hmm, it seems as though a mutex-protected record of which thread is currently holding the GIL should be enough to handle it.
It depends on what "it" is, here. This one? Q: Is there a way to find out whether the current thread holds the GIL? If so, a mutex-protected record might work, but also might be expensive. Regards, Martin
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
There was no deadlock, as I've said.
Yes, you only said this was a no-no, in
http://mail.python.org/pipermail/python-dev/2002-December/031449.html
I inferred from that (apparently incorrectly) that the no-no is a deadlock.
In that very message, I also wrote: I realize that the docs for PyEval_AcquireLock() say: "If this thread already has the lock, a deadlock ensues", but the behavior we're seeing is consistent with a scenario where trying to acquire an already-held lock is a no-op and releasing it is unconditional. Eventually the GIL release in B's callback takes effect and when A returns to Python there is no thread state.
The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards.
Now I'm confused. In
http://mail.python.org/pipermail/python-dev/2002-December/031424.html
you said "A must also release the GIL"
Yes, that's the inevitable conclusion.
and then "the author of A may have had no reason to believe anyone would install Python callbacks in Q". From that I inferred that A does *not* release the GIL (as the author had no reason to).
Yes, not releasing the GIL in A was fine until B came along and installed callbacks in Q which acquire the GIL.
Now you are saying that A releases the GIL. Which one is it?
No, I am not saying A releases the GIL. I am saying that A must release the GIL if it is to work properly in the presence of B. A is currently broken in the presence of B. The addition of B to the system places a new constraint on A.
I speculate that the callback releases the GIL when it is finished, so that when A returns to Python there is no current thread.
That would be a bug in the callback.
Not if it has previously acquired the GIL.
If there was a thread state when it was called, there should be a thread state when it returns.
Yes, the whole problem is that there's no way to know whether there's a thread state.
Hmm, it seems as though a mutex-protected record of which thread is currently holding the GIL should be enough to handle it.
It depends on what "it" is, here. This one?
Q: Is there a way to find out whether the current thread holds the GIL?
If so, a mutex-protected record might work, but also might be expensive.
Yes. I assume that acquiring the GIL already needs to do synchronization, though. -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
David Abrahams <dave@boost-consulting.com> writes:
The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards. [...] No, I am not saying A releases the GIL.
"...there is no thread state. It [the thread state] goes away if A releases the GIL ..."
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards. [...] No, I am not saying A releases the GIL.
"...there is no thread state. It [the thread state] goes away if A releases the GIL ..."
From that I inferred that A releases the GIL, since you said that there is no thread state. Rereading your message, I now see that you meant "It [the problem] goes away".
Right.
So I now understand that you reported that there is no deadlock, and that A does not release the GIL, and that Python reports that there is no thread state "when A returns to Python". You also report that B acquires the GIL.
I can't understand why this happens. How does B acquire the GIL?
Assuming that B uses PyEval_AcquireThread/PyEval_ReleaseThread, I would expect that a) there is a deadlock if this happens in the context of a call to A, since the GIL is already held, and (if, for some reason, locks are recursive on this platform), b) the code
    if (PyThreadState_Swap(tstate) != NULL)
        Py_FatalError("PyEval_AcquireThread: non-NULL old thread state");
should trigger, as there is an old thread state.
So I infer that B does not use PyEval_AcquireThread/PyEval_ReleaseThread. What else does it use?
Looking at the SIP sources, it appears to be using PyEval_SaveThread/PyEval_RestoreThread, but I'd have to ask Phil to weigh in on this one to know for sure. Here's a stack backtrace reported by my user. You can ignore the oddness of frame #4; the SIP author is patching Python's instance method table, but has convinced me that what he's doing is harmless (it's still evil, of course <wink>).

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 1024 (LWP 5948)]
    PyErr_SetObject (exception=0x8108a8c, value=0x81ab208) at Python/errors.c:39
    (gdb) bt
    #0 PyErr_SetObject (exception=0x8108a8c, value=0x81ab208) at Python/errors.c:39
    #1 0x08087ac7 in PyErr_Format (exception=0x8108a8c, format=0x80df620 "%.50s instance has no attribute '%.400s'") at Python/errors.c:408
    #2 0x080b0467 in instance_getattr1 (inst=0x82c5654, name=0x8154558) at Objects/classobject.c:678
    #3 0x080b3e35 in instance_getattr (inst=0x82c5654, name=0x8154558) at Objects/classobject.c:715
    #4 0x40cd2a43 in instanceGetAttr () from /usr/local/lib/python2.2/site-packages/libsip.so
    #5 0x08056794 in PyObject_GetAttr (v=0x82c5654, name=0x8154558) at Objects/object.c:1108
    #6 0x0807705e in eval_frame (f=0x811a974) at Python/ceval.c:1784
    #7 0x0807866e in PyEval_EvalCodeEx (co=0x8161de0, globals=0x81139b4, locals=0x81139b4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2595
    #8 0x0807a700 in PyEval_EvalCode (co=0x8161de0, globals=0x81139b4, locals=0x81139b4) at Python/ceval.c:481
    #9 0x080950b1 in run_node (n=0x81263b8, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", globals=0x81139b4, locals=0x81139b4, flags=0xbffffac4) at Python/pythonrun.c:1079
    #10 0x08095062 in run_err_node (n=0x81263b8, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", globals=0x81139b4, locals=0x81139b4, flags=0xbffffac4) at Python/pythonrun.c:1066
    #11 0x08094ccb in PyRun_FileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", start=257, globals=0x81139b4, locals=0x81139b4, closeit=1, flags=0xbffffac4) at Python/pythonrun.c:1057
    #12 0x080938b1 in PyRun_SimpleFileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", closeit=1, flags=0xbffffac4) at Python/pythonrun.c:685
    #13 0x0809481f in PyRun_AnyFileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", closeit=1, flags=0xbffffac4) at Python/pythonrun.c:495
    #14 0x08053632 in Py_Main (argc=2, argv=0xbffffb54) at Modules/main.c:364
    #15 0x08052ee6 in main (argc=2, argv=0xbffffb54) at Modules/python.c:10
    #16 0x40088627 in __libc_start_main (main=0x8052ed0 <main>, argc=2, ubp_av=0xbffffb54, init=0x80522d4 <_init>, fini=0x80cf610 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffffb4c) at ../sysdeps/generic/libc-start.c:129
    (gdb)

On the line of the error

    oldtype = tstate->curexc_type;

    (gdb) p tstate
    $1 = (PyThreadState *) 0x0
    (gdb)
If there was a thread state when it was called, there should be a thread state when it returns.
Yes, the whole problem is that there's no way to know whether there's a thread state.
Wrong. If B acquires the GIL, B must use some thread state to do so. It must install that thread state through PyThreadState_Swap, directly or indirectly. That will return the old thread state, or NULL.
Let me rephrase: the whole problem is that there's no way to know if you have the interpreter lock. You can't call PyThreadState_Swap to find out if there's a thread state if you don't have the interpreter lock. You can't acquire the lock if you already have it.
If so, a mutex-protected record might work, but also might be expensive.
Yes. I assume that acquiring the GIL already needs to do synchronization, though.
Sure. But with that proposed change, you have not only the GIL lock call (which is a single sem_wait call on Posix, and an InterlockedCompareExchange call on Win32). You also get a mutex call, and a call to find out the current thread.
There you go, it's a harder problem than I thought ;-) -- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
Aahz <aahz@pythoncraft.com> writes:
I think this thread might be better handled on c.l.py, at least until it's understood well enough to be clear whether something does need to change in Python.
BTW, I tried to comply with your request. Currently the crossover posting which I Bcc'd to python-dev is being held for moderator approval: python-dev-admin@python.org writes:
Your mail to 'Python-Dev' with the subject
Extension modules, Threading, and the GIL
Is being held until the list moderator can review it for approval.
The reason it is being held:
Message has implicit destination
Either the message will get posted to the list, or you will receive notification of the moderator's decision.
-- David Abrahams dave@boost-consulting.com * http://www.boost-consulting.com Boost support, enhancements, training, and commercial distribution
participants (14)
- "Martin v. Löwis"
- Aahz
- David Abrahams
- Gordon McMillan
- Guido van Rossum
- holger krekel
- Just van Rossum
- Mark Hammond
- Mark Hammond
- martin@v.loewis.de
- Scott Gilbert
- Thomas Heller
- Tim Peters
- Tim Peters