
martin@v.loewis.de (Martin v. Löwis) writes:
David Abrahams <dave@boost-consulting.com> writes:
The symptom is that Python complains at some point that there's no thread state. It goes away if A releases the GIL before calling into Qt, and reacquires the GIL afterwards. [...] No, I am not saying A releases the GIL.
"...there is no thread state. It [the thread state] goes away if A releases the GIL ..."
From that I inferred that A releases the GIL, since you said that there is no thread state. Rereading your message, I now see that you meant "It [the problem] goes away".
Right.
So I now understand that you reported that there is no deadlock, and that A does not release the GIL, and that Python reports that there is no thread state "when A returns to Python". You also report that B acquires the GIL.
I can't understand why this happens. How does B acquire the GIL?
Assuming that B uses PyEval_AcquireThread/PyEval_ReleaseThread, I would expect that a) there is a deadlock if this happens in a context of a call to A, since the GIL is already held, and (if, for some reason, locks are recursive on this platform), b) the code
    if (PyThreadState_Swap(tstate) != NULL)
        Py_FatalError(
            "PyEval_AcquireThread: non-NULL old thread state");
should trigger, as there is an old thread state.
So I infer that B does not use PyEval_AcquireThread/PyEval_ReleaseThread. What else does it use?
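To make the two predicted outcomes concrete, here is a minimal single-threaded model in C. It is only a sketch: ModelThreadState, model_swap, model_acquire_thread and the recursive_locks flag are hypothetical stand-ins invented for illustration, not the CPython definitions. The model assumes the lock is taken first and the thread state swapped second, which is the order the quoted check implies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the CPython types. */
typedef struct { int id; } ModelThreadState;
typedef enum { ACQ_OK, ACQ_DEADLOCK, ACQ_FATAL_OLD_STATE } AcquireResult;

static ModelThreadState *model_current = NULL; /* "current thread state" */
static int model_lock_depth = 0;               /* 0 means the lock is free */

/* Models PyThreadState_Swap: install a new state, return the old one. */
static ModelThreadState *model_swap(ModelThreadState *ts)
{
    ModelThreadState *old = model_current;
    model_current = ts;
    return old;
}

/* Models the acquire path: take the lock, then swap in the thread state.
   Instead of blocking or aborting, it reports what would have happened. */
static AcquireResult model_acquire_thread(ModelThreadState *ts,
                                          int recursive_locks)
{
    if (model_lock_depth > 0 && !recursive_locks)
        return ACQ_DEADLOCK;          /* case (a): lock already held */
    model_lock_depth++;
    if (model_swap(ts) != NULL)
        return ACQ_FATAL_OLD_STATE;   /* case (b): the fatal-error check */
    return ACQ_OK;
}

static void model_reset(void)
{
    model_current = NULL;
    model_lock_depth = 0;
}

/* Walks through both cases for a caller that already holds the lock. */
void model_demo(void)
{
    ModelThreadState a = {1}, b = {2};
    model_reset();
    assert(model_acquire_thread(&a, 0) == ACQ_OK);
    assert(model_acquire_thread(&b, 0) == ACQ_DEADLOCK);
    assert(model_acquire_thread(&b, 1) == ACQ_FATAL_OLD_STATE);
    model_reset();
}
```

In this model a second acquire from inside a call that already holds the lock can never succeed quietly, which is why the absence of both a deadlock and a fatal error suggests B takes some other route.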
Looking at the SIP sources, it appears to be using PyEval_SaveThread/PyEval_RestoreThread, but I'd have to ask Phil to weigh in on this one to know for sure.

Here's a stack backtrace reported by my user. You can ignore the oddness of frame #4; the SIP author is patching Python's instance method table, but has convinced me that what he's doing is harmless (it's still evil, of course <wink>).

  Program received signal SIGSEGV, Segmentation fault.
  [Switching to Thread 1024 (LWP 5948)]
  PyErr_SetObject (exception=0x8108a8c, value=0x81ab208) at Python/errors.c:39
  (gdb) bt
  #0  PyErr_SetObject (exception=0x8108a8c, value=0x81ab208) at Python/errors.c:39
  #1  0x08087ac7 in PyErr_Format (exception=0x8108a8c, format=0x80df620 "%.50s instance has no attribute '%.400s'") at Python/errors.c:408
  #2  0x080b0467 in instance_getattr1 (inst=0x82c5654, name=0x8154558) at Objects/classobject.c:678
  #3  0x080b3e35 in instance_getattr (inst=0x82c5654, name=0x8154558) at Objects/classobject.c:715
  #4  0x40cd2a43 in instanceGetAttr () from /usr/local/lib/python2.2/site-packages/libsip.so
  #5  0x08056794 in PyObject_GetAttr (v=0x82c5654, name=0x8154558) at Objects/object.c:1108
  #6  0x0807705e in eval_frame (f=0x811a974) at Python/ceval.c:1784
  #7  0x0807866e in PyEval_EvalCodeEx (co=0x8161de0, globals=0x81139b4, locals=0x81139b4, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2595
  #8  0x0807a700 in PyEval_EvalCode (co=0x8161de0, globals=0x81139b4, locals=0x81139b4) at Python/ceval.c:481
  #9  0x080950b1 in run_node (n=0x81263b8, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", globals=0x81139b4, locals=0x81139b4, flags=0xbffffac4) at Python/pythonrun.c:1079
  #10 0x08095062 in run_err_node (n=0x81263b8, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", globals=0x81139b4, locals=0x81139b4, flags=0xbffffac4) at Python/pythonrun.c:1066
  #11 0x08094ccb in PyRun_FileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", start=257, globals=0x81139b4, locals=0x81139b4, closeit=1, flags=0xbffffac4) at Python/pythonrun.c:1057
  #12 0x080938b1 in PyRun_SimpleFileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", closeit=1, flags=0xbffffac4) at Python/pythonrun.c:685
  #13 0x0809481f in PyRun_AnyFileExFlags (fp=0x8104038, filename=0xbffffca2 "/home/pfkeb/hippodraw-BUILD/testsuite/dclock.py", closeit=1, flags=0xbffffac4) at Python/pythonrun.c:495
  #14 0x08053632 in Py_Main (argc=2, argv=0xbffffb54) at Modules/main.c:364
  #15 0x08052ee6 in main (argc=2, argv=0xbffffb54) at Modules/python.c:10
  #16 0x40088627 in __libc_start_main (main=0x8052ed0 <main>, argc=2, ubp_av=0xbffffb54, init=0x80522d4 <_init>, fini=0x80cf610 <_fini>, rtld_fini=0x4000dcd4 <_dl_fini>, stack_end=0xbffffb4c) at ../sysdeps/generic/libc-start.c:129
  (gdb)

On the line of the error

  oldtype = tstate->curexc_type;

  (gdb) p tstate
  $1 = (PyThreadState *) 0x0
  (gdb)
If there was a thread state when it was called, there should be a thread state when it returns.
Yes, the whole problem is that there's no way to know whether there's a thread state.
Wrong. If B acquires the GIL, B must use some thread state to do so. It must install that thread state through PyThreadState_Swap, directly or indirectly. That will return the old thread state, or NULL.
Let me rephrase: the whole problem is that there's no way to know if you have the interpreter lock. You can't call PyThreadState_Swap to find out if there's a thread state if you don't have the interpreter lock. You can't acquire the lock if you already have it.
If so, a mutex-protected record might work, but also might be expensive.
Yes. I assume that acquiring the GIL already needs to do synchronization, though.
Sure. But with that proposed change, you have not only the GIL lock call (which is a single sem_wait call on Posix, and an InterlockedCompareExchange call on Win32). You also get a mutex call, and a call to find out the current thread.
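For readers who want to see where the extra cost comes from, here is a rough sketch of a mutex-protected owner record in C with pthreads. Everything in it is an assumption made for illustration: TrackedLock and the tracked_* functions are hypothetical names, and a plain pthread mutex stands in for the interpreter lock (the real one is described above as a sem_wait on Posix). It is not how Python implements anything.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model for illustration; not CPython code. */
typedef struct {
    pthread_mutex_t gil;        /* stand-in for the interpreter lock */
    pthread_mutex_t record_mu;  /* protects has_owner and owner */
    bool has_owner;
    pthread_t owner;
} TrackedLock;

static void tracked_init(TrackedLock *l)
{
    pthread_mutex_init(&l->gil, NULL);
    pthread_mutex_init(&l->record_mu, NULL);
    l->has_owner = false;
}

/* "Do I hold the lock?" costs one mutex round trip plus a lookup of the
   current thread, on top of whatever the lock itself costs. */
static bool tracked_held_by_me(TrackedLock *l)
{
    pthread_mutex_lock(&l->record_mu);
    bool mine = l->has_owner && pthread_equal(l->owner, pthread_self());
    pthread_mutex_unlock(&l->record_mu);
    return mine;
}

/* Returns true if this call took the lock, false if the caller already
   held it; pass the result back to tracked_release. */
static bool tracked_ensure(TrackedLock *l)
{
    if (tracked_held_by_me(l))
        return false;
    pthread_mutex_lock(&l->gil);
    pthread_mutex_lock(&l->record_mu);
    l->has_owner = true;
    l->owner = pthread_self();
    pthread_mutex_unlock(&l->record_mu);
    return true;
}

static void tracked_release(TrackedLock *l, bool took_it)
{
    if (!took_it)
        return;                 /* an outer caller still owns the lock */
    pthread_mutex_lock(&l->record_mu);
    l->has_owner = false;
    pthread_mutex_unlock(&l->record_mu);
    pthread_mutex_unlock(&l->gil);
}

/* Nested use from one thread: the inner ensure must not block. */
void tracked_demo(void)
{
    TrackedLock l;
    tracked_init(&l);
    bool outer = tracked_ensure(&l);
    bool inner = tracked_ensure(&l);
    assert(outer && !inner);
    tracked_release(&l, inner);
    assert(tracked_held_by_me(&l));
    tracked_release(&l, outer);
    assert(!tracked_held_by_me(&l));
}
```

Each tracked_ensure on the uncontended path performs two record_mu lock/unlock pairs and two pthread_self calls in addition to the lock acquisition itself, which is the overhead being weighed here.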
There you go, it's a harder problem than I thought ;-)

--
David Abrahams
dave@boost-consulting.com * http://www.boost-consulting.com
Boost support, enhancements, training, and commercial distribution