[cpyext] crash in PyThreadState_GetDict()

Hi,

I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml. For that, it's enough to build the latest lxml github master version in PyPy and start the test runner. You'll also need the github master version of Cython for that (which contains the PyPy port).

I use these commands for building lxml under Linux:

  $ CFLAGS=-ggdb make PYTHON=pypy inplace
  $ pypy test.py -vv -p parser html   # no specific test, just not all

The stack trace I get starts like this:

"""
#0 0x0000000000b4fac5 in ?? ()
#1 0x000000000103a813 in PyThreadState_GetDict ()
#2 0x00007ffff51b15b0 in __pyx_f_4lxml_5etree__getGlobalErrorLog () at src/lxml/lxml.etree.c:32906
#3 0x00007ffff51a8922 in __pyx_f_4lxml_5etree_13_BaseErrorLog__receive (__pyx_v_self=0x7ffff7f99378, __pyx_v_error=0x3205ed8) at src/lxml/lxml.etree.c:28299
#4 0x00007ffff51fad46 in __pyx_f_4lxml_5etree__forwardParserError (__pyx_v__parser_context=0x3205c80, __pyx_v_error=0x3205ed8) at src/lxml/lxml.etree.c:82070
#5 0x00007ffff51fadc7 in __pyx_f_4lxml_5etree__receiveParserError (__pyx_v_c_context=0x3205c80, __pyx_v_error=0x3205ed8) at src/lxml/lxml.etree.c:82126
"""

Background: lxml frees the GIL when it starts parsing XML, then reacquires it to report any errors that occur and, while holding it, looks up the target error log object from thread global storage to find out who wants to know about the error. That's where it crashes above, right when it tries to get at the TLS dict.

Any help is appreciated.

Stefan

2012/6/18 Stefan Behnel <stefan_ml@behnel.de>
> Hi,
> I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml.

Is it similar to https://bugs.pypy.org/issue1175 ?
"PyThread_{get, set, delete}_key_value should work without the GIL held"
-- Amaury Forgeot d'Arc

Amaury Forgeot d'Arc, 18.06.2012 16:02:
> 2012/6/18 Stefan Behnel
>> I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml.
> Is it similar to https://bugs.pypy.org/issue1175 ? "PyThread_{get, set, delete}_key_value should work without the GIL held"

No. As I said, it has already acquired the GIL when it calls that function.

Stefan

2012/6/18 Stefan Behnel <stefan_ml@behnel.de>
> Amaury Forgeot d'Arc, 18.06.2012 16:02:
>> 2012/6/18 Stefan Behnel
>>> I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml.
>> Is it similar to https://bugs.pypy.org/issue1175 ? "PyThread_{get, set, delete}_key_value should work without the GIL held"
> No. As I said, it has already acquired the GIL when it calls that function.

OK, the answer is quite simple: PyGILState_Ensure() is not really implemented and just returns zero. :-(

-- Amaury Forgeot d'Arc

Amaury Forgeot d'Arc, 18.06.2012 16:51:
> 2012/6/18 Stefan Behnel
>> Amaury Forgeot d'Arc, 18.06.2012 16:02:
>>> 2012/6/18 Stefan Behnel
>>>> I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml.
>>> Is it similar to https://bugs.pypy.org/issue1175 ? "PyThread_{get, set, delete}_key_value should work without the GIL held"
>> No. As I said, it has already acquired the GIL when it calls that function.
> OK, the answer is quite simple: PyGILState_Ensure() is not really implemented and just returns zero.

Ah, hmmm. That's annoying. Threading is quite crucial for many use cases of lxml. Is there a chance that this will become available soonish, or should I just compile out threading support when building in PyPy? The code for that is there anyway (although I haven't tried it in years...).

Maybe PyPy should undefine the WITH_THREAD macro if it doesn't support threading in C extensions anyway?

Stefan

Stefan Behnel, 18.06.2012 19:04:
> Amaury Forgeot d'Arc, 18.06.2012 16:51:
>> 2012/6/18 Stefan Behnel
>>> Amaury Forgeot d'Arc, 18.06.2012 16:02:
>>>> 2012/6/18 Stefan Behnel
>>>>> I'm getting reproducible segfaults in the PyThreadState_GetDict() function with lxml.
>>>> Is it similar to https://bugs.pypy.org/issue1175 ? "PyThread_{get, set, delete}_key_value should work without the GIL held"
>>> No. As I said, it has already acquired the GIL when it calls that function.
>> OK, the answer is quite simple: PyGILState_Ensure() is not really implemented and just returns zero.
> Ah, hmmm. That's annoying. Threading is quite crucial for many use cases of lxml. Is there a chance that this will become available soonish, or should I just compile out threading support when building in PyPy? The code for that is there anyway (although I haven't tried it in years...).

I disabled threading and it fixes a lot of crashes. Thanks for the hint.

> Maybe PyPy should undefine the WITH_THREAD macro if it doesn't support threading in C extensions anyway?

... or rather, behave as if WITH_THREAD was undefined and make the GIL handling macros dummies.

Stefan
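
The idea of turning the GIL handling macros into dummies when WITH_THREAD is undefined could look roughly like the sketch below. It is an illustration only, not code from lxml, Cython, or PyPy. The names EXT_GIL_ENSURE, EXT_GIL_RELEASE, gil_state_t, fake_gil_ensure, fake_gil_release, and report_error are all hypothetical, chosen so that the sketch builds without Python.h. In a real extension the threaded branch would presumably call PyGILState_Ensure() and PyGILState_Release() instead of the stand-ins.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for PyGILState_STATE, so this builds without Python.h. */
typedef int gil_state_t;

/* Counts calls into the stand-in GIL API (for illustration only). */
static int gil_calls = 0;

#ifdef WITH_THREAD
/* Hypothetical stand-ins for PyGILState_Ensure() / PyGILState_Release(). */
static gil_state_t fake_gil_ensure(void) { gil_calls++; return 0; }
static void fake_gil_release(gil_state_t s) { (void)s; gil_calls++; }
#  define EXT_GIL_ENSURE(s)  do { (s) = fake_gil_ensure(); } while (0)
#  define EXT_GIL_RELEASE(s) do { fake_gil_release(s); } while (0)
#else
/* Threading compiled out: the macros are dummies that touch no GIL API. */
#  define EXT_GIL_ENSURE(s)  do { (void)(s); } while (0)
#  define EXT_GIL_RELEASE(s) do { (void)(s); } while (0)
#endif

/* Hypothetical error callback: reports a message and returns the number
   of GIL API calls it made along the way. */
int report_error(const char *msg)
{
    gil_state_t state = 0;
    int before = gil_calls;

    assert(msg != NULL);
    EXT_GIL_ENSURE(state);
    fprintf(stderr, "parser error: %s\n", msg);
    EXT_GIL_RELEASE(state);
    return gil_calls - before;
}
```

Built without -DWITH_THREAD, report_error() makes no GIL calls at all. Built with it, the same source makes one ensure and one release call, so the call sites need no #ifdef of their own.
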
participants (2)
- Amaury Forgeot d'Arc
- Stefan Behnel