Looking for clues to consistent Seg Fault in PyPy 2.6.1

Hi,

I've run out of options trying to find a Seg Fault that happens when running lxml under PyPy 2.6.1. The problem occurs only under PyPy; the rest of the code works fine under CPython 2.7. I've been in contact with the lxml dev team and they confirmed my problem, but could not determine where the cause of the Seg Fault lies. They suggested reaching out to you folks for ideas or a possible workaround, as the crashing frames of the stack trace are always in libpypy-c.so and the crash seems to be triggered by garbage collection at some later point.

Below is the gist of the email I sent to the lxml team earlier. It describes how I reproduce this Seg Fault using their supplied test suite. As of this writing, lxml is at version 3.5.0b1.

I'd appreciate any feedback you can give on how to deal with this error. Seg Faults are notoriously hard to track down, because the actual cause usually lies well upstream of the crash. We are anxious to perform a complete evaluation of transitioning to PyPy as our Python platform of choice, and this appears to be the last major blocker.

Thanks for your help.
- Jeff Doran

----------------

I'm running the lxml test suite under PyPy, which I installed from here: https://bitbucket.org/squeaky/portable-pypy/downloads/pypy-2.6.1-linux_x86_6... . I downloaded the current lxml source from https://github.com/lxml/lxml and built it using 'python setup.py build --with-cython'.

From ..lxml/src/lxml I run:

    nosetests -v --pdb --nocapture

    Comparing with ElementTree 1.3.0
    Comparing with cElementTree 1.3.0
    TESTED VERSION: 3.5.0.beta1
    Python: (major=2, minor=7, micro=10, releaselevel='final', serial=42)
    lxml.etree: (3, 5, 0, -99)
    libxml used: (2, 8, 0)
    libxml compiled: (2, 8, 0)
    libxslt used: (1, 1, 26)
    libxslt compiled: (1, 1, 26)
    lxml.html.tests.test_autolink.test_suite ... ok
    lxml.html.tests.test_basic.test_suite ... ok
    test_allow_tags (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_clean_invalid_root_tag (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_safe_attrs_excluded (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_safe_attrs_included (lxml.html.tests.test_clean.CleanerTest) ... ok
    lxml.html.tests.test_clean.test_suite ... ok
    lxml.html.tests.test_diff.test_suite ... ok
    test_body (lxml.html.tests.test_elementsoup.SoupParserTestCase) ... ok
    test_broken_attribute (lxml.html.tests.test_elementsoup.SoupParserTestCase) ... fish: Job 1, “nosetests -v --pdb --nocapture” terminated by signal SIGSEGV (Address boundary error)

Running these tests under gdb reveals the following partial stack trace:

    #0  0x00007ffff57fa08a in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #1  0x00007ffff57fa110 in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #2  0x00007ffff5776be3 in PyPyWeakref_LockObject () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #3  0x00007ffff0337399 in __pyx_f_4lxml_5etree__isProxyAliveInPypy.isra.265 () from /home/jeff/lxml/src/lxml/etree.pypy-26.so
    #4  0x00007ffff033fbfb in __pyx_f_4lxml_5etree_attemptDeallocation () from /home/jeff/lxml/src/lxml/etree.pypy-26.so
    #5  0x00007ffff033fd70 in __pyx_tp_dealloc_4lxml_5etree__Element () from /home/jeff/lxml/src/lxml/etree.pypy-26.so
    #6  0x00007ffff62186f9 in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #7  0x00007ffff57aec90 in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #8  0x00007ffff57fa670 in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #9  0x00007ffff5fa0da8 in ?? () from /home/jeff/debian_pypy_env/bin/libpypy-c.so
    #10 0x00007ffff5f89a31 in ?? ()
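When a whole test suite dies with SIGSEGV, it can help to run suspect tests one at a time in a child process, so that a crash ends only that one run and can be attributed to a single test. The following is a minimal sketch, not part of nose or lxml, assuming Python 3.5+ (for `subprocess.run`) on a POSIX system, where a child killed by signal N reports return code -N. The helper name `run_and_classify` is hypothetical.

```python
import signal
import subprocess
import sys


def run_and_classify(cmd):
    """Run cmd in a child process and classify how it ended (POSIX only).

    Hypothetical helper: returns "segfault" if the child was killed by
    SIGSEGV, "ok" on exit status 0, and "failed" otherwise.
    """
    proc = subprocess.run(cmd)
    # On POSIX, a negative return code -N means the child died from signal N.
    if proc.returncode == -signal.SIGSEGV:
        return "segfault"
    return "ok" if proc.returncode == 0 else "failed"


if __name__ == "__main__":
    # Self-check: a child interpreter that sends SIGSEGV to itself.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_classify(crasher))
```

In practice `cmd` would be a nosetests command line that selects a single test; looping over the test names printed by a verbose run then pinpoints which ones crash the interpreter.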

Hi Jeff, I will have a look, but first let me mention that lxml-cffi is much better supported (and faster) on pypy than lxml. It is unmaintained though, so I guess it is getting out of date with lxml developments. A bientôt, Armin.

In a message of Sun, 11 Oct 2015 10:04:02 +0200, Armin Rigo writes:
Has https://github.com/amauryfa/lxml/tree/lxml-cffi been renamed to https://github.com/amauryfa/lxml/tree/cffi? The first one now returns a 404 on GitHub, despite being the first Google hit for lxml-cffi.

Laura

Hi Fijal, On Mon, Oct 12, 2015 at 9:44 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I remember I typed "lxml-cffi" in Google and clicked on the first link. Maybe the Google results moved around? Right now, it doesn't show https://github.com/amauryfa/lxml/tree/cffi anywhere in the first two pages, but it shows https://bitbucket.org/pypy/compatibility/wiki/lxml which links to it. A bientôt, Armin.

2015-10-12 9:10 GMT+02:00 Armin Rigo <arigo@tunes.org>:
It's quite possible that early versions of the repo had this branch, and I somehow deleted it; I cannot remember how. But I'm sure nothing has changed in the last months. -- Amaury Forgeot d'Arc

Hi, On Sun, Oct 11, 2015 at 12:28 AM, Jeff Doran <jdoran@lexmachina.com> wrote:
After some debugging, it seems that the PyPy-specific code with weakrefs in "proxy.pxi" is to blame. It seems to me that it would have the same problem if it were compiled on CPython too. (I understand why it is there, and indeed it is necessary to do *something* different on PyPy.)

The problem: suppose you start with two C structures "xmlNode" which form a small tree:

    XA: xmlNode with child XB
    XB: xmlNode

You have two corresponding Python objects (actually cdef class _Element, but I think it's not important that they are Cython classes here):

    EA: _Element with _c_node = XA
    EB: _Element with _c_node = XB

The back-pointers are set up differently on PyPy and on CPython. On CPython first:

    XA._private = (void *)EA
    XB._private = (void *)EB

It's a plain pointer which doesn't hold a reference. The deallocation logic of _Element resets the '_c_node._private' pointer back to NULL.

On PyPy instead, there is an indirection: _private holds a reference to a weakref object. The effect is mostly the same, but the deallocation logic of _Element is subtly different as a result. Let's dig.

The deallocation logic of an _Element E is: reset E._c_node._private to NULL, and then, if all X's in the tree have _private "set to NULL", free the whole tree. The problem is that "set to NULL" is more subtle in the weakref version: it really means "contains a weakref to a dead object". But weakrefs can die *before* the deallocator of their target is called. This is possible in both PyPy and CPython. So here is what occurs:

* we forget both EA and EB at the same time (on CPython, this can occur if they are in a cycle);
* both weakrefs die;
* we call the deallocator of EA: it thinks the whole tree is dead because all weakrefs are dead, and frees it;
* we call the deallocator of EB: it still has _c_node pointing to XB, but that memory is now garbage, and we crash.

That's the problem. I don't have a fix right now :-)

A bientôt,

Armin.
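The ordering argument above can be made concrete with a toy model. This is a hypothetical Python simulation, not lxml's proxy.pxi: the classes `XmlNode`, `WeakSlot` and `Element`, and the `alive` flag standing in for a weakref's liveness, are invented for illustration. It assumes the simplified deallocation rule described above (clear your own slot, then free the whole tree if every slot is NULL or holds a dead weakref) and forces the GC ordering by hand rather than relying on a real collector.

```python
class UseAfterFree(Exception):
    """Raised where the real C code would touch freed memory (and segfault)."""


class XmlNode:
    """Hypothetical stand-in for a libxml2 xmlNode."""
    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self
        self.private = None  # models xmlNode._private
        self.freed = False


class WeakSlot:
    """Models the PyPy indirection: _private refers to a weakref object."""
    def __init__(self):
        self.alive = True  # set to False when the weakref "dies"


def iter_tree(node):
    yield node
    for child in node.children:
        yield from iter_tree(child)


def tree_root(node):
    while node.parent is not None:
        node = node.parent
    return node


class Element:
    """Hypothetical stand-in for lxml's _Element proxy."""
    def __init__(self, c_node):
        self.c_node = c_node
        c_node.private = WeakSlot()

    def dealloc(self):
        if self.c_node.freed:
            raise UseAfterFree(self.c_node.name)
        self.c_node.private = None
        root = tree_root(self.c_node)
        # A node counts as "unreferenced" if its slot is NULL *or* holds a
        # dead weakref; conflating the two is the bug described above.
        if all(n.private is None or not n.private.alive
               for n in iter_tree(root)):
            for n in iter_tree(root):
                n.freed = True


def simulate(weakrefs_die_first):
    xb = XmlNode("XB")
    xa = XmlNode("XA", [xb])
    ea, eb = Element(xa), Element(xb)
    if weakrefs_die_first:
        # The GC clears both weakrefs before running either deallocator.
        xa.private.alive = False
        xb.private.alive = False
    ea.dealloc()
    try:
        eb.dealloc()
    except UseAfterFree:
        return "crash"
    return "ok"


print(simulate(weakrefs_die_first=False))  # ok
print(simulate(weakrefs_die_first=True))   # crash
```

With the first ordering, each deallocator runs while the other element's slot still looks alive, which is effectively what the plain-pointer scheme on CPython guarantees, so only the last deallocator frees the tree. When both weakrefs are cleared first, EA's deallocator frees XB out from under EB; in the real C code that is a use-after-free, which can show up as the SIGSEGV reported above.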

First, Armin, I want to thank you for taking the time to dig into this. It's a wonderful introduction for me to the PyPy dev list. Next, while I'm admittedly a noob at the lower levels of PyPy, I'm curious why this problem hasn't been encountered more often. It seems that each _Element should be responsible for deallocating its own weakref and never have that outsourced to any other _Element. In any case, thanks again; I will await a new PyPy to continue evaluating it as a platform for our production use. Please note I will be happy to test any proposed solutions as they become available, nightly or otherwise.

Best - Jeff

On Sun, Oct 11, 2015 at 2:52 AM, Armin Rigo <arigo@tunes.org> wrote:

Hi Jeff,

Ah, I think I understand now what is going on. The special-casing for PyPy should not be needed at all, but when I removed it, I got assertion failures. Now I understand the reason: yes, it is yet another bug inside cpyext, caused by yet another access pattern of "PyObject *" that is somewhat unusual but should be allowed. I can try to fix it inside cpyext, and then the "if IS_PYPY" cases in proxy.pxi should be removed. (I'm unsure whether it is easy to change them to "if the version of PyPy is <= 2.6.1"...)

A bientôt,

Armin.

Hi, On Mon, Oct 12, 2015 at 11:24 AM, Amaury Forgeot d'Arc <amauryfa@gmail.com> wrote:
We define it as a string (in pypy/module/cpyext/include/patchlevel.h):

    #define PYPY_VERSION "2.6.0-alpha0"

Yes, but I don't know how to do this check in Cython. Well, it's probably easy.

Armin
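Since PYPY_VERSION is only a C string macro, one alternative is a runtime check from Python code. The following is a minimal sketch under that assumption, not what lxml does: it reads `sys.pypy_version_info`, which exists only on PyPy, and compares the version as a tuple. The helper names `parse_pypy_version` and `needs_weakref_workaround` are hypothetical, and a compile-time check inside Cython is not shown.

```python
import re
import sys


def parse_pypy_version(version_string):
    """Turn a PYPY_VERSION string such as "2.6.0-alpha0" into (2, 6, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return tuple(int(part) for part in match.groups())


def needs_weakref_workaround(version=None):
    """Hypothetical check: keep the old workaround only on PyPy <= 2.6.1.

    If no version tuple is given, use the running interpreter's
    sys.pypy_version_info; on CPython that attribute is absent, so
    the answer is False.
    """
    if version is None:
        info = getattr(sys, "pypy_version_info", None)
        if info is None:
            return False  # not PyPy: no workaround needed
        version = tuple(info[:3])
    return version <= (2, 6, 1)


print(parse_pypy_version("2.6.0-alpha0"))  # (2, 6, 0)
```

Comparing plain tuples keeps the check independent of the "-alpha0"-style suffix, which is dropped by the parser.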

Hi,

Just an update on this problem. I retested using the latest lxml source from here <https://github.com/lxml/lxml> (3.5.0b2?) and a version of PyPy 4.0 from here <https://bitbucket.org/pypy/pypy/downloads/pypy-4.0.0-src.tar.bz2> that I built for Debian 7.9. The good news is that I got much farther in the lxml test suite than before; the bad news is that I still encountered my old friend SIGSEGV. It consistently occurs during test_delslice_negative2 in lxml.tests.test_elementtree.ETreeTestCase, which, as I said, is much further through the test suite than before. This is encouraging.

For now I'm still unable to use this combination, so if anyone has a suggestion on how I might use PyPy and lxml together, I would be most appreciative.

Thanks.
- Jeff

---------------------------------------------------------

    nosetests -vv --nocapture
    nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
    nose.selector: INFO: /home/jeff/lxml/src/lxml/etree.pypy-26.so is executable; skipped
    nose.selector: INFO: /home/jeff/lxml/src/lxml/objectify.pypy-26.so is executable; skipped
    Comparing with ElementTree 1.3.0
    Comparing with cElementTree 1.3.0
    TESTED VERSION: 3.5.0.beta1
    Python: (major=2, minor=7, micro=10, releaselevel='final', serial=42)
    lxml.etree: (3, 5, 0, -99)
    libxml used: (2, 8, 0)
    libxml compiled: (2, 8, 0)
    libxslt used: (1, 1, 26)
    libxslt compiled: (1, 1, 26)
    lxml.html.tests.test_autolink.test_suite ... ok
    lxml.html.tests.test_basic.test_suite ... ok
    test_allow_tags (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_clean_invalid_root_tag (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_safe_attrs_excluded (lxml.html.tests.test_clean.CleanerTest) ... ok
    test_safe_attrs_included (lxml.html.tests.test_clean.CleanerTest) ... ok
    lxml.html.tests.test_clean.test_suite ... ok
    lxml.html.tests.test_diff.test_suite ... ok
    lxml.html.tests.test_elementsoup.test_suite ... ok
    ... lots of successful test output removed ...
    test_delitem (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delitem_tail (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delslice (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delslice_child_tail (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delslice_memory (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delslice_negative1 (lxml.tests.test_elementtree.ETreeTestCase) ... ok
    test_delslice_negative2 (lxml.tests.test_elementtree.ETreeTestCase) ... fish: Job 2, “nosetests -vv --nocapture ” terminated by signal SIGSEGV (Address boundary error)

---------------------------------------------------------

On Mon, Oct 12, 2015 at 2:30 AM, Armin Rigo <arigo@tunes.org> wrote:

Hi Jeff, On Tue, Nov 3, 2015 at 11:57 PM, Jeff Doran <jdoran@lexmachina.com> wrote:
Just an update on this problem. I retested using the latest lxml source from here ( 3.5.0b2?)
There were a few fixes to PyPy's cpyext layer. But there was no change to lxml itself. That means it is still using a broken approach to work around a CPython C API difference. Namely, this is the weakref stuff which is done incorrectly---although I don't know how to do that correctly. For now, lxml (the non-cffi version) is not compatible with PyPy at all. A bientôt, Armin.

Armin Rigo wrote on 04.11.2015 at 14:09:
https://github.com/lxml/lxml/tree/pypy4

I tried to simply disable the special-casing in proxy.pxi, but it only leads to more crashes:

"""
$ pypy-4.0.0-linux64/bin/pypy test.py -vv -p
TESTED VERSION: 3.5.0.beta1
    Python: (major=2, minor=7, micro=10, releaselevel='final', serial=42)
    lxml.etree: (3, 5, 0, -99)
    libxml used: (2, 9, 1)
    libxml compiled: (2, 9, 1)
    libxslt used: (1, 1, 28)
    libxslt compiled: (1, 1, 28)

RPython traceback:
  File "rpython_memory_gctransform_support.c", line 8320, in ll_call_destructor__funcPtr_pypy_module_cpyext_p_1
  File "pypy_module_cpyext_pyobject.c", line 2012, in PyOLifeline___del__
Fatal RPython error: AssertionError
"""

Any hints?

Stefan

Hi Stefan, On Sun, Nov 8, 2015 at 5:51 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I tried to simply disable the special casing in proxy.pxi, but it only leads to more crashes:
Yes, that is expected. The special-casing was introduced because PyPy doesn't guarantee the identity of "PyObject *" when you don't own a refcount. This is what I'm trying to fix in the branch cpyext-gc-support, but it's not done yet. That's why I said I don't know how to tweak lxml to make it work on current versions of PyPy.

A bientôt,

Armin.

Hi Stefan, On 8 November 2015 at 09:40, Armin Rigo <arigo@tunes.org> wrote:
The PyPy branch "cpyext-gc-support-2" should work (it isn't much tested so far). We should try to run https://github.com/lxml/lxml/tree/pypy4 in it. I'm writing this mail now because I'm hitting a chain of semi-related problems and I need a break, but would welcome other people trying in my stead :-) A bientôt, Armin.

Participants (6): Amaury Forgeot d'Arc, Armin Rigo, Jeff Doran, Laura Creighton, Maciej Fijalkowski, Stefan Behnel