Re: [lxml] debugging seg fault in lxml.html.fromstring in pypy
Some more observations on my problem:

1. My segfaults consistently arise from accessing form.fields in lxml.html, which is a FieldsDict. Even just printing the form.fields values, without setting any new values, will cause segfaults.
2. A FieldsDict inherits from collections.abc.MutableMapping.
3. Abstract Base Classes use weak references, and FieldsDict has one thing in it, a weak reference.
4. I'm getting "memory freed twice" errors, which is what one might expect when a weak reference is involved.
5. The memory management and garbage collection in PyPy are different from CPython's, and PyPy does not use reference counting.

I suspect an lxml memory bug is lurking, and I wish I could be more precise.

- Jeff
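To make observations 2, 3 and 5 concrete, here is a minimal, standard-library-only sketch of a MutableMapping subclass that holds nothing but a weak reference. It is not lxml's FieldsDict; the names `Owner` and `WeakView` are hypothetical and exist only for illustration. The explicit `gc.collect()` is there because, without reference counting, PyPy does not reclaim the owner at the moment of `del`.

```python
import gc
import weakref
from collections.abc import MutableMapping


class Owner:
    """Hypothetical stand-in for the object that the mapping refers to."""

    def __init__(self):
        self.data = {"user": "jeff"}


class WeakView(MutableMapping):
    """Hypothetical mapping whose only state is a weak reference to its owner."""

    def __init__(self, owner):
        self._owner_ref = weakref.ref(owner)

    def _data(self):
        owner = self._owner_ref()  # returns None once the owner has been collected
        if owner is None:
            raise ReferenceError("owner has been collected")
        return owner.data

    def __getitem__(self, key):
        return self._data()[key]

    def __setitem__(self, key, value):
        self._data()[key] = value

    def __delitem__(self, key):
        del self._data()[key]

    def __iter__(self):
        return iter(self._data())

    def __len__(self):
        return len(self._data())


owner = Owner()
view = WeakView(owner)
print(dict(view))  # works while the owner is alive

del owner
gc.collect()  # CPython frees the owner at `del`; PyPy needs a collection

stale = False
try:
    dict(view)
except ReferenceError as exc:
    stale = True
    print("stale view:", exc)
```

In pure Python a dead weak reference simply yields `None`, so the worst outcome here is an exception. A double free would therefore have to originate in C-level code rather than in a class like this one.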
Jeff Doran wrote on 07.10.2015 at 01:44:
> I suspect an lxml memory bug is lurking and I wish I could be more precise.
Generally speaking, it's far more likely that the bug is in PyPy's cpyext implementation. But given the number of workarounds for PyPy bugs in both Cython and lxml, it might be possible to add yet another one, once it's clear where to place it.

I'll try to find some time to take a closer look at your investigations this weekend. Thanks for digging into this.

Stefan
Stefan Behnel wrote on 09.10.2015 at 15:03:
> I'll try to find some time for taking a closer look into your investigations this weekend.
I tried it. Your case 6 wasn't enough to trigger a crash for me, but your longer example was, and so is the test suite. However, the outcome so far is that "something" leads to memory corruption inside PyPy, which eventually causes a crash during garbage collection, but only at a later point. That makes it difficult to guess what the actual problem might be.

My suggestion would be to ask the PyPy devs to debug this, as they have a better understanding of their code and its deficiencies. If they can fix it, fine. If they can additionally come up with a workaround that I can apply on my side, even better.

Stefan
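Because the crash only surfaces during a later garbage collection, one generic way to narrow down the trigger is to force a full collection after every step of a reproduction script. The sketch below shows that idea only; it was not proposed in this thread, and the helper name `run_with_collections` and the placeholder steps are hypothetical. In a real run, the steps would be the individual lxml calls from the longer example.

```python
import gc


def run_with_collections(steps):
    """Run (label, callable) pairs, forcing a full GC pass after each one.

    If memory was corrupted by a step, collecting right away makes it more
    likely that the crash happens near that step rather than much later.
    """
    completed = []
    for label, step in steps:
        step()
        gc.collect()
        completed.append(label)
        # Flush so the last surviving label is visible even if the process dies.
        print("survived:", label, flush=True)
    return completed


# Hypothetical placeholder steps; substitute the calls under suspicion.
steps = [
    ("build objects", lambda: [object() for _ in range(1000)]),
    ("build mapping", lambda: {i: str(i) for i in range(1000)}),
]
done = run_with_collections(steps)
```

The last label printed before a crash gives a rough upper bound on where the corruption was introduced, which may be useful detail to include in a report to the PyPy developers.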