[lxml-dev] another iterparse segfault
This one mystifies me completely -- three-line testcase attached. This crashes on lxml 1.1alpha static (Python 2.4) on Windows as well as Python 2.4 on Gentoo with lxml trunk as of yesterday.
Hi Andrew, Andrew Lutomirski wrote:
This one mystifies me completely -- three-line testcase attached.
This crashes on lxml 1.1alpha static (python 2.4) on Windows as well as Python 2.4 on Gentoo with lxml trunk as of yesterday.
Again, thanks for the bug report. This one really is a bug, and I can reproduce it with your test. It is related to __ITERPARSE_CHUNK_SIZE (iterparse.pxi), which is used internally to read the data in small chunks and hand them to the parser to generate events. If you reduce the value, the chunk size is exceeded earlier (after fewer than the 10000 elements you needed for your test) and the bug occurs after a smaller number of parsed elements. I'll have to take a closer look at it to figure out what's going wrong here. Thanks again, Stefan
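For readers unfamiliar with chunked parsing, here is a minimal sketch of the general idea: read a fixed-size chunk, feed it to the parser, and hand out whatever events have become available. This is not lxml's internal code. It uses the standard library's XMLPullParser as a stand-in, and CHUNK_SIZE and iter_events are hypothetical names chosen for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical value; stands in for lxml's internal __ITERPARSE_CHUNK_SIZE.
CHUNK_SIZE = 64


def iter_events(source, chunk_size=CHUNK_SIZE):
    """Yield (event, element) pairs while reading `source` chunk by chunk."""
    parser = ET.XMLPullParser(events=("end",))
    while True:
        data = source.read(chunk_size)
        if not data:
            break
        parser.feed(data)
        # Hand out the events that this chunk completed, if any.
        for event, elem in parser.read_events():
            yield event, elem
    parser.close()
    # Drain events that only became available once the input ended.
    for event, elem in parser.read_events():
        yield event, elem


xml_data = b"<root>" + b"<a/>" * 100 + b"</root>"
tags = [elem.tag for _, elem in iter_events(io.BytesIO(xml_data))]
```

A smaller chunk_size means more feed/drain round trips for the same document, which is why a bug in the drain step shows up after fewer parsed elements.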
Stefan Behnel wrote:
I'll have to take a closer look at it to figure out what's going wrong here.
... and so I did. It was a bug in the iterparse.next() method. The events and corresponding elements are stored in a 'queue' (a Python list) and retrieved by a call to PyList_GET_ITEM(). That function (or macro) returns a so-called "borrowed reference" that must be INCREF'd by hand (Pyrex does not know about it). Otherwise, the refcount of the item is too low, and the item will be deallocated while it is still referenced. Here's the patch. Thanks for the report, Stefan
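The borrowed-reference problem can be illustrated from pure Python, though only by analogy. In the sketch below, a weak reference plays the role of the borrowed reference (it does not keep the object alive), and a normal variable plays the role of the reference obtained after an explicit INCREF. The names Event, borrowed_only and owned are hypothetical, the actual patch is not reproduced here, and the immediate deallocation shown is specific to CPython's reference counting.

```python
import weakref


class Event:
    """Hypothetical stand-in for a queued (event, element) entry."""


def borrowed_only():
    queue = [Event()]
    # A weak reference does not own the object, much like a borrowed one.
    watcher = weakref.ref(queue[0])
    # The list held the only owning reference; dropping it frees the object.
    del queue[0]
    return watcher() is None


def owned():
    queue = [Event()]
    # Binding a name takes a new owning reference (the INCREF equivalent).
    item = queue[0]
    watcher = weakref.ref(item)
    del queue[0]
    # The object survives because `item` still owns a reference.
    return watcher() is item


freed_too_early = borrowed_only()
kept_alive = owned()
```

In C or Pyrex code there is no such safety net: using the pointer after the list has dropped its reference reads freed memory, which matches the kind of segfault reported in this thread.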
participants (2)
- Andrew Lutomirski
- Stefan Behnel