Hi lxml-dev:

I'm getting glibc aborts, MemoryErrors, and a cStringIO ImportError from the following minimal reproduction:

<code>
import lxml.etree

# from http://download.wikimedia.org/enwiki/latest/
wiki_xml_filename = 'enwiki-latest-pages-articles.xml'
context = lxml.etree.iterparse(wiki_xml_filename, events=("end"))
for action, elem in context:
    pass
</code>

The crash usually occurs about halfway through the file (around <page> 3,000,000). The same code runs without error on smaller MediaWiki XML files (200 MB); I only get the error for this very large XML file (about 13 GB uncompressed). I had no trouble parsing the same file with the Python standard library SAX parser, but it is much slower and I don't like its API.

I'm using libxml2-2.6.32 (also tried earlier versions), python 2.5.2, python-lxml 2.0.5 (also tried earlier versions), and Kubuntu 8.04 with a 2.6.24 kernel (also tested on openSUSE 10.3 with an earlier kernel).

Some of the exceptions are MemoryErrors. The machine running the code has 4 GB of RAM, and the kernel does not appear to hit swap significantly during the run.
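For comparison, here is a minimal sketch of a streaming pass with the standard library SAX parser mentioned above. It is only an illustration and not necessarily the script used for the comparison: the PageCounter handler, the count_pages helper, and the tiny in-memory sample are hypothetical names made up for this example.

```python
import io
import xml.sax


class PageCounter(xml.sax.ContentHandler):
    """Hypothetical handler: counts closing <page> tags and builds no tree."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.pages = 0

    def endElement(self, name):
        # Called once per closing tag; only the counter is kept in memory.
        if name == "page":
            self.pages += 1


def count_pages(source):
    """Stream-parse source (a filename or file-like object) and return the page count."""
    handler = PageCounter()
    xml.sax.parse(source, handler)
    return handler.pages


# Tiny in-memory stand-in for the dump, so the sketch runs without the 13 GB file.
sample = io.BytesIO(
    b"<mediawiki><page><title>A</title></page>"
    b"<page><title>B</title></page></mediawiki>"
)
sample_count = count_pages(sample)
```

Passing the dump's filename to count_pages instead of the sample would stream the real file in the same way.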
Here are the errors:

*** glibc detected *** python: free(): invalid pointer: 0x08220a15 ***
Aborted

Also:

Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/apport_python_hook.py", line 37, in apport_excepthook
    import re, tempfile, traceback
  File "/usr/lib/python2.5/traceback.py", line 241, in <module>
    def print_last(limit=None, file=None):
MemoryError

Original exception was:
Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None
...
And also (slightly different):

Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/apport_python_hook.py", line 37, in apport_excepthook
    import re, tempfile, traceback
  File "/usr/lib/python2.5/tempfile.py", line 33, in <module>
    from random import Random as _Random
MemoryError

Original exception was:
Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None

Sometimes I just get 'Segmentation fault' from the shell, and sometimes it hangs indefinitely.
And finally (cStringIO):

Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/apport_python_hook.py", line 36, in apport_excepthook
    from cStringIO import StringIO
ImportError: /usr/lib/python2.5/lib-dynload/cStringIO.so: failed to map segment from shared object: Permission denied

Original exception was:
Traceback (most recent call last):
  File "minimal.py", line 6, in <module>
    for action, elem in context:
  File "iterparse.pxi", line 390, in lxml.etree.iterparse.__next__ (src/lxml/lxml.etree.c:65064)
  File "parser.pxi", line 489, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:47432)
lxml.etree.XMLSyntaxError: None

Any direction on tracking down the source is greatly appreciated!

-- Marc