I am puzzled by the following error message, which I got when I ran a familiar script on my new iMac with Python 3.4 and lxml 3.4.2.
Traceback (most recent call last):
  File "/Users/martinmueller/Dropbox/PycharmProjects/emd/emdFeb2015.py", line 99, in <module>
    tree = etree.parse(filename, parser)
  File "lxml.etree.pyx", line 3301, in lxml.etree.parse (src/lxml/lxml.etree.c:72453)
  File "parser.pxi", line 1791, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:105915)
  File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:106214)
  File "parser.pxi", line 1721, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:105213)
  File "parser.pxi", line 1122, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:100163)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:94286)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:95722)
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:94789)
lxml.etree.XMLSyntaxError: Memory allocation failed, line 13323, column 18
This error occurs after processing about 70 texts, each between 2 and 4 MB in size. The error is not caused by anything in the text that fails, because that text is processed perfectly when run on its own. Watching the memory used by different processes in the Mac Activity Monitor, I see nothing unusual about the memory currently used by Python or by PyCharm, which I use.
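Since each text parses fine on its own, one way to test whether something accumulates across the batch is to give every file a fresh interpreter, so that whatever lxml/libxml2 holds is released when that process exits. This is a minimal sketch, not the original script: `CHILD_CODE`, `parse_in_fresh_process` and `parse_batch_isolated` are hypothetical names, the child only counts elements as a stand-in for the real per-text processing, and it falls back to the standard library's `xml.etree.ElementTree` if lxml is not installed. It sticks to calls available in Python 3.4.

```python
import subprocess
import sys

# Hypothetical child program: parses one file (path in sys.argv[1]) and prints
# its element count. The real per-text processing would replace the print.
CHILD_CODE = """
import sys
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree
tree = etree.parse(sys.argv[1], etree.XMLParser())
print(sum(1 for _ in tree.getroot().iter()))
"""


def parse_in_fresh_process(filename):
    """Parse one file in a separate Python interpreter and return its element count."""
    output = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE, filename],
        universal_newlines=True,  # return str rather than bytes
    )
    return int(output.strip())


def parse_batch_isolated(filenames):
    """Run every file in its own process; report which file fails, if any."""
    counts = {}
    for filename in filenames:
        try:
            counts[filename] = parse_in_fresh_process(filename)
        except subprocess.CalledProcessError as exc:
            # The child's traceback has already gone to stderr.
            print("failed on {}: exit status {}".format(filename, exc.returncode))
            raise
    return counts
```

If the whole batch completes this way, that would suggest the failure comes from state that persists inside one long-running process rather than from any single text.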
From this diagnosis it would seem that memory is somehow used up cumulatively in lxml and crosses some threshold after a while. Could this be related to an earlier problem where the underlying libxml2 stores all xml:ids in batch operations? That problem led to a noticeable slowdown, whereas here the processing time for each text seems to be a stable, linear function of its length until it suddenly collapses.
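One way to check the cumulative-memory hypothesis from inside Python, rather than from Activity Monitor, is to create a fresh parser for every file instead of reusing one parser object across the batch (how the original script sets up `parser` isn't shown), and to log the process's peak resident memory after each file. This is a minimal sketch under those assumptions: `parse_batch` and `peak_rss_mb` are hypothetical names, the `resource` module is Unix-only, and `ru_maxrss` is reported in bytes on macOS but in kilobytes on Linux. It falls back to `xml.etree.ElementTree` if lxml is not installed.

```python
import resource
import sys

try:
    from lxml import etree  # the library used in the original script
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # macOS reports ru_maxrss in bytes, Linux in kilobytes
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def parse_batch(filenames):
    """Parse each file with a new parser and record peak memory after each one."""
    log = []
    for filename in filenames:
        parser = etree.XMLParser()  # new parser per file; no parser state carried over
        tree = etree.parse(filename, parser)
        n_elements = sum(1 for _ in tree.getroot().iter())
        # ... the real per-text processing would go here ...
        del tree, parser  # drop references before the next file
        mb = peak_rss_mb()
        log.append((filename, n_elements, mb))
        print("{}: {} elements, peak RSS {:.1f} MB".format(filename, n_elements, mb))
    return log
```

A steadily rising peak across files that are similar in size would support the idea that memory is retained between parses; a flat peak up to the failure would point elsewhere.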
I'll be grateful for any advice.