memory allocation failure

I recently upgraded to Python 3.6 and lxml 3.7. When I run a script over 500 files, I get the following error after about 60 files:

Traceback (most recent call last):
  File "/Users/martin/Dropbox/PycharmProjects/earlyprint/process_eebo_sample/simple_eebochange.py", line 88, in <module>
    tree= etree.parse(filename, parser)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:81110)
  File "src/lxml/parser.pxi", line 1811, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:117841)
  File "src/lxml/parser.pxi", line 1837, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:118188)
  File "src/lxml/parser.pxi", line 1741, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:117100)
  File "src/lxml/parser.pxi", line 1138, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:111646)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105102)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106810)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105664)
  File "/users/martin/dropbox/shcdemo-shctexts-d780443e41f0/162-A04656.xml", line 317
lxml.etree.XMLSyntaxError: Memory allocation failed, line 317, column 18

I have a machine with 32 GB of memory. Watching memory use for both Python and PyCharm, the figures (100 MB, 700 MB) are not out of the ordinary for what I’m doing. The failure has nothing to do with the content of the file, because the same file parses fine if I start the run at the previous file.

The failure is very sudden. That is unlike other situations, where a run through thousands of files slowed to a crawl because the system kept all ids in memory, and finally stopped. PyCharm has 4 GB allocated to it, and on the face of it the scope of this script is well below that of other scripts that ran through thousands of files without a problem.
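One way to narrow this down, sketched under assumptions rather than taken from the script above: parse each file with a fresh parser instead of a shared one, drop the tree before moving on, and record which file in the sequence fails. If the error disappears, state accumulating across files becomes a plausible suspect; if it does not, that explanation can be ruled out. The names `parse_with_fresh_parser` and `run_isolated` are hypothetical, and the list of paths is whatever the real script iterates over.

```python
import gc


def parse_with_fresh_parser(path):
    """Parse one file with a brand-new lxml parser (nothing shared between files)."""
    # Imported lazily so the loop below can be exercised without lxml installed.
    from lxml import etree

    parser = etree.XMLParser()
    return etree.parse(path, parser)


def run_isolated(paths, parse=parse_with_fresh_parser):
    """Parse every path in order and return (position, path, message) per failure.

    `parse` is any callable taking a path and returning a tree; it is a
    parameter only so that the loop can be tried with a stand-in parser.
    """
    failures = []
    for position, path in enumerate(paths, start=1):
        try:
            tree = parse(path)
        except SyntaxError as exc:
            # lxml.etree.XMLSyntaxError is a subclass of SyntaxError.
            failures.append((position, path, str(exc)))
            continue
        # Release the document before the next file so nothing is kept alive.
        del tree
        gc.collect()
    return failures
```

Calling `run_isolated(list_of_filenames)` and printing the result shows whether the failure still appears at roughly the same position in the run, independent of which file happens to be there.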
I’ll be grateful for any tips on what to look for and where. BTW, adding the filename of the current file to the error report is a real blessing!
participants (1)
- Martin Mueller