
Thaman chand, 10.11.2011 20:35:
I have been trying to figure out why I constantly get the same error in my XML parser. I have the following configuration and am dealing with files of 200 MB to 4 GB in size:
- Python 2.7
- lxml 2.3.1
The problem, as I understand it, is that there is a mismatch in the XML syntax (start and end tags). The file is too large for me to look inside at the particular line between *751969:438466*.

The error you get is in line 751969, not in line 438466. 438466 is the column number - it's a *really* long line, with lots of text-encoded binary content.

You may be running into libxml2's default security limit for large text content (which exists to prevent things like the "billion laughs" attack). You can disable it with the "huge_tree" parser option.

http://lxml.de/parsing.html#parser-options

Stefan
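
For reference, a minimal sketch of what passing the "huge_tree" option might look like. The helper name `parse_trusted_huge_xml` is made up for illustration, and the original poster's parsing code is not shown in this thread, so this assumes a plain `etree.parse()` call:

```python
from lxml import etree


def parse_trusted_huge_xml(path):
    """Parse an XML file with libxml2's size limits lifted.

    Hypothetical helper, not part of lxml. Only use it on input you
    trust: huge_tree=True switches off the safeguards against
    maliciously large or deeply nested documents.
    """
    # huge_tree=True is the parser option mentioned in the reply above.
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(path, parser)
```

It would be called as `tree = parse_trusted_huge_xml("data.xml")`, where the file name is a placeholder. For files in the gigabyte range, building the whole tree in memory may not be practical anyway; as far as I know, `etree.iterparse()` accepts the same `huge_tree` keyword, but check the parser options page linked above for your lxml version.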