
Stefan Behnel <stefan_ml <at> behnel.de> writes:
Thaman chand, 10.11.2011 20:35:
I have been trying to figure out the same recurring error in my XML parser. I have the following configuration and am dealing with files of 200 MB to 4 GB in size:
- Python 2.7
- lxml 2.3.1
As I understand it, the problem is a mismatch in the XML syntax (start and end tags). The file is too large for me to look inside at the particular line at *751969:438466*.
The error you get is in line 751969, not in line 438466. 438466 is the column number - it's a *really* long line, with lots of text-encoded binary content.
You may be running into libxml2's default security limit for large text content (to prevent stuff like the "billion laughs attack"). You can disable it with the "huge_tree" parser option.
http://lxml.de/parsing.html#parser-options
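For illustration, a minimal sketch of setting that option on a parser. The element names and the 11 MB payload are made up for the example, and the assumption that the default text limit sits at roughly 10 MB is mine, so treat the size as a guess rather than a documented threshold.

```python
from lxml import etree

# huge_tree=True disables libxml2's built-in security limits on text size
# and tree depth. Only enable it for input you trust.
parser = etree.XMLParser(huge_tree=True)

# Hypothetical document: one text node of 11 MB, chosen to sit above the
# assumed default limit for a single text node.
payload_size = 11 * 1024 * 1024
big_xml = b"<root><binary>" + b"A" * payload_size + b"</binary></root>"

root = etree.fromstring(big_xml, parser)
text_length = len(root[0].text)
```

The parser object is then passed to whichever parse function is in use (etree.parse, etree.fromstring and so on).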
Stefan
I am still haunted by the same error: lxml.etree.XMLSyntaxError: Extra content at the end of the document. I set the libxml2 huge_tree=True parser option, but it is not working.

Validator.py
------------
from lxml import etree

hugetree = etree.XMLParser(huge_tree=True)
schema = etree.XMLSchema(file='mzML1.1.0.xsd')
try:
    parser = etree.iterparse(open(r'D:\files\example.xml'), schema=schema,huge_tree=hugetree)
    for elementuple in parser:
        print elementuple
except etree.XMLSyntaxError, e:
    print e.position
    print e.lineno
    print e.error_log
    raise

Error
-----
file:///D:/files/example.xml:751969:438466:FATAL:PARSER:ERR_DOCUMENT_END: Extra content at the end of the document
Traceback (most recent call last):
  File "validator.py", line 8, in <module>
    for aTuple in parser:
  File "iterparse.pxi", line 478, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98432)
  File "iterparse.pxi", line 530, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:98953)
  File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74696)
lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 751969, column 438466
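
For comparison, a hedged sketch of how the iterparse() call is usually written. iterparse() takes huge_tree as a plain boolean keyword of its own, so no separate XMLParser object is needed. The helper name count_elements is hypothetical, and the sketch also opens the file in binary mode so that the platform does no newline translation before libxml2 sees the bytes. Whether either change makes this particular error go away is not something the traceback above can confirm; the file may simply be malformed at that position.

```python
from lxml import etree


def count_elements(path, schema=None):
    """Stream-parse the file at `path` and return the number of elements seen.

    Hypothetical helper for illustration; it is not part of lxml.
    """
    count = 0
    # Binary mode hands the raw bytes to libxml2 without newline translation.
    with open(path, "rb") as source:
        # huge_tree is passed as a boolean directly to iterparse().
        context = etree.iterparse(source, events=("end",),
                                  huge_tree=True, schema=schema)
        try:
            for _event, element in context:
                count += 1
                # Drop the content of elements that were already handled,
                # which keeps memory use down on multi-gigabyte files.
                element.clear()
        except etree.XMLSyntaxError as e:
            # e.position is a (line, column) tuple.
            print(e.position)
            print(e.error_log)
            raise
    return count
```

It would be called as count_elements(r'D:\files\example.xml', schema=schema), with schema being the etree.XMLSchema object from the script above.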