
Hi dev,

I have a problem that I couldn't solve even after a thorough search. My XML parser keeps failing with the same error. My setup, working on files of 200 MB to 4 GB:

- Python 2.7
- lxml 2.3.1

As far as I understand, the problem is a mismatch in the XML syntax (a start tag and an end tag that do not match). The file is too large to open and inspect at the reported position *751969:438466*. I tried the naive approach of printing specific lines with sed (sed -n 751969p filename). Here is the output for *751969* <http://pastebin.com/nkBxnxZS> and for *438466* <http://pastebin.com/NurZtPME>. It clearly shows that the element start and end do not match. The problem is that I can't open such a huge file and edit it manually.

Note: I have written a *validator* <http://pastebin.com/C9JnPz85> according to the given schema, and it reports the same problem.

Question
------------

- **How can I avoid this kind of error when parsing in future?**
- **Or how can I make the parser skip such an element, which is not mandatory?**

The error output:

    C:\Documents and Settings\****\Desktop>python example.py
    (751969, 438466)
    None
    file:///D:/files/average.mzML:751969:438466:FATAL:PARSER:ERR_DOCUMENT_END: Extra content at the end of the document
    Traceback (most recent call last):
      File "MainPaser.py", line 330, in <module>
        main()
      File "MainPaser.py", line 322, in main
        fast_iter(context, process_element)
      File "MainPaser.py", line 24, in fast_iter
        for event, elem in context:
      File "iterparse.pxi", line 478, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98432)
      File "iterparse.pxi", line 530, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:98953)
      File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74696)
    lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 751969, column 438466

Help me!!
I posted the same question on *Stack Overflow* <http://stackoverflow.com/questions/8082456/lxml-etree-xmlsyantaxerror>
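
Since the error message gives both a line (751969) and a column (438466), and a single line in this file can be hundreds of thousands of characters long, printing the whole line with sed is hard to read. The following is only a minimal sketch of how a small window around the reported position could be extracted without loading the whole file. The function name `show_context` and its `width` and `encoding` parameters are hypothetical, and it assumes the reported column can be treated as a 1-based character offset within the line, which may not hold exactly for multi-byte content or for very long lines.

```python
import io


def show_context(path, line_no, col_no, width=200, encoding="utf-8"):
    """Return up to `width` characters on each side of (line_no, col_no).

    line_no and col_no are 1-based, as in the parser's error message.
    The file is streamed line by line, so only one line at a time is
    held in memory. Returns None if the file has fewer than line_no lines.
    """
    with io.open(path, "r", encoding=encoding, errors="replace") as handle:
        for current, line in enumerate(handle, start=1):
            if current == line_no:
                # Convert the 1-based column to a 0-based index and clamp
                # the left edge of the window at the start of the line.
                index = col_no - 1
                start = max(index - width, 0)
                return line[start:index + width]
    return None
```

For the error above, the call would be `show_context("D:/files/average.mzML", 751969, 438466)`. The returned snippet should show which tags surround the position the parser complained about, which is easier to compare against the schema than a full line.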