Hi dev,

I have a problem that I couldn't solve, even after a thorough search.

I keep getting the same error from my XML parser. I am using the following configuration, and the files I am processing range from 200 MB to 4 GB:
 
 - Python 2.7
 - lxml 2.3.1

As I understand it, the problem is a mismatch in the XML syntax (start and end tags). The file is too large for me to open and inspect the location the parser reports, 751969:438466 (the traceback below reads this as line 751969, column 438466). As a naive workaround, I used sed to print specific lines (`sed -n 751969p filename`), and the output for lines 751969 and 438466 clearly shows that the start and end elements do not match. The problem is that I can't open such a huge file and edit it manually.
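Because `sed -n 751969p` prints the whole line, and a column number of 438466 suggests that line is very long, a small streaming script can print only a window of text around the reported position without loading the file into memory. This is a minimal sketch under stated assumptions: the helper name `show_context` and its `width` parameter are hypothetical, the file is assumed to be UTF-8, and columns are treated as 1-based character offsets, which may not match exactly how the parser counts them (for example, bytes versus characters).

```python
import io
import itertools


def show_context(path, line_no, col_no, width=200):
    """Return up to `width` characters on either side of (line_no, col_no).

    Hypothetical helper. Lines and columns are 1-based. The file is read
    lazily, line by line, so only the current line is held in memory.
    """
    with io.open(path, "r", encoding="utf-8", errors="replace") as handle:
        # islice skips the first line_no - 1 lines without storing them.
        target = next(itertools.islice(handle, line_no - 1, line_no), None)
    if target is None:
        raise ValueError("file has fewer than %d lines" % line_no)
    start = max(col_no - 1 - width, 0)
    return target[start:col_no - 1 + width]
```

For example, `print(show_context(r"D:\files\average.mzML", 751969, 438466))` would print roughly 400 characters around the reported position. The script still scans every line up to the target, but memory use is bounded by the longest line it reads rather than by the file size.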

Note: I have designed a validator according to the given schema, and it reports the same problem.

Question
------------

 - **How can I avoid this kind of error when parsing in the future?**

 - **Or, how can I skip such an element if it is not mandatory for my parser?**
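One way to approach the second question is to keep only the elements the application actually needs while streaming, and discard everything else as soon as it has been parsed. The sketch below uses the standard library's `xml.etree.ElementTree.iterparse` instead of lxml, and the helper `iter_wanted` and its `wanted_tags` argument are hypothetical. Note that this only helps with memory use and irrelevant content: a well-formedness error such as the one reported below is a fatal error under the XML specification, so a conforming parser stops there regardless of which elements you intend to skip.

```python
import xml.etree.ElementTree as ET


def iter_wanted(source, wanted_tags):
    """Yield each outermost element whose tag is in wanted_tags.

    Hypothetical helper. Elements outside any wanted element have their
    children, text and attributes dropped as soon as they end. Each yielded
    element is cleared when the caller asks for the next one, so copy out
    any data you need first.
    """
    depth = 0  # number of wanted elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag in wanted_tags:
                depth += 1
            continue
        # event == "end": the element and its subtree are complete.
        if elem.tag in wanted_tags:
            depth -= 1
            if depth == 0:
                yield elem
                elem.clear()  # free the subtree once the caller is done
        elif depth == 0:
            elem.clear()  # unwanted element outside any wanted one
```

If the document declares a default namespace (as mzML files usually do), the entries in `wanted_tags` must be written in ElementTree's `{namespace-uri}localname` form.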

Here is the error output:


    C:\Documents and Settings\****\Desktop>python example.py
    (751969, 438466)
    None
    file:///D:/files/average.mzML:751969:438466:FATAL:PARSER:ERR_DOCUMENT_END: Extra content at the end of the document
    Traceback (most recent call last):
      File "MainPaser.py", line 330, in <module>
        main()
      File "MainPaser.py", line 322, in main
        fast_iter(context, process_element)
      File "MainPaser.py", line 24, in fast_iter
        for event, elem in context:
      File "iterparse.pxi", line 478, in lxml.etree.iterparse.__next__ (src/lxml\lxml.etree.c:98432)
      File "iterparse.pxi", line 530, in lxml.etree.iterparse._read_more_events (src/lxml\lxml.etree.c:98953)
      File "parser.pxi", line 590, in lxml.etree._raiseParseError (src/lxml\lxml.etree.c:74696)
    lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 751969, column 438466
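
The traceback shows the failure surfacing inside `fast_iter`. A variant that catches the parse error and reports how far it got can at least tell you how many records were processed before the bad spot, and where that spot is. This is a sketch using the standard library's `xml.etree.ElementTree` rather than lxml, so it is not a drop-in replacement for the code in `MainPaser.py`; the name `fast_iter_report` and its arguments are hypothetical.

```python
import xml.etree.ElementTree as ET


def fast_iter_report(source, tag, func):
    """Call func on every completed `tag` element; stop cleanly on bad XML.

    Hypothetical helper. Returns (count, error): count is how many elements
    were processed, error is None or the ParseError that ended parsing.
    """
    count = 0
    try:
        for _, elem in ET.iterparse(source, events=("end",)):
            if elem.tag == tag:
                func(elem)
                count += 1
                elem.clear()  # release the subtree we just processed
    except ET.ParseError as err:
        # err.position is a (line, column) tuple locating the failure.
        return count, err
    return count, None
```

The condition that libxml2 (and therefore lxml) reports as "Extra content at the end of the document" is reported by the standard library's expat-based parser as "junk after document element", so `err.position` should point at the same region of the file.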

Any help would be appreciated.


I have also posted this question on Stack Overflow.