Hi, I posted the following to the Python mailing list but no one responded, so
I'm posting it here as well.



I have created a very, very simple parser for an XML file.

from xml.sax.handler import ContentHandler

class FindGoXML2(ContentHandler):
    def characters(self, content):
        print content

I kept it this simple because I want to debug. As far as I understand, it
prints out any text enclosed by tags (right?).

The XML is publicly available here:

Here are a few lines from this XML:


Notice the third line from the end. I expect my content printout to print
out "evidence: IEA".
However, this is what I get:

catalytic activity  ==> this is the printout for the line before

vidence: IEA

I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts: first it prints only "e", then
"vidence: IEA". I parsed 825 such XML files without a problem; this occurs
on my 826th.
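
To help narrow this down, here is a minimal sketch of a handler that buffers
text until the closing tag instead of printing every characters() call. It is
written in Python 3 syntax, unlike my snippet above. It relies on what the
xml.sax documentation says: a parser may report contiguous character data in
one chunk or split it into several. I don't know whether that is what is
happening with my file. The class name BufferingHandler, the helper
parse_in_pieces and the element name "comment" are made up for illustration,
and feeding the document in two pieces is only my way of imitating a buffer
boundary in the middle of the text.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Joins the text of each leaf element before reporting it."""

    def __init__(self):
        super().__init__()
        self.chunks = []  # pieces received since the last start/end tag
        self.texts = []   # (element name, joined text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer; text seen before a child element is dropped,
        # which is fine for this sketch because only leaf text matters.
        self.chunks = []

    def characters(self, content):
        # May be called once or several times for the same run of text.
        self.chunks.append(content)

    def endElement(self, name):
        text = "".join(self.chunks).strip()
        if text:
            self.texts.append((name, text))
        self.chunks = []


def parse_in_pieces(pieces):
    """Hypothetical helper: feed the document to the parser piece by piece."""
    handler = BufferingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return handler


# Made-up sample document, cut in the middle of the text on purpose.
handler = parse_in_pieces(["<root><comment>e", "vidence: IEA</comment></root>"])
print(handler.texts)
```

With this the joined text comes out whole no matter how many times
characters() was called for it, so if the split is just chunking, the
printout from endElement should show "evidence: IEA" in one piece.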

Any explanations??
