[XML-SIG] Content is split into two

Timothy Wu 2huggie at gmail.com
Wed Mar 26 08:12:28 CET 2008

Hi, I posted the following to the Python mailing list, but no one
responded, so I'm posting it here as well.



I have created a very, very simple parser for an XML file.

from xml.sax.handler import ContentHandler

class FindGoXML2(ContentHandler):
    def characters(self, content):
        print(content)

I have made it simple because I want to debug. This prints out any content
enclosed by tags (right?).
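
A self-contained way to exercise a handler like this might look as
follows. This is only a sketch: the sample XML string, the ChunkCollector
class and the collect_chunks helper are made up for illustration, and the
chunks are collected in a list instead of being printed so they can be
inspected afterwards. Note that whitespace between tags is also passed to
characters(), which is one likely source of blank lines in the output.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkCollector(ContentHandler):
    """Records every chunk passed to characters(), in order."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # Called for text inside elements and for whitespace between them.
        self.chunks.append(content)


def collect_chunks(xml_text):
    # Hypothetical helper: parse a string and return the raw chunks.
    handler = ChunkCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.chunks


# Made-up sample document, not the real XML from this post.
chunks = collect_chunks("<a><b>catalytic activity</b>\n<c>evidence: IEA</c></a>")
print(repr("".join(chunks)))
```

How many chunks come back depends on the parser, so only the joined text
is something to rely on.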

The XML is publicly available here:

Here are a few lines from this XML:


Notice the third line from the end. I expect my handler to print
"evidence: IEA". However, this is what I get:

catalytic activity  ==> this is the printout for the preceding line

vidence: IEA

I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts: first it prints only "e",
then "vidence: IEA". I parsed 825 such XML files without a problem; this
occurs on my 826th.

Any explanations??
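
One thing that may be relevant: the xml.sax documentation says a parser
may deliver contiguous character data in a single chunk or split it into
several, so characters() is not guaranteed to be called once per text
node. A common workaround is to buffer the chunks and join them when the
end tag arrives. The sketch below assumes leaf elements only (the buffer
is reset at every start tag), and the BufferingHandler class and the
sample XML are hypothetical, not taken from the real file.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Joins the chunks of each leaf element's text before reporting it."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.buffer = []
        self.texts = []

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element (assumes no mixed content).
        self.buffer = []

    def characters(self, content):
        # May be called several times for one run of text.
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:
            self.texts.append(text)
        self.buffer = []


handler = BufferingHandler()
# Made-up sample document for illustration.
xml.sax.parseString(
    b"<a><b>catalytic activity</b>\n<c>evidence: IEA</c></a>", handler
)
print(handler.texts)
```

With this approach the reported text does not depend on where the parser
happens to split its input.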