[XML-SIG] Content is split into two
Timothy Wu
2huggie at gmail.com
Wed Mar 26 08:12:28 CET 2008
Hi, I posted the following to the Python mailing list but no one responded, so
I'm posting it here again.
------------
Hi,
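I have created a very, very simple parser for an XML document:
from xml.sax.handler import ContentHandler

class FindGoXML2(ContentHandler):
    def characters(self, content):
        # Print each chunk of character data as it is delivered
        print(content)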
I have kept it simple because I want to debug. This should print any content
enclosed by tags (right?).
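According to the Python xml.sax documentation, ContentHandler.characters() is called once per chunk of character data, and a parser may deliver one contiguous run of text as a single chunk or split it into several. Whitespace between tags is also character data. A minimal Python 3 debugging sketch (ChunkLogger and the snippet are illustrative, not from the original post) records every call and prints it with repr() so whitespace-only chunks become visible:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkLogger(ContentHandler):
    """Hypothetical debugging handler: records every characters() call."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # One call per chunk; repr() makes newlines and spaces visible.
        self.chunks.append(content)
        print(repr(content))


snippet = b"""<Other-source>
<Other-source_anchor>catalytic activity</Other-source_anchor>
</Other-source>"""

handler = ChunkLogger()
xml.sax.parseString(snippet, handler)
```

The newline chunks between the tags are reported too, which is one way blank lines can show up when every chunk is printed on its own line.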
The XML is publicly available here:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml
Here are a few lines from this XML:
<Gene-commentary_source>
<Other-source>
<Other-source_src>
<Dbtag>
<Dbtag_db>GO</Dbtag_db>
<Dbtag_tag>
<Object-id>
<Object-id_id>3824</Object-id_id>
</Object-id>
</Dbtag_tag>
</Dbtag>
</Other-source_src>
<Other-source_anchor>catalytic activity</Other-source_anchor>
<Other-source_post-text>evidence: IEA</Other-source_post-text>
</Other-source>
</Gene-commentary_source>
Notice the <Other-source_post-text> line, third from the end. I expect the
printout to show "evidence: IEA".
However, this is what I get:
-------------------------
catalytic activity ==> this is the printout of the preceding line
e
vidence: IEA
-------------------------
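The output above is consistent with characters() being called twice for that piece of text, with print adding a newline after each call. A small Python 3 sketch of that mechanism, assuming the two chunks are "e" and "vidence: IEA" (passed by hand here, not produced by a real parse); EchoText is a hypothetical variant of FindGoXML2 that writes chunks without adding newlines:

```python
import io
from xml.sax.handler import ContentHandler


class EchoText(ContentHandler):
    """Hypothetical variant of FindGoXML2 that writes chunks verbatim."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def characters(self, content):
        # write() adds no newline, so consecutive chunks join back up.
        self.out.write(content)


# print() ends every call with "\n", so two chunks become two lines.
lines = io.StringIO()
for chunk in ["e", "vidence: IEA"]:
    print(chunk, file=lines)

# Writing the same two chunks verbatim reassembles the original text.
out = io.StringIO()
handler = EchoText(out)
handler.characters("e")  # chunks passed by hand to simulate a split
handler.characters("vidence: IEA")

print(repr(lines.getvalue()))  # 'e\nvidence: IEA\n'
print(repr(out.getvalue()))    # 'evidence: IEA'
```

This only changes how the chunks are displayed; it does not control where a parser splits the text.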
I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts.
First it prints only "e", then "vidence: IEA". I parsed 825 such XML files
without a problem; this occurs on my 826th.
Any explanations??
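A common way to cope with split chunks, sketched here in Python 3 under the assumption that only text-only (leaf) elements matter: collect the chunks in a buffer and use the joined text when the element ends. TextCollector is a hypothetical name, and the two feed() calls split "evidence" after its first letter by hand to imitate a chunk boundary, so this demonstrates the buffering, not the cause of the split in the 826th file:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Sketch: buffer character data and handle it once per element."""

    def __init__(self):
        super().__init__()
        self._buffer = []
        self.texts = []  # (element name, full text) pairs

    def startElement(self, name, attrs):
        # A new element starts; drop whitespace seen between tags.
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text: just collect.
        self._buffer.append(content)

    def endElement(self, name):
        text = "".join(self._buffer)
        if text.strip():  # skip whitespace-only content
            self.texts.append((name, text))
            print(name, repr(text))
        self._buffer = []


parser = xml.sax.make_parser()
handler = TextCollector()
parser.setContentHandler(handler)
# Feed the document in two pieces, splitting "evidence" after "e".
parser.feed(b"<Other-source_post-text>e")
parser.feed(b"vidence: IEA</Other-source_post-text>")
parser.close()
```

However the parser chunks the text, endElement() sees the joined string. Elements with mixed content (text around child elements) would need a per-element stack of buffers instead of a single list.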