[XML-SIG] Content is split into two
J. Cliff Dyer
jcd at unc.edu
Wed Mar 26 14:39:21 CET 2008
On Wed, 2008-03-26 at 15:12 +0800, Timothy Wu wrote:
> Hi, I posted the following to the Python mailing list but no one
> responded, so I'm posting it again here.
>
> ------------
>
> Hi,
>
> I have created a very, very simple parser for an XML file.
>
> class FindGoXML2(ContentHandler):
>     def characters(self, content):
>         print content
>
> I have made it simple because I want to debug. This prints out any
> content enclosed by tags (right?).
>
> The XML is publicly available here:
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml
>
> Here are a few lines from this XML:
>
> <Gene-commentary_source>
>   <Other-source>
>     <Other-source_src>
>       <Dbtag>
>         <Dbtag_db>GO</Dbtag_db>
>         <Dbtag_tag>
>           <Object-id>
>             <Object-id_id>3824</Object-id_id>
>           </Object-id>
>         </Dbtag_tag>
>       </Dbtag>
>     </Other-source_src>
>     <Other-source_anchor>catalytic activity</Other-source_anchor>
>     <Other-source_post-text>evidence: IEA</Other-source_post-text>
>   </Other-source>
> </Gene-commentary_source>
>
> Notice the third line from the end. I expect my content printout to
> print out "evidence: IEA".
> However, this is what I get:
>
> -------------------------
> catalytic activity ==> this is the printout from the line before
>
>
>
> e
> vidence: IEA
> -------------------------
>
> I don't understand why a few blank lines were printed after "catalytic
> activity", but that doesn't matter. What matters is that the string
> "evidence: IEA" is split into two printouts: first it prints only "e",
> then "vidence: IEA". I parsed 825 such XML files without a problem;
> this occurs on my 826th.
>
> Any explanations??
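The parser reads its input in chunks of unspecified size, so there is
no guarantee that a text block will be passed to characters() in a
single call. Here the text happened to straddle a chunk boundary, so
you got "e" and "vidence: IEA" in two separate calls. It looks like two
lines only because the print statement adds a newline after each piece
it prints (the blank lines are the whitespace between tags, printed the
same way). If you want to see the text itself, without phantom
newlines, try replacing print with sys.stdout.write().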
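If you need each element's text as one string rather than just a
cleaner printout, one common approach is to buffer whatever
characters() hands you and join the pieces in endElement(). The
following is only a minimal sketch, written for Python 3 rather than
the Python 2 of the original snippet; the names TextCollector and
collect_text are made up for illustration, and it assumes elements
contain either text or child elements, not both mixed together.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextCollector(ContentHandler):
    """Join split character data per element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # pieces received since the last tag
        self.texts = []     # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # A new element starts: discard whitespace seen between tags.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one text block, so only
        # store the piece here instead of printing it.
        self._chunks.append(content)

    def endElement(self, name):
        # The element is complete: join the pieces into one string.
        text = "".join(self._chunks).strip()
        if text:
            self.texts.append((name, text))
        self._chunks = []


def collect_text(xml_bytes):
    """Parse a bytes document and return its (name, text) pairs."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts
```

Called on the fragment from the question, collect_text() should give
one entry per text-bearing element, for example
("Other-source_post-text", "evidence: IEA"), no matter how the parser
happened to split the input.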
Cheers,
Cliff