[XML-SIG] Content is split into two

Wed Mar 26 08:12:28 CET 2008

Hi, I posted the following to the Python mailing list, but no one responded,
so I'm posting it here as well.

------------

Hi,

I have created a very, very simple parser for an XML file.

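from xml.sax.handler import ContentHandler

class FindGoXML2(ContentHandler):
    def characters(self, content):
        # Called by the parser with character data found between tags.
        print(content)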

I have made it simple because I want to debug. This prints out any content
enclosed by tags (right?).
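
To make the behaviour easier to reproduce, here is a minimal sketch of how
such a handler can be driven. It is not my actual script: ChunkRecorder and
SAMPLE are made-up names, and the inline snippet only stands in for the real
efetch output. Instead of printing directly, it stores the argument of every
characters() call so that the boundaries between calls stay visible.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ChunkRecorder(ContentHandler):
    """Hypothetical debugging handler: records every characters() call."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # Keep each call's argument separately so chunk boundaries stay visible.
        self.chunks.append(content)


# A small inline document standing in for the real efetch output (assumption).
SAMPLE = (
    b"<Other-source>"
    b"<Other-source_post-text>evidence: IEA</Other-source_post-text>"
    b"</Other-source>"
)

handler = ChunkRecorder()
xml.sax.parseString(SAMPLE, handler)

# repr() makes whitespace-only chunks and split points easy to spot.
for chunk in handler.chunks:
    print(repr(chunk))
```

On a document this small the text will most likely arrive in one piece, so
the sketch shows the setup rather than the problem itself.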

The XML is publicly available here:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=9622&retmode=xml

Here are a few lines from this XML:

              <Gene-commentary_source>
                <Other-source>
                  <Other-source_src>
                    <Dbtag>
                      <Dbtag_db>GO</Dbtag_db>
                      <Dbtag_tag>
                        <Object-id>
                          <Object-id_id>3824</Object-id_id>
                        </Object-id>
                      </Dbtag_tag>
                    </Dbtag>
                  </Other-source_src>
                  <Other-source_anchor>catalytic activity</Other-source_anchor>
                  <Other-source_post-text>evidence: IEA</Other-source_post-text>
                </Other-source>
              </Gene-commentary_source>

Notice the third line from the end. I expect my content printout to show
"evidence: IEA".
However, this is what I get:

-------------------------
catalytic activity  ==> this is the printout from the line before

e
vidence: IEA
-------------------------

I don't understand why a few blank lines were printed after "catalytic
activity", but that doesn't matter. What matters is that the string
"evidence: IEA" is split into two printouts: first it prints only "e",
then "vidence: IEA". I parsed 825 such XMLs without a problem; this
occurs on my 826th XML.
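
As a workaround I could collect the pieces myself. The sketch below assumes
that characters() is allowed to be called more than once for a single run of
text (the xml.sax documentation seems to permit this), so it buffers the
pieces and only prints them when the element ends. BufferedText and SAMPLE
are names I made up for illustration, and the simple reset in startElement
would lose text in elements that mix text and child elements.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferedText(ContentHandler):
    """Hypothetical handler: joins character data per element before printing."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.buffer = []
        self.texts = []

    def startElement(self, name, attrs):
        # Start collecting afresh for each new element.
        self.buffer = []

    def characters(self, content):
        # May be called several times for one run of text; just accumulate.
        self.buffer.append(content)

    def endElement(self, name):
        # Join the pieces and drop whitespace-only text between tags.
        text = "".join(self.buffer).strip()
        if text:
            self.texts.append((name, text))
            print("%s: %s" % (name, text))
        self.buffer = []


# A small inline document standing in for the real efetch output (assumption).
SAMPLE = (
    b"<Other-source>\n"
    b"  <Other-source_anchor>catalytic activity</Other-source_anchor>\n"
    b"  <Other-source_post-text>evidence: IEA</Other-source_post-text>\n"
    b"</Other-source>"
)

handler = BufferedText()
xml.sax.parseString(SAMPLE, handler)
```

That would hide the symptom, but I would still like to understand why the
split happens in the first place.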

Any explanations??