XML SAX parser bug?

Fredrik Lundh fredrik at pythonware.com
Thu Jan 19 13:23:45 EST 2006


mitsura at skynet.be wrote:

> I think I ran into a bug in the XML SAX parser.
>
> Part of my program consists of reading a rather large XML file (about
> 10 MB) containing a few thousand elements.
> I have the following problem. Sometimes the SAX parser misreads a
> line.
> Let me explain: the XML file contains a few thousand lines like this:
> "
> <TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>
> "
> where 'n91c90a.cmc.com' is the name of a system and thus changes per
> system.
> In a few cases, the SAX parser misreads the line. The parser sometimes
> splits the line's characters in two:
> "WINOSSPI:Storage@@n" and "91c90a.cmc.com".
> I put a 'print characters' line in the 'characters' method of the
> parser; that is how I found out.
> It only happens for a few of the thousand lines but you can imagine
> that is very annoying.
>
> I checked for errors in the XML file but the file seems ok.
>
> Is this a bug or am I doing something wrong?

it's not a bug; the parser is free to split up character runs (due to buffering,
entities or character references, etc).  it's up to you to merge character runs
into strings.
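
a minimal sketch of one way to do the merging, assuming a standard
xml.sax.ContentHandler subclass: collect the chunks between the start and
end of the element, and join them when the element closes.  the class name
TargetRefHandler, the attribute names, and the wrapping <Targets> element
are made up for illustration; only TargetRef comes from the original post.

```python
import xml.sax


class TargetRefHandler(xml.sax.ContentHandler):
    """Collects the complete text of every <TargetRef> element."""

    def __init__(self):
        super().__init__()
        self.parts = None   # list of chunks while inside <TargetRef>, else None
        self.targets = []   # one merged string per <TargetRef>

    def startElement(self, name, attrs):
        if name == "TargetRef":
            self.parts = []

    def characters(self, content):
        # May be called several times for a single run of text,
        # so only accumulate here; do not treat a chunk as the whole value.
        if self.parts is not None:
            self.parts.append(content)

    def endElement(self, name):
        if name == "TargetRef":
            self.targets.append("".join(self.parts))
            self.parts = None


# Hypothetical sample document; the real file has thousands of such elements.
doc = (
    b"<Targets>"
    b"<TargetRef>WINOSSPI:Storage@@n91c90a.cmc.com</TargetRef>"
    b"</Targets>"
)

handler = TargetRefHandler()
xml.sax.parseString(doc, handler)
print(handler.targets)
```

with this approach it no longer matters how many times characters() is
called for one element, since the value is only used once endElement() has
been reached.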

</F>





