XML SAX parser bug?
uche.ogbuji at gmail.com
Tue Feb 7 12:34:49 EST 2006
mitsura at skynet.be wrote:
> Fredrik Lundh schreef:
> > mitsura at skynet.be wrote:
> > > I think I ran into a bug in the XML SAX parser.
> > >
> > > part of my program consists of reading a rather large XML file (about
> > > 10 MB) containing a few thousand elements.
> > > I have the following problem. Sometimes the SAX parser misreads a
> > > line.
> >
> > it's not a bug; the parser is free to split up character runs (due to buffering,
> > entities or character references, etc). it's up to you to merge character runs
> > into strings.
>
> but how do I detect that the parser has split up the characters? I guess
> I need to detect it in order to reconstruct the complete string
Here's a recipe:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881
Using this filter you can then write SAX code that assumes normalized
text events. Also, 4Suite's SAX implementation, Saxlette,
automatically does this text event merging for you at C speed:
http://4suite.org/docs/CoreManual.xml#saxlette
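
If you would rather do the merging by hand, here is a minimal sketch of
the idea using only the standard library. It is not the code from the
recipe above; the names TextMergingHandler and parse_merged are made up
for illustration. The handler never tries to detect a split. It simply
buffers every characters() call and flushes the buffer when the next
start or end tag arrives:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextMergingHandler(ContentHandler):
    """Buffer adjacent characters() events and report each text run once.

    Hypothetical helper for illustration, not part of xml.sax.
    """

    def __init__(self):
        super().__init__()
        self._chunks = []   # pieces of the text run seen so far
        self.events = []    # normalized events, recorded for inspection

    def _flush_text(self):
        # Emit one merged text event for everything buffered so far.
        if self._chunks:
            self.events.append(("text", "".join(self._chunks)))
            self._chunks = []

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start", name))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))

    def characters(self, content):
        # The parser may call this several times for one run of text.
        self._chunks.append(content)


def parse_merged(xml_bytes):
    """Parse a bytes document and return the list of merged events."""
    handler = TextMergingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

In a real handler you would replace the self.events.append(("text", ...))
call with whatever processing you need on the complete string. Note that
this sketch only flushes on element boundaries, so a comment or
processing instruction in the middle of a text run does not split it.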
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://fourthought.com
http://copia.ogbuji.net http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/