SAX parsing problem
gh at none.com
Wed Mar 16 09:14:56 CET 2005
In article <qnkk6o8rqph.fsf at arbutus.physics.mcmaster.ca>, David M.
Cooke <cookedm+news at physics.mcmaster.ca> wrote:
> anon <anon at anon.net> writes:
> > So I've encountered a strange behavior that I'm hoping someone can fill
> > me in on. I've written a simple handler that works with one small
> > exception: when the parser encounters a line with '&amp;' in it, it
> > only returns the portion that follows the occurrence.
> > For example, parsing a file with the line:
> > <key>mykey</key><value>some%20&amp;%20value</value>
> > results in getting "%20value" back from the characters method, rather
> > than "some%20&%20value".
> > After looking into this a bit, I found that SAX supports entities and
> > that it is probably treating the &amp; as an entity and processing
> > it in some way that I'm unaware of. I'm using the default
> > EntityResolver.
> Are you sure you're not actually getting three chunks: "some%20", "&",
> and "%20value"? The xml.sax.handler.ContentHandler.characters method
> (which I presume you're using for SAX, as you don't mention!) is not
> guaranteed to get all contiguous character data in one call. Also check
> if .skippedEntity() methods are firing.
Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent it?
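
For reference, a common workaround is not to stop the parser from splitting the text but to buffer the pieces and join them when the element closes. The following is only a minimal sketch under some assumptions: the class name BufferingHandler, its values dict, and the wrapping <item> element are made up for illustration, the ampersand is assumed to be escaped as &amp; in the file (as well-formed XML requires), and only leaf elements without mixed content are handled.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Joins the chunks passed to characters() into one string per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # pieces delivered by successive characters() calls
        self.values = {}    # hypothetical store: element name -> joined text

    def startElement(self, name, attrs):
        # Start a fresh buffer for every element that opens.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text, so only collect here.
        self._chunks.append(content)

    def endElement(self, name):
        # The text is complete only once the element closes.
        text = "".join(self._chunks)
        if text:
            self.values[name] = text
        self._chunks = []


doc = b"<item><key>mykey</key><value>some%20&amp;%20value</value></item>"
handler = BufferingHandler()
xml.sax.parseString(doc, handler)
print(handler.values)  # {'key': 'mykey', 'value': 'some%20&%20value'}
```

How many times characters() fires for a given text block depends on the underlying parser, so the handler should not rely on any particular chunk count.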