SAX parsing problem

gh gh at none.com
Wed Mar 16 09:14:56 CET 2005


In article <qnkk6o8rqph.fsf at arbutus.physics.mcmaster.ca>, David M.
Cooke <cookedm+news at physics.mcmaster.ca> wrote:

> anon <anon at anon.net> writes:
> 
> > So I've encountered a strange behavior that I'm hoping someone can fill
> > me in on.  I've written a simple handler that works with one small
> > exception: when the parser encounters a line with '&#38;' in it, it
> > only returns the portion that follows the occurrence.
> >
> > For example, parsing a file with the line:
> > <key>mykey</key><value>some%20&#38;%20value</value>
> >
> > results in getting "%20value" back from the characters method, rather
> > than "some%20&#38;%20value".
> >
> > After looking into this a bit, I found that SAX supports entities and
> > that it is probably treating &#38; as an entity and processing
> > it in some way that I'm unaware of.  I'm using the default
> > EntityResolver.
> 
> Are you sure you're not actually getting three chunks: "some%20", "&",
> and "%20value"? The xml.sax.handler.ContentHandler.characters method
> (which I presume you're using for SAX, as you don't mention!) is not
> guaranteed to get all contiguous character data in one call. Also check
> if .skippedEntity() methods are firing.
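A minimal sketch of how to check this with the standard library's `xml.sax`, assuming Python 3 and its default expat-based reader. The handler class name `ChunkRecorder` and the `<plist>` root element are hypothetical, added only so the sample is a well-formed document. The handler logs every `characters()` call instead of keeping only the last one. Note that `&#38;` is a character reference, not an entity, so the parser decodes it to `&` itself and `skippedEntity()` should stay silent.

```python
import xml.sax


class ChunkRecorder(xml.sax.ContentHandler):
    """Hypothetical handler that records every characters() call separately."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # one entry per characters() call
        self.skipped = []  # names passed to skippedEntity(), if any

    def characters(self, content):
        # Append rather than overwrite: overwriting would keep only the
        # last chunk, which matches the "%20value" symptom described above.
        self.chunks.append(content)

    def skippedEntity(self, name):
        self.skipped.append(name)


# Wrapped in a hypothetical <plist> root because XML needs a single root element.
doc = b"<plist><key>mykey</key><value>some%20&#38;%20value</value></plist>"
handler = ChunkRecorder()
xml.sax.parseString(doc, handler)
print(handler.chunks)   # how the text is split is up to the parser
print(handler.skipped)
```

How the text is split depends on the underlying parser, so code should not rely on any particular chunk boundaries; only the concatenation of all chunks is meaningful.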

Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks.  The characters handler routine is fired 3 times for a
single text block.  Why does it do this?  Is there a way to prevent
it?
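SAX is a streaming interface, and the `ContentHandler.characters()` contract lets a parser deliver one run of text in several calls (for example around character references or internal buffer boundaries). A common workaround, sketched here under the assumption of simple leaf elements like `<key>` and `<value>` with no mixed content, is to collect the pieces in a buffer and join them when the element ends. The class `ValueCollector` and its `values` dictionary are hypothetical names for illustration.

```python
import xml.sax


class ValueCollector(xml.sax.ContentHandler):
    """Hypothetical handler that joins all characters() chunks per element."""

    def __init__(self):
        super().__init__()
        self._buffer = []  # text pieces seen since the last element boundary
        self.values = {}   # element name -> list of complete text values

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element's text.
        self._buffer = []

    def characters(self, content):
        # May be called any number of times for a single text block.
        self._buffer.append(content)

    def endElement(self, name):
        # Only now is the element's text complete.
        text = "".join(self._buffer)
        self.values.setdefault(name, []).append(text)
        self._buffer = []


doc = b"<plist><key>mykey</key><value>some%20&#38;%20value</value></plist>"
handler = ValueCollector()
xml.sax.parseString(doc, handler)
print(handler.values["value"])  # ['some%20&%20value']
```

Joining in `endElement()` rather than acting inside `characters()` keeps the handler independent of however the parser chooses to chunk the text. For elements with mixed content (text interleaved with child elements), a stack of buffers would be needed instead of a single list.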

Much thanks.

gh


