SAX parsing problem

David M. Cooke cookedm+news at physics.mcmaster.ca
Wed Mar 16 02:32:26 EST 2005


anon <anon at anon.net> writes:

> So I've encountered a strange behavior that I'm hoping someone can fill
> me in on.  I've written a simple handler that works, with one small
> exception: when the parser encounters a line with '&' in it, it
> only returns the portion that follows the occurrence.
>
> For example, parsing a file with the line :
> <key>mykey</key><value>some%20&%20value</value>
>
> results in getting "%20value" back from the characters method, rather
> than "some%20&%20value".
>
> After looking into this a bit, I found that SAX supports entities and
> that it is probably treating the & as the start of an entity and
> processing it in some way that I'm unaware of.  I'm using the default
> EntityResolver.

Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using for SAX, since you don't say!) is not
guaranteed to receive all contiguous character data in one call. Also
check whether the .skippedEntity() method is firing.
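The usual remedy is to buffer whatever characters() delivers and only use the text once the element closes. Below is a minimal sketch of that pattern; the class name KeyValueHandler and its attributes are hypothetical, and it assumes flat, text-only <key>/<value> elements like the ones in your example. It also records each raw characters() call and any skippedEntity() names so you can see what the parser is actually handing you. Note that the sample document spells the ampersand as &amp;, since a bare '&' followed by '%' is not well-formed XML and the expat-based parser will reject it with a SAXParseException.

```python
import xml.sax


class KeyValueHandler(xml.sax.handler.ContentHandler):
    """Collects <value> text, tolerating characters() being split into chunks."""

    def __init__(self):
        super().__init__()
        self._buffer = []   # text pieces for the element currently open
        self.chunks = []    # every raw characters() call, for inspection
        self.skipped = []   # entity names reported via skippedEntity()
        self.values = []    # complete text of each <value> element

    def startElement(self, name, attrs):
        # Start a fresh buffer per element (fine for text-only leaf elements).
        self._buffer = []

    def characters(self, content):
        # May be called several times for one run of text, e.g. once on
        # each side of an entity reference; never assume a single call.
        self.chunks.append(content)
        self._buffer.append(content)

    def skippedEntity(self, name):
        # Fires only if the parser chooses not to expand an entity.
        self.skipped.append(name)

    def endElement(self, name):
        # Only at the end tag is the element's character data complete.
        if name == "value":
            self.values.append("".join(self._buffer))
        self._buffer = []


doc = b"<pair><key>mykey</key><value>some%20&amp;%20value</value></pair>"
handler = KeyValueHandler()
xml.sax.parseString(doc, handler)
print(handler.values)   # → ['some%20&%20value']
print(handler.chunks)   # shows how the text was actually split
```

If handler.chunks shows the value arriving in several pieces, that is the behaviour described above, and joining in endElement() gives you the whole string back.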

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
