pyexpat and unicode

mallum breakfast at 10.am
Mon Dec 17 17:10:00 EST 2001


Nope. This still breaks, with the same error:

import xml.parsers.expat
parser = xml.parsers.expat.ParserCreate(encoding='utf8')

data_uni = u"<?xml version='1.0' encoding='UTF-8'?><hello>\202</hello>"
data_uni.encode('utf8')
parser.Parse(data_uni)

Is this a bug?

   -- mallum

On Mon, Dec 17, 2001 at 07:10:22PM +0000, python-list-admin at python.org wrote:
> mallum wrote:
>         ...
> > data_uni = u"<?xml version='1.0' encoding='UTF-8' ?><hello>\202</hello>"
> > data     = "<?xml version='1.0' encoding='UTF-8' ?><hello>there</hello>"
> > 
> > data_uni.encode('utf8')
> > 
> > parser.Parse(data)
> > parser.Parse(data_uni)
>         ...
> > Does this mean I'm unable to pass utf8-encoded strings to pyexpat?
> > According to the docs it should work. Can anyone shed some light on this?
> 
> You can't, I believe, pass SOME strings with a certain encoding followed in 
> the same parse by others with different encodings; or, as in this case, 
> ones not in fact encoded (remember the call to .encode returns an encoded 
> string, which you ignore -- it doesn't change data_uni, of course, as it's 
> immutable, like all strings).
> 
> Separate parses work fine:
> 
> import xml.parsers.expat
> parser = xml.parsers.expat.ParserCreate(encoding='utf8')
> 
> data_uni = u"<?xml version='1.0' encoding='UTF-8' ?><hello>\202</hello>"
> data     = "<?xml version='1.0' encoding='UTF-8' ?><hello>there</hello>"
> 
> denc = data_uni.encode('utf8')
> 
> for thedata in data_uni, data, denc:
>     parser = xml.parsers.expat.ParserCreate(encoding='utf8')
>     print 'parsing', repr(thedata)
>     parser.Parse(thedata, 1)
>     print 'done'
> 
> 
> Alex
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list
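
The two points in the quoted reply (keep the value that .encode() returns, and use a separate parser for each document) can be sketched as below. This is only a minimal sketch under stated assumptions: it targets a current Python 3 interpreter, where str is Unicode text and bytes is the encoded form, rather than the Python 2 used in the thread. parse_text is a hypothetical helper, not part of pyexpat, and U+00E9 stands in for the \202 character of the original data.

```python
import xml.parsers.expat


def parse_text(document):
    """Parse one complete XML document and return its character data.

    Hypothetical helper for illustration. `document` may be str or bytes;
    a str is encoded to UTF-8 first and the encoded bytes are parsed.
    """
    if isinstance(document, str):
        # encode() returns a new bytes object and leaves the str unchanged,
        # so the return value has to be kept and used.
        document = document.encode("utf-8")

    chunks = []
    # A fresh parser per document: an expat parser cannot be fed again
    # once it has been given the final chunk of a document.
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


# Assumption: U+00E9 replaces the \202 character from the thread.
data_uni = "<?xml version='1.0' encoding='UTF-8'?><hello>\u00e9</hello>"
data = "<?xml version='1.0' encoding='UTF-8'?><hello>there</hello>"

for thedata in (data_uni, data, data_uni.encode("utf-8")):
    print("parsing", repr(thedata))
    print("text:", repr(parse_text(thedata)))
```

Encoding inside the helper means the parser always receives bytes that match the UTF-8 declaration in the XML prolog, whichever form the caller passes in.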



