[Python-Dev] Unicode entities in XML cause problems :-(
Michael Gilfix
mgilfix@eecs.tufts.edu
Sat, 27 Apr 2002 15:47:58 -0400
I came across this myself before I joined the list. My general rule
was to convert unicode to plain strings for any internal use, with
something like "%s" % unicode (I don't remember whether I avoided str()
because it also returned unicode). I think the context was that I
wanted to use a type attribute from an XML tag to instantiate an object
whose class I retrieved from a dict. So I had something like:
    # module comes straight from the XML record
    module = self.record['module']
    if not resources.__dict__.has_key(module):
        raise RuntimeError, "Attempted to retrieve data from non-existent resource module: %s" % module
    code = resources.__dict__[module]
    # apply(callable, args, kwargs): call code with the record's keyword arguments
    obj = apply(code, [], self.record['args'])
... where module was a unicode string. This was a case where unicode
pissed me off precisely because it was so transparent: it behaved just
like a string in so many ways, but wasn't one.
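For comparison, a minimal sketch of the same dispatch in Python 3, where every str is Unicode, so a value read from the XML can be used directly as a dict key and as keyword-argument names. The Greeter class, the REGISTRY dict, the instantiate() helper and the <record> markup are all hypothetical stand-ins for the resources module and record above.

```python
import xml.dom.minidom


class Greeter:
    """Hypothetical stand-in for a class in the resources module."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry mapping the XML "type" attribute to a class.
REGISTRY = {"Greeter": Greeter}


def instantiate(xml_text, registry=REGISTRY):
    """Create an object whose class is named by the root element's "type" attribute."""
    elem = xml.dom.minidom.parseString(xml_text).documentElement
    class_name = elem.getAttribute("type")  # always a str (Unicode) in Python 3
    if class_name not in registry:
        raise RuntimeError(
            "Attempted to retrieve data from non-existent resource: %s" % class_name)
    # The remaining attributes become keyword arguments; their names are str,
    # so ** unpacking accepts them.
    kwargs = {k: v for k, v in elem.attributes.items() if k != "type"}
    return registry[class_name](**kwargs)


obj = instantiate('<record type="Greeter" name="Mike"/>')
print(type(obj).__name__, obj.name)  # Greeter Mike
```

Here code(**kwargs) plays the role of the old apply(code, [], kwargs) call.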
-- Mike
On Sat, Apr 27 @ 21:30, Matthias Urlichs wrote:
> Playing around with xml.dom.minidom, I noticed that this beast is
> perfectly able to read HTML which it can't print:
>
> >>> import sys
> >>> import xml.dom.minidom as md
> >>> d=md.parseString("<foo>b&#2000;</foo>")
> >>> d.writexml(sys.stdout)
> ...
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Ouch.
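The traceback above most likely comes from Python 2 encoding the unicode output with the default ASCII codec when writing to sys.stdout. In Python 3 the same round trip succeeds, since minidom text nodes are str and encoding only happens where the caller asks for it. A minimal sketch, assuming Python 3 and spelling the non-ASCII character as the character reference &#2000; (U+07D0):

```python
import io
import xml.dom.minidom as md

# The character reference &#2000; decodes to U+07D0, a non-ASCII character.
d = md.parseString("<foo>b&#2000;</foo>")

# writexml() into a text stream: the str data is written as-is, no encoding step.
buf = io.StringIO()
d.writexml(buf)
print(buf.getvalue())

# toxml() with an explicit encoding returns bytes in that codec.
data = d.toxml(encoding="utf-8")
print(data)
```

writexml() into a text stream never encodes anything; toxml(encoding=...) hands back bytes in the requested codec.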