[Python-Dev] Unicode entities in XML cause problems :-(

Michael Gilfix mgilfix@eecs.tufts.edu
Sat, 27 Apr 2002 15:47:58 -0400


  I came across this myself before I joined the list. My general rule
was to always convert unicode objects to plain strings with something
like: "%s" % unicode (I don't remember if I avoided str because it also
returned unicode) before any internal use. I think the context was that
I wanted to use a type attribute from an XML tag to instantiate an
object whose class I retrieved from a dict. So I had something like:

  # Look up the class named by the XML type attribute.
  module = self.record['module']
  if not resources.__dict__.has_key(module):
     raise RuntimeError, "Attempted to retrieve data from non-existent resource module: %s" % module
  code = resources.__dict__[module]
  # Instantiate it, passing the record's args as keyword arguments.
  obj = apply(code, [], self.record['args'])

  ... where module was a unicode string. This was an example where
unicode sorta transparently pissed me off because it behaved just like
a string in so many ways but wasn't.
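
  Roughly, the "convert before internal use" rule could be sketched as
below. This is only an illustration under assumptions: Widget, RESOURCES
and instantiate are hypothetical names standing in for the real resource
classes and for resources.__dict__, and the record layout is guessed
from the snippet above. The str() calls matter on interpreters where
unicode and str are distinct types and are harmless elsewhere.

```python
class Widget(object):
    """Hypothetical resource class standing in for whatever the XML names."""

    def __init__(self, size=0):
        self.size = size


# Hypothetical registry, playing the role of resources.__dict__.
RESOURCES = {"Widget": Widget}


def instantiate(record, registry=RESOURCES):
    # Coerce the type name to a plain string before using it as a key.
    module = str(record["module"])
    if module not in registry:
        raise RuntimeError(
            "Attempted to retrieve data from non-existent resource module: %s"
            % module)
    # Keyword names get the same treatment; values are passed through as-is.
    kwargs = dict((str(k), v) for k, v in record["args"].items())
    return registry[module](**kwargs)


# A record as it might come out of an XML parser, with unicode everywhere.
obj = instantiate({"module": u"Widget", "args": {u"size": 3}})
```

  Doing the conversion in one place keeps the unicode objects from
leaking into the rest of the lookup code.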

                 -- Mike

On Sat, Apr 27 @ 21:30, Matthias Urlichs wrote:
> Playing around with xml.dom.minidom, I noticed that this beast is
> perfectly able to read HTML which it can't print:
> 
> >>> import sys
> >>> import xml.dom.minidom as md
> >>> d=md.parseString("<foo>b&#2000;</foo>")
> >>> d.writexml(sys.stdout)
> ...
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> 
> Ouch.
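
  One possible way around the failure quoted above is to ask minidom
for encoded output instead of writing to an ASCII-only stream. This is a
sketch only, and it assumes a Python whose minidom accepts an encoding
argument to toxml(), which may not be the case for the version used in
the quoted session.

```python
import xml.dom.minidom as md

# Same document as in the quoted session; &#2000; is the character U+07D0.
d = md.parseString("<foo>b&#2000;</foo>")

# Asking for an explicit encoding yields encoded bytes rather than text
# that still has to be squeezed through an ASCII-only stream.
data = d.toxml(encoding="utf-8")
```

  The resulting bytes can then be written to a file or socket opened in
binary mode without going through the default encoding.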
