[Python-Dev] Unicode entities in XML cause problems :-(

Paul Prescod paul@prescod.net
Sat, 27 Apr 2002 13:32:58 -0700


Matthias Urlichs wrote:
> 
> Playing around with xml.dom.minidom, I noticed that this beast is
> perfectly able to read HTML which it can't print:
> 
> >>> import sys
> >>> import xml.dom.minidom as md
> >>> d = md.parseString("<foo>b&#2000;</foo>")
> >>> d.writexml(sys.stdout)
> ...
> UnicodeError: ASCII encoding error: ordinal not in range(128)

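"sys.stdout" doesn't know what to do with Unicode: it expects byte
strings, so the character from &#2000; gets pushed through the default
ASCII codec and fails. Wrap sys.stdout in an encoder (usually UTF-8)
using the codecs module.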
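
Roughly what I mean, as a sketch rather than the one true way. It
assumes a current Python 3, where parseString() accepts a str, and it
writes to an io.BytesIO stand-in so the encoded bytes can be inspected.
The helper name write_utf8 is made up for illustration.

```python
import codecs
import io
import xml.dom.minidom as md


def write_utf8(doc, byte_stream):
    """Serialize doc to byte_stream, encoding the text as UTF-8."""
    # codecs.getwriter() returns a StreamWriter class; instantiating it
    # around a byte stream gives a file-like object whose write()
    # accepts Unicode text and emits encoded bytes.
    writer = codecs.getwriter("utf-8")(byte_stream)
    doc.writexml(writer)


doc = md.parseString("<foo>b&#2000;</foo>")
buf = io.BytesIO()  # stand-in for a byte-oriented output stream
write_utf8(doc, buf)
data = buf.getvalue()
```

To send the result to the terminal, pass a byte-oriented stream instead
of the buffer. On Python 3 that would be sys.stdout.buffer; on an
interpreter where sys.stdout itself takes byte strings, sys.stdout can
be wrapped directly.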

I agree that this is a usability problem, but it isn't a bug, and I
think you've mischaracterized the source of the problem.

 Paul Prescod