[XML-SIG] bug in <minidom document>.writexml() missing "UnicodeError decoding error"

Uche Ogbuji uche.ogbuji@fourthought.com
Sun, 09 Feb 2003 21:38:48 -0700

> Hi -
>
> I found a buglet in minidom that (due to my lack of real knowledge
> about how character encoding works in Unicode/Xml/Python (take your
> pick)) caused me some pain (but on the bright side, now I understand
> the above a bit more).
>
> This may be related to a previous bug detailed:
> http://mail.python.org/pipermail/xml-sig/2001-July/005696.html
>
> The problem is that if you create a Text node using the (v0.8/standard
> DOM) document.createTextNode(data) and supply it with a string (not
> unicode) that is not in UTF-8 (e.g. Latin-1-extended, or (if you're
> working on windows) windows-1252), then put it into a Document, write
> it out and then try to parse it back in, you can end up with a
> SaxException, because the default encoding is UTF-8.

This is not really a bug.  The Python DOM protocol requires either
unicode objects or strings encoded only in UTF-8.  Perhaps this needs to
be better documented.  Anyway, the results of violating this principle
are undefined.  Usually you'll get a straightforward exception, but not
necessarily so.
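
To make the rule concrete, here is a minimal sketch of decoding at the
boundary.  It assumes a present-day Python 3 interpreter (where str is
the Unicode type and bytes plays the role of the old 8-bit string) and
the standard-library xml.dom.minidom rather than PyXML.  The sample word
and the element name are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# "café" as Latin-1 bytes: valid Latin-1, but not valid UTF-8.
raw = b"caf\xe9"

# Decode once, at the boundary, so the DOM only ever sees Unicode text.
text = raw.decode("latin-1")

doc = minidom.Document()
root = doc.createElement("word")  # element name is an arbitrary example
root.appendChild(doc.createTextNode(text))
doc.appendChild(root)

# Serialize to UTF-8 bytes, then parse those bytes back in.
xml_bytes = doc.toxml(encoding="utf-8")
round_tripped = minidom.parseString(xml_bytes).documentElement.firstChild.data

# For contrast: the undecoded Latin-1 bytes, dropped into a document with
# no encoding declaration, are read as UTF-8 and rejected by the parser.
try:
    minidom.parseString(b"<word>" + raw + b"</word>")
    bad_bytes_rejected = False
except ExpatError:
    bad_bytes_rejected = True
```

The point of the design is that the encoding is named exactly once, where
the bytes enter the program, instead of being guessed later by the parser.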

I know this can be confusing, but since PyXML still supports Python
1.5.2, it's about as sane as can be expected.  In 4Suite, since we've
bumped the minimum Python version to 2.1, we can now mandate that people
use proper Unicode objects only for XML APIs.  This is a good general
practice, anyway.
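
One way such a "Unicode only" rule could be enforced in application code
is a small guard in front of the XML API.  This is only a sketch in
present-day Python 3; to_text is a hypothetical helper of my own naming,
not something 4Suite or PyXML provides.

```python
def to_text(value, encoding=None):
    """Hypothetical helper: return Unicode text or fail loudly."""
    if isinstance(value, str):
        # Already Unicode text; pass it through untouched.
        return value
    if isinstance(value, bytes):
        if encoding is None:
            # Refuse to guess how the bytes were encoded.
            raise TypeError("byte string given without an explicit encoding")
        # Strict decoding: raises UnicodeDecodeError on malformed input.
        return value.decode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

A call such as doc.createTextNode(to_text(data, "windows-1252")) then
either hands the DOM real Unicode text or raises at the point where the
bad data came in.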

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
The open office file format  - http://www-106.ibm.com/developerworks/xml/=
4Suite Repository Features - https://www6.software.ibm.com/reg/devworks/d=
XML class warfare - http://www.adtmag.com/article.asp?id=6965
See you at XML Web Services One - http://www.xmlconference.com/santaclara=