[XML-SIG] bug in <minidom document>.writexml() missing
"UnicodeError decoding error"
Uche Ogbuji
uche.ogbuji@fourthought.com
Sun, 09 Feb 2003 21:38:48 -0700
> Hi -
>
> I found a buglet in minidom that (due to my lack of real knowledge
> about how character encoding works in Unicode/Xml/Python (take your
> pick)) caused me some pain (but on the bright side, now I understand
> the above a bit more).
>
> This may be related to a previous bug detailed:
> http://mail.python.org/pipermail/xml-sig/2001-July/005696.html
>
> The problem is that if you create a Text node using the (v0.8/standard
> DOM) document.createTextNode(data) and supply it with a string (not
> unicode) that is not in UTF-8 (e.g. Latin-1-extended, or (if you're
> working on Windows) windows-1252), then put it into a Document, write
> it out, and then try to parse it back in, you can end up with a
> SaxException, because the default encoding is UTF-8.

This is not really a bug. The Python DOM protocol requires either
unicode objects or strings encoded only in UTF-8. Perhaps this needs to
be better documented. Anyway, the results of violating this principle
are undefined. Usually you'll get a straightforward exception, but not
necessarily so.

I know this can be confusing, but since PyXML still supports Python
1.5.2, it's about as sane as can be expected. In 4Suite, since we've
bumped the minimum Python version to 2.1, we can now mandate that people
use only proper Unicode objects for XML APIs. This is a good general
practice, anyway.
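
As a rough illustration of the "Unicode objects only" practice, here is a
minimal sketch. It assumes a modern Python 3 interpreter, where str is
already Unicode; on the Python 2.x versions discussed in this thread the
decoding step would be unicode(data, "windows-1252") instead. The sample
bytes and the element name "note" are made up for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical input: text that arrived as windows-1252 bytes.
raw = b"caf\xe9 \x80 5"

# Decode to Unicode *before* handing the text to the DOM.
text = raw.decode("windows-1252")

doc = minidom.Document()
root = doc.createElement("note")  # hypothetical element name
root.appendChild(doc.createTextNode(text))
doc.appendChild(root)

# Serialize as UTF-8; with an encoding given, toxml() returns bytes
# and writes a matching XML declaration.
serialized = doc.toxml(encoding="utf-8")

# Parsing the UTF-8 output back recovers the same text.
reparsed = minidom.parseString(serialized)
round_tripped = reparsed.documentElement.firstChild.data

# For contrast: undeclared non-UTF-8 bytes are rejected by the parser,
# which is the kind of failure described in the quoted report.
try:
    minidom.parseString(b"<note>caf\xe9</note>")
    bad_bytes_rejected = False
except ExpatError:
    bad_bytes_rejected = True
```

The point of the sketch is only that decoding happens once, at the
boundary, with the encoding the data is actually in; everything passed to
the DOM after that is Unicode text.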

--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
The open office file format - http://www-106.ibm.com/developerworks/xml/library/x-think15/
4Suite Repository Features - https://www6.software.ibm.com/reg/devworks/dw-x4suite5-i/
XML class warfare - http://www.adtmag.com/article.asp?id=6965
See you at XML Web Services One - http://www.xmlconference.com/santaclara/