[XML-SIG] bug in <minidom document>.writexml() missing
"UnicodeError decoding error"
Sun, 09 Feb 2003 21:38:48 -0700
> Hi -
> I found a buglet in minidom that (due to my lack of real knowledge about
> how character encoding works in Unicode/Xml/Python (take your pick))
> caused me some pain (but on the bright side, now I understand the above
> a b[...]
> This may be related to a previous bug detailed:
> The problem is that if you create a Text node using the (v0.8/standard
> DOM) document.createTextNode(data) and supply it with a string (not
> unicode) that is not in UTF-8 (e.g. Latin-1-extended, or (if you're
> working on windows) windows-1252), then put it into a Document, write it
> out and then try to parse it back in, you can end up with a
> SaxException, because the default encoding is UTF-8.
This is not really a bug. The Python DOM protocol requires either unicode
objects or strings encoded only in UTF-8. Perhaps this needs to be better
documented. Anyway, the results of violating this principle are undefined.
Usually you'll get a straightforward exception, but not necessarily so.
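To illustrate the rule above, here is a minimal sketch. It assumes a modern Python 3 xml.dom.minidom rather than the PyXML release discussed in this thread; in Python 3 every str is already Unicode, so the legacy-encoded data is modeled as bytes. The helper name build_doc, the element name "note", and the sample text are hypothetical, chosen only for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# "cafe" with an acute accent, encoded as Latin-1: b"caf\xe9" is not valid UTF-8.
legacy_bytes = "caf\u00e9".encode("latin-1")

# Failure mode: legacy bytes dropped into a document with no encoding
# declaration are read as UTF-8 by the parser, which rejects them.
bad_xml = b"<note>" + legacy_bytes + b"</note>"
try:
    minidom.parseString(bad_xml)
    parse_failed = False
except ExpatError:
    parse_failed = True


def build_doc(raw, source_encoding="windows-1252"):
    """Hypothetical helper: decode legacy bytes to Unicode before touching the DOM."""
    text = raw.decode(source_encoding)
    doc = minidom.Document()
    root = doc.createElement("note")
    root.appendChild(doc.createTextNode(text))
    doc.appendChild(root)
    return doc


# Safe path: hand the DOM Unicode text, serialize as UTF-8, parse it back.
doc = build_doc(legacy_bytes)
xml_bytes = doc.toxml(encoding="utf-8")
recovered = minidom.parseString(xml_bytes).documentElement.firstChild.data
```

The point of the sketch is only that the decoding step happens once, at the boundary where the legacy data enters the program, so that the DOM never sees bytes in an unknown encoding.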
I know this can be confusing, but since PyXML still supports Python 1.5.2,
it's about as sane as can be expected. In 4Suite, since we've bumped the
minimum Python version to 2.1, we can now mandate that people use proper
Unicode objects only for XML APIs. This is a good general practice, anyway.
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
The open office file format - http://www-106.ibm.com/developerworks/xml/=
4Suite Repository Features - https://www6.software.ibm.com/reg/devworks/d=
XML class warfare - http://www.adtmag.com/article.asp?id=6965
See you at XML Web Services One - http://www.xmlconference.com/santaclara=