[XML-SIG] bug in <minidom document>.writexml() missing "UnicodeError decoding error"

Tim Diggins tim@red56.co.uk
Mon, 10 Feb 2003 00:21:43 -0000


Hi -

I found a buglet in minidom that (due to my lack of real knowledge about
how character encoding works in Unicode/XML/Python (take your pick))
caused me some pain (but on the bright side, now I understand the above
a bit more).

This may be related to a previously reported bug, detailed here:
http://mail.python.org/pipermail/xml-sig/2001-July/005696.html

The problem is that if you create a Text node using the (v0.8/standard
DOM) document.createTextNode(data) and supply it with a string (not
unicode) that is not in UTF-8 (e.g. Latin-1-extended or, if you're
working on Windows, windows-1252), then put it into a Document, write it
out with .writexml() and try to parse it back in, you can end up with a
SaxException, because the default encoding on parsing is UTF-8.
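
The failure on re-parsing can be reproduced in isolation. This is a
minimal sketch, assuming a current Python 3 interpreter rather than the
Python this message was written against; the sample bytes are the ones
used in the session in this message, and the variable names are only
illustrative. It feeds windows-1252 bytes to the parser without an
encoding declaration, so the parser assumes UTF-8 and rejects them, then
shows the same bytes parsing once the encoding is declared.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# windows-1252 bytes inside a document with no encoding declaration,
# so the parser falls back to UTF-8.
bad_xml = b'<foo>\xfa The \x91a</foo>'

try:
    minidom.parseString(bad_xml)
    parse_failed = False
except ExpatError as exc:
    # 0xFA is not a valid UTF-8 lead byte, so the parser gives up here.
    parse_failed = True
    print('parse failed:', exc)

# Declaring the encoding the bytes are really in lets the same data parse.
declared = b'<?xml version="1.0" encoding="windows-1252"?>' + bad_xml
text = minidom.parseString(declared).documentElement.firstChild.data
print(repr(text))
```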

The methods .toprettyxml() and .toxml() don't have this problem, nor
does xml.dom.ext.PrettyPrint() etc.; they all fail early with (some
variant of) "UnicodeError: UTF-8 decoding error: unsupported Unicode
code range". .writexml(), by contrast, writes the bytes out without
complaint:

>>> import sys
>>> from xml.dom import minidom
>>> xd=minidom.parseString("<foo/>") #*
<xml.dom.minidom.Document instance at 0x017F7228>
>>> xd=minidom.parseString("<foo/>")
>>> tn=xd.createTextNode('\xfa The \x91a')
>>> xd.documentElement.appendChild(tn)
<DOM Text node "ú The ‘a">
>>> xd.writexml(sys.stdout)
<?xml version="1.0" ?>
<thung>ú The ‘a</thung>
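
One way round this is to decode the byte string explicitly before it
reaches createTextNode(). A minimal sketch, assuming Python 3 and
assuming the data really is windows-1252 (the names raw, text and doc
are illustrative only):

```python
from xml.dom import minidom

raw = b'\xfa The \x91a'  # assumed to be windows-1252, as in the session above

# Decode with the encoding the data is actually in, rather than
# letting anything downstream guess UTF-8.
text = raw.decode('cp1252')

doc = minidom.parseString('<foo/>')
doc.documentElement.appendChild(doc.createTextNode(text))

# Serialise as UTF-8 bytes (the declaration names the encoding),
# then parse the result back in.
xml_bytes = doc.toxml(encoding='utf-8')
round_tripped = minidom.parseString(xml_bytes).documentElement.firstChild.data
print(round_tripped == text)
```

With the text decoded up front, the document written out is valid UTF-8
and parses back to the same characters.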


thanks

Tim


---------------------------
  Tim Diggins
  mailto:tim@red56.co.uk
  http://www.red56.co.uk/people/tim
  mobile: 07976 583856