writing Unicode objects to XML

Alex Martelli aleax at aleax.it
Mon May 5 05:25:31 EDT 2003


<posted & mailed>

Alessio Pace wrote:

> The first step, reading from XML (encoded in UTF-8), has been
> accomplished through xml.dom.minidom.
> I get Unicode strings, and that's all right. But if I want Python to
> modify that source XML file, how should I do it? I mean, those Unicode
> objects of the kind u'n\xe8' need to be stored exactly as they were
> before I read them, so in UTF-8 it would be just nè. What is the final
> step to do this? I am going crazy with all this XML and Unicode.. :-(

Here's a sample use:

>>> import xml.dom.minidom
>>> s
'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
>>> x = xml.dom.minidom.parseString(s)
>>> x.toxml(encoding='iso-8859-1')
'<?xml version="1.0" encoding="iso-8859-1"?>\n<foo>n\xe8</foo>'
>>> x.toxml(encoding='utf-8')
'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
>>>

Of course, you would most often DO something to x between the
time you parse it in and the time you write it back out, but in
any case the 'encoding' is the key -- both in the xml declaration
AND as a keyword parameter to the toxml method.
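
For what it's worth, here is a rough sketch of the whole parse, modify, write-back cycle. It assumes a Python 3 interpreter, where toxml(encoding=...) returns bytes rather than the plain strings shown in the session above. The helper name rewrite_text and the replacement text are made up for illustration; only parseString, createTextNode, removeChild, appendChild and toxml come from xml.dom.minidom.

```python
import xml.dom.minidom


def rewrite_text(source, new_text, encoding="utf-8"):
    """Hypothetical helper: replace the root element's content and re-serialize."""
    doc = xml.dom.minidom.parseString(source)
    root = doc.documentElement
    # Drop whatever children the root currently has.
    while root.firstChild is not None:
        root.removeChild(root.firstChild)
    # The new text is an ordinary Unicode string; no manual encoding here.
    root.appendChild(doc.createTextNode(new_text))
    # The encoding argument sets both the XML declaration and the byte encoding.
    return doc.toxml(encoding=encoding)


# UTF-8 encoded input, equivalent to the string s in the session above.
source = '<?xml version="1.0" encoding="utf-8"?>\n<foo>n\u00e8</foo>'.encode("utf-8")

as_utf8 = rewrite_text(source, "caff\u00e8")
as_latin1 = rewrite_text(source, "caff\u00e8", encoding="iso-8859-1")
```

To update a file on disk, the returned bytes would presumably be written to a file opened in binary mode ('wb'), so that no second encoding step is applied on top.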


Alex




