writing Unicode objects to XML
Alex Martelli
aleax at aleax.it
Mon May 5 05:25:31 EDT 2003
<posted & mailed>
Alessio Pace wrote:
> the first step of reading from XML (encoded in UTF-8) has been
> accomplished through xml.minidom.
> I get Unicode strings, and that's all right. But If I want to make Python
> modify that source xml file, how should I do? I mean, those Unicode objects
> of kind u'n\xe8' I need that they are stored exactly as before I read
> them, so in the UTF-8 would be just nè which is this final step to
> do this? I am getting crazy with all this XML and Unicode.. :-(
Here's a sample use:
>>> import xml.dom.minidom
>>> s
'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
>>> x=xml.dom.minidom.parseString(s)
>>> x.toxml(encoding='iso-8859-1')
'<?xml version="1.0" encoding="iso-8859-1"?>\n<foo>n\xe8</foo>'
>>> x.toxml(encoding='utf-8')
'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'
>>>
Of course, you would most often DO something to x between the
time you parse it in and the time you write it back out, but in
any case the 'encoding' is the key -- both in the xml declaration
AND as a keyword parameter to the toxml method.
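To make the parse, modify, write-back cycle concrete, here is a minimal sketch. It assumes a current Python 3, where toxml(encoding=...) returns bytes, and a document whose root element holds a single text node. The helper name replace_text and the sample text are made up for illustration.

```python
import xml.dom.minidom

# UTF-8 encoded input, standing in for the bytes read from the source file.
SOURCE = b'<?xml version="1.0" encoding="utf-8"?>\n<foo>n\xc3\xa8</foo>'


def replace_text(xml_bytes, new_text, encoding='utf-8'):
    """Parse xml_bytes, replace the root element's text, and re-encode.

    Hypothetical helper; assumes the root element's first child is a text node.
    """
    doc = xml.dom.minidom.parseString(xml_bytes)
    text_node = doc.documentElement.firstChild
    # Inside the DOM the data is a Unicode string, whatever the file encoding.
    text_node.data = new_text
    # toxml() encodes only on the way out and writes a matching declaration.
    return doc.toxml(encoding=encoding)


if __name__ == '__main__':
    print(repr(replace_text(SOURCE, u'caff\xe8')))
    print(repr(replace_text(SOURCE, u'caff\xe8', encoding='iso-8859-1')))
```

The returned bytes can go straight back to the file, as long as it is opened in binary mode so nothing re-encodes them on the way.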
Alex