[Tutor] Python - XML: How to write UNICODE to a file ?? (when using LATIN-1 Chars)

Javier JJ python.tutorial at jarava.org
Tue Aug 26 01:18:38 EDT 2003


Hi all!!

I'm getting my feet wet with Python + XML processing and I've run into
something that's stopping me:

I have an XML file that is mostly ASCII (no encoding is specified, but it
contains characters valid in the Latin-1 charset; it's a log file
generated by MSN Messenger 6.0).

I am processing it with minidom as follows:

from xml.dom import minidom
doc = minidom.parse("log_file.xml")
rootNode = doc.childNodes[1]

Now I can do all sorts of manipulation on the nodes without any problem.

But afterwards I want to write the result back to disk as XML, so I do:

>>> out = open("salida.txt", "wb")
>>> listado = rootNode.toxml()

Now "listado" holds an XML-looking unicode string; it looks quite fine to
me... but when I try to write it to disk, I get:

>>> out.write(listado)

Traceback (most recent call last):
  File "<pyshell#34>", line 1, in -toplevel-
    out.write(listado)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in
position 2274: ordinal not in range(128)

The "offending" character is:

>>> listado[2274]
u'\xed'
>>> print listado[2274]
í
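
The same failure can be reproduced without the log file. The following is
only a minimal sketch, assuming nothing beyond the single character shown
above; the variable names are made up for illustration, and the code is
written so that it also runs on newer Pythons:

```python
# The character from position 2274, written as an escape so that the
# source itself stays pure ASCII.
ch = u"\xed"

# Encoding to ASCII fails, because 0xED (237) is outside range(128).
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Latin-1 and UTF-8 can both represent the character.
latin1_bytes = ch.encode("latin-1")
utf8_bytes = ch.encode("utf-8")
```

So the unicode string itself seems fine; the trouble starts only when it
has to be turned into bytes and ASCII is the codec that gets picked.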

In the original XML file (after all, I'm writing back the same thing I'm
reading) the char appears as follows:

</To><Text Style="font-family:Comic Sans MS; color:#000080; ">bien
aquí</Text></Message>

If I cut & paste the text from IDLE into UltraEdit, save it, and then
try to view the result, the XSL bombs on the same character.

I've tried using both IDLE (python2.3 on cygwin) and PythonWin 2.2.2
(ActiveState) and both complain....

I _know_ that there has to be a way to be able to write back the XML to
the file, but I can't figure it out.
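
One direction that might work is to encode the unicode string explicitly
before writing it, so that Python does not fall back to the ASCII codec.
The sketch below assumes UTF-8 output is acceptable and uses a tiny
made-up document and a temporary directory in place of the real log, so
it is an illustration only, not something checked against the Messenger
file:

```python
import os
import tempfile
from xml.dom import minidom

# Stand-in for the real log (hypothetical content). The "&#237;"
# character reference is the same character as u'\xed'.
doc = minidom.parseString("<Message><Text>bien aqu&#237;</Text></Message>")
rootNode = doc.documentElement

# toxml() without arguments gives back a unicode string, as in the post.
listado = rootNode.toxml()

# Encode explicitly, then write the resulting bytes in binary mode.
data = listado.encode("utf-8")
out_path = os.path.join(tempfile.mkdtemp(), "salida.txt")
out = open(out_path, "wb")
out.write(data)
out.close()
```

Since only the root node is serialized here, the output has no XML
declaration; UTF-8 is the default for XML without one, but any other
target encoding would presumably need to be declared in the file.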

Any suggestions?

    Thanks a lot!

        Javier J






