minidom and encoding problem
Martin v. Loewis
martin at v.loewis.de
Thu Jun 6 03:14:37 EDT 2002
ehab_teima at hotmail.com (Ehab Teima) writes:
> I'm using Python 2.1. I wrote classes to create an XML document from
> scratch. The code worked fine until I hit an encoding problem. The
> classes can read text and insert it as is into the XML document using
> createTextNode. This text had characters > 127, and I got this error.
This is a bug in your code. You must not insert (byte) strings into a
DOM tree; always use Unicode objects.
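A minimal sketch of that rule, written for a current Python 3 (where
str is already Unicode and byte strings are a separate bytes type)
rather than for Python 2.1. The Latin-1 source encoding and the element
name are assumptions made for illustration; under Python 2 the decoding
step would be unicode(raw, "latin-1") instead.

```python
from xml.dom.minidom import Document

# Bytes as they might arrive from a file (assumed here to be Latin-1).
raw = b"caf\xe9"

# Decode to a Unicode string before the text goes into the DOM.
text = raw.decode("latin-1")

doc = Document()
root = doc.createElement("item")  # hypothetical element name
doc.appendChild(root)

# The text node now holds Unicode, not raw bytes.
root.appendChild(doc.createTextNode(text))
```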
> I know it's not possible to add an encoding attribute using writexml,
> so the generated document only has <?xml version="1.0"?>. Is there any
> way to get around this problem?
Yes. Use Unicode strings when creating text nodes. When you serialize
the document through .toxml(), you will find that it produces a
Unicode string. Since (as you noticed) the document has no encoding
declaration, an XML parser will assume UTF-8, so you need to
.encode("UTF-8") that string before saving it to a file.
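The serialize-then-encode step might look roughly like this in a
current Python 3. The element name, the sample text and the temporary
file location are all hypothetical; only toxml() and the UTF-8 encoding
step come from the advice above.

```python
import os
import tempfile
from xml.dom.minidom import Document

doc = Document()
root = doc.createElement("greeting")  # hypothetical element name
doc.appendChild(root)
root.appendChild(doc.createTextNode("Gr\u00fc\u00dfe"))  # non-ASCII text

# Called without an encoding argument, toxml() returns a Unicode string
# whose XML declaration names no encoding.
xml_text = doc.toxml()

# Encode explicitly, then write the bytes in binary mode.
xml_bytes = xml_text.encode("UTF-8")

path = os.path.join(tempfile.mkdtemp(), "greeting.xml")  # hypothetical path
with open(path, "wb") as f:
    f.write(xml_bytes)
```

In current Python versions, toxml(encoding="utf-8") performs the
encoding itself and also writes the encoding into the XML declaration,
which may be the simpler route where it is available.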
> Does anybody know how to get the root node of a document? If I know
> the root node, I can add the proper header and then write the root
> node using writexml.
The document element is available through .documentElement on the
Document.
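As a rough illustration of the approach described in the question, the
root element can be fetched and written after a hand-written
declaration. The sample document and the in-memory buffer are
assumptions; this is a sketch for a current Python 3, not the original
poster's code.

```python
import io
from xml.dom.minidom import parseString

# Hypothetical sample document.
doc = parseString("<catalog><book/></catalog>")

# The root element of the document.
root = doc.documentElement

# writexml() works on any node, so the root can be written on its own
# after a custom XML declaration.
out = io.StringIO()
out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
root.writexml(out)

result = out.getvalue()
```

If a declaration is written by hand like this, the text still has to
be encoded as UTF-8 when it is saved, so that the bytes in the file
match what the declaration claims.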
Regards,
Martin