minidom and encoding problem

Ehab Teima ehab_teima at hotmail.com
Thu Jun 6 18:19:53 EDT 2002


martin at v.loewis.de (Martin v. Loewis) wrote in message news:<m37klcubdu.fsf at mira.informatik.hu-berlin.de>...
> ehab_teima at hotmail.com (Ehab Teima) writes:
> 
> > I'm using Python 2.1. I wrote classes to create xml document from
> > scratch. The code worked fine until I hit an encoding problem. The
> > classes can read text and insert it as is to xml document using
> > createTextNode. This text had characters > 127, and I got this error.
> 
> This is a bug in your code. You must not insert (byte) string in a DOM
> tree; always use Unicode objects.

I do not have control over the text that is sent. The issue started
when some bullets were copied from a Word document and pasted into a
file, and the whole file was passed to my classes. I could not find a
way to convert this text to UTF-8 or anything else. Is there a way to
prevent this from happening?
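
For reference, a minimal sketch of the conversion being asked about:
decode the incoming bytes once, before they reach createTextNode. It
assumes the pasted text is Windows-1252 (a guess based on the Word
origin, not something confirmed in this thread); the sample bytes and
the element name are hypothetical. It is written for current Python 3
so it runs as is; on Python 2.1 the decode step would probably be
spelled unicode(raw, "cp1252").

```python
from xml.dom.minidom import Document

# Hypothetical input: bytes as read from the pasted file.
# 0x95 is the bullet character in Windows-1252 (assumed source encoding).
raw = b"\x95 first item"

# Decode once, at the boundary, so the DOM only ever sees Unicode text.
text = raw.decode("cp1252")

doc = Document()
root = doc.createElement("list")  # hypothetical element name
doc.appendChild(root)
root.appendChild(doc.createTextNode(text))
```

If the real encoding is different, only the codec name passed to
decode() changes.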
> 
> > I know it's not possible to add an encoding attribute using writexml,
> > so the generated document only has <?xml version="1.0"?>. Is there any
> > way to get around this problem?
> 
> Yes. Use Unicode strings when creating text nodes. When producing the
> serialized document through .toxml, you will find that it produces a
> Unicode string. Since (as you notice) the document has no encoding
> declaration, you need to .encode("UTF-8") that string before saving it
> into a file.

I tried to encode the string using different encodings but I could
not. Here is what I got when I tried .encode("UTF-8"):

UnicodeError: ASCII decoding error: ordinal not in range(128)
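
A likely reading of this message: in Python 2, calling .encode() on a
byte string first decodes it with the default ASCII codec, and that
hidden decode is what fails on bytes above 127. The sketch below makes
the decode explicit and then follows the quoted advice through to
.toxml().encode("UTF-8"). The byte string, the cp1252 codec, and the
element name are assumptions for illustration; the code targets
current Python 3, where bytes have no .encode() and the decode step is
therefore always visible.

```python
from xml.dom.minidom import Document

# Hypothetical byte string with a character above 127 (assumed Windows-1252).
raw = b"\x95 bullet"

# This is the step Python 2 performs implicitly, and the one that fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the real source encoding gives Unicode text for the DOM.
doc = Document()
root = doc.createElement("doc")  # hypothetical element name
doc.appendChild(root)
root.appendChild(doc.createTextNode(raw.decode("cp1252")))

# Serialize to Unicode text, then encode to UTF-8 bytes before writing a file.
xml_bytes = doc.toxml().encode("utf-8")
```

The resulting xml_bytes can be written to a file opened in binary mode.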


