minidom and encoding problem
Fredrik Lundh
fredrik at pythonware.com
Thu Jun 6 18:57:31 EDT 2002
Ehab Teima wrote:
> > This is a bug in your code. You must not insert (byte) string in a DOM
> > tree; always use Unicode objects.
>
> I do not have control over the sent text. The issue started when some
> bullets were copied from a word document and pasted into a file and
> the whole file was passed to my classes.
if you don't know what encoding the file is using, what
makes you think Python can figure it out?
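to illustrate: the same bytes are valid in several encodings, and each one gives a different text, so nothing in the data itself says which is right. a small sketch, written for a current Python 3 (where the 8-bit string is a bytes object); the sample bytes and the list of candidate encodings are made up for the example, not taken from the original file.

```python
# hypothetical sample: byte 0x95 followed by plain ASCII text
raw = b"\x95 first bullet"

# candidate encodings chosen for illustration only
candidates = ("cp1252", "iso-8859-1", "cp437")

# every candidate decodes without error, but to a different first character
decoded = {enc: raw.decode(enc) for enc in candidates}

for enc, text in decoded.items():
    # ascii() shows non-ASCII characters as escapes so they are easy to compare
    print(enc, ascii(text))
```

all three calls succeed, which is exactly the problem: only whoever produced the file knows which result is the intended one.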
> I tried to encode the string using different encodings but I could
> not.
the string is already encoded. you need to *decode* it.
> Here is what I got when I tried .encode("UTF-8"):
> UnicodeError: ASCII decoding error: ordinal not in range(128)
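this means that you have non-ASCII characters in an
8-bit string. encode() first has to convert the string
to unicode using the default ASCII codec, and that step
fails. to convert this to a unicode string, use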
    u = s.decode(encoding)
where "encoding" is the source encoding (if you haven't
the slightest idea, try "iso-8859-1")
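a minimal sketch of the whole round trip, assuming a current Python 3 (the byte string is a bytes object there) and assuming the pasted Word bullets arrived as "cp1252"; that encoding is a guess for the example, not something the original poster confirmed. the sample bytes and the "item" element name are hypothetical.

```python
from xml.dom import minidom

# hypothetical input: a Word-style bullet as a single byte, then ASCII text
s = b"\x95 first item"

# decoding as ASCII fails, which is the step behind the error quoted above
ascii_error = None
try:
    s.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

# decode with the (assumed) source encoding to get a unicode string
u = s.decode("cp1252")

# "iso-8859-1" maps every byte to a code point, so it never fails,
# but here it yields a control character instead of a bullet
fallback = s.decode("iso-8859-1")

# only the decoded text goes into the DOM tree
doc = minidom.Document()
root = doc.createElement("item")
root.appendChild(doc.createTextNode(u))
doc.appendChild(root)

# encode once, on output
xml_bytes = doc.toxml(encoding="utf-8")
print(xml_bytes)
```

the point of the design is to decode once at the boundary, keep unicode inside the tree, and pick an output encoding only when serializing.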
also see:
http://effbot.org/guides/unicode-objects.htm
</F>