Determining Unicode encoding.
Sean
sean at activeprime.com
Tue Apr 29 16:42:40 EDT 2003
I'm really new to dealing with Unicode, so please bear with me. I'm
trying to add Unicode support to a program I'm working on, and I'm
getting a little stuck when printing a Unicode string to a file. I
know I have to encode the string using an encoding (UTF-8, UTF-16,
latin-1, etc.). The problem is that I don't know how to determine
the *right* encoding to use for a particular string. The way I
understand it, UTF-8 will handle any Unicode data, but it will
translate characters not in the standard ASCII set to fit within the
8-bit character table. My problem is that I'm handling data from a lot
of different encodings (Latin, Eastern, Asian, etc.) and I can't allow
the data in the strings to be changed. I also can't (at least I don't
know how to) determine what encodings the strings are using. That is, I
don't know which strings are from which languages. Is there any way to
determine, from the Unicode string itself, what encoding I need to use
to prevent data loss? Or do I need to find a way to determine
beforehand what encoding they are using when they are read in?
Am I even asking the right questions? I'm really pretty lost and my
O'Reilly books aren't helping very much.
-Sean
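
One way to check the data-loss concern in the question is a round-trip test: encode a string, decode the bytes again, and compare with the original. The sketch below assumes Python 3, which the original post predates. The helper name round_trips and the sample strings are made up for illustration and are not part of the poster's program.

```python
# Hypothetical sample strings from different scripts (illustration only).
SAMPLES = [
    "caf\u00e9",                       # Latin text with an accented letter
    "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1",  # Greek
    "\u65e5\u672c\u8a9e",              # Japanese
]


def round_trips(text, encoding):
    """Return True if text survives encode followed by decode unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        # The encoding has no representation for some character in text.
        return False


if __name__ == "__main__":
    for sample in SAMPLES:
        for encoding in ("utf-8", "utf-16", "latin-1"):
            status = "lossless" if round_trips(sample, encoding) else "cannot encode"
            print(ascii(sample), encoding, status)
```

UTF-8 and UTF-16 can represent every Unicode code point, so the strings above come back unchanged from both. A single-byte encoding such as latin-1 covers only 256 characters, so the Greek and Japanese samples cannot be encoded with it at all. Under that reading, one Unicode encoding could be used for every string regardless of its language.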