[Python-Dev] unicode/string asymmetries

Jack Jansen jack@oratrix.nl
Wed, 09 Jan 2002 13:25:19 +0100


> thomas wrote:
> 
> > Hehe, I don't want to put objects in structures, I just want to build
> > structures containing "Unicode strings".
> 
> there is no such thing.
> 
> what you want is a binary buffer with an *encoded*
> unicode string.

It becomes more and more clear to me that there are two groups of people on 
this list: those who understand unicode (and may or may not actually use it) 
and those who want to use unicode (but apparently don't understand it). I'm in 
the second group:-)

> to get one, figure out what encoding you need (probably
> utf-16-le), convert the string to a byte string using the
> encode method, and store that byte string in your struct.
> 
> import struct
> 
> def wu(text):
>     # encode a unicode string as utf-16-le bytes for win32 apis
>     return text.encode("utf-16-le")
> 
> struct.pack("32s", wu(u"VS_VERSION_INFO"))
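
For reference, here is a minimal, self-contained sketch of the approach quoted above, written for Python 3. The helper name encode_for_win32 is made up for illustration, and utf-16-le is only the encoding the quoted advice guesses at ("probably"), so treat both as assumptions rather than a definitive recipe.

```python
import struct

def encode_for_win32(text: str) -> bytes:
    """Hypothetical helper: encode text as UTF-16-LE bytes (assumed target encoding)."""
    return text.encode("utf-16-le")

name = "VS_VERSION_INFO"
encoded = encode_for_win32(name)

# "32s" stores a 32-byte field: shorter input is padded with NUL bytes,
# longer input is truncated to 32 bytes.
packed = struct.pack("32s", encoded)

# The same text produces different bytes under a different encoding,
# so the struct only has a defined layout once an encoding is chosen.
utf8 = name.encode("utf-8")

# Round trip: decode the field and strip the NUL padding.
recovered = packed.decode("utf-16-le").rstrip("\x00")
```

Note that the 15-character name occupies 30 bytes in UTF-16-LE, which leaves room for exactly one 16-bit NUL terminator in the 32-byte field.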

Why would you have to specify the encoding if what you want is the normal, 
standard encoding? Or, to rephrase the question, why do C programmers only 
have to s/char/wchar_t/, add a "w" to the front of the routine names and a u 
in front of the string constants, whereas Python programmers are now suddenly 
expected to learn all this mumbo-jumbo about encodings and such?
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -