[Python-Dev] unicode/string asymmetries
Jack Jansen
jack@oratrix.nl
Wed, 09 Jan 2002 15:55:11 +0100
> jack wrote:
> > > struct.pack("32s", wu(u"VS_VERSION_INFO"))
> >
> > Why would you have to specify the encoding if what you want is the normal,
> > standard encoding?
>
> because there is no such thing as a "normal, standard
> encoding" for a unicode character, just like there's no
> "normal, standard encoding" for an integer (big endian,
> little endian?), a floating point number (ieee, vax, etc),
> a screen coordinate, etc.
What I am calling the "normal, standard encoding" here is the one the C library
supports. Your analogy of integers and floats is exactly the right one: even
though there are many ways to represent an integer, what you get back from
PyArg_Parse("l") is a standard C "long".
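The same point can be illustrated from the Python side with the struct module. This is only a sketch: the variable names are made up for illustration, and it assumes a current Python 3 where int.to_bytes and sys.byteorder are available. The unprefixed "l" format means "whatever the C compiler calls a long", while "<l" and ">l" force an explicit layout.

```python
import struct
import sys

value = 1

# Native mode: size and byte order of the platform's C "long".
native = struct.pack("l", value)

# Standard mode: always 4 bytes, with an explicit byte order.
little = struct.pack("<l", value)
big = struct.pack(">l", value)

# The native result is simply the platform's own layout of the value.
native_size = struct.calcsize("l")
expected_native = value.to_bytes(native_size, sys.byteorder, signed=True)

print(native_size, native == expected_native)
print(little.hex(), big.hex())
```

Nobody has to name a byte order to get the native form; it is the default because it is what the surrounding C code expects.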
Maybe the confusion is that wherever I have said "unicode" in the past I
should have said "wchar_t". I know there are, in theory, many encodings of
Unicode, but in practice there is only one that I'm interested in most of the
time, and that's wchar_t, because that's what all my APIs want.
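As a rough sketch of what "the wchar_t encoding" means in practice, the following Python picks a codec from the platform's wchar_t width and byte order. The helper name to_wchar_bytes is hypothetical, and the mapping of a 2-byte wchar_t to UTF-16 and a 4-byte wchar_t to UTF-32 is an assumption that holds on common platforms rather than something the C standard guarantees.

```python
import ctypes
import sys

def to_wchar_bytes(text):
    """Encode text as the platform's wchar_t array would hold it (no NUL).

    Hypothetical helper: assumes wchar_t is 2 bytes (UTF-16) or
    4 bytes (UTF-32) in the machine's native byte order.
    """
    width = ctypes.sizeof(ctypes.c_wchar)
    family = {2: "utf-16", 4: "utf-32"}[width]
    order = "le" if sys.byteorder == "little" else "be"
    return text.encode(f"{family}-{order}")

text = "VS_VERSION_INFO"
width = ctypes.sizeof(ctypes.c_wchar)
encoded = to_wchar_bytes(text)

# ctypes builds a real, NUL-terminated wchar_t buffer for comparison.
reference = bytes(ctypes.create_unicode_buffer(text))

print(width, len(encoded), encoded + b"\x00" * width == reference)
```

Note that the byte length depends on the platform (30 bytes for this string with a 2-byte wchar_t, 60 with a 4-byte one), so a fixed-size field such as "32s" only fits it in the 2-byte case.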
So, I would like PyArg_Parse/Py_BuildValue formats that are symmetric to "s",
"s#" and "z" but that return wchar_t strings and that work with both
UnicodeObjects and StringObjects.
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -