[Python-Dev] unicode/string asymmetries
Fredrik Lundh
fredrik@pythonware.com
Wed, 9 Jan 2002 14:16:30 +0100
jack wrote:
> > struct.pack("32s", wu(u"VS_VERSION_INFO"))
>
> Why would you have to specify the encoding if what you want is the normal,
> standard encoding?
because there is no such thing as a "normal, standard
encoding" for a unicode character, just like there's no
"normal, standard encoding" for an integer (big endian,
little endian?), a floating point number (ieee, vax, etc),
a screen coordinate, etc.
as soon as something gets too large to store in a byte,
there's always more than one obvious way to store it ;-)
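to make that concrete, here's a tiny C sketch (hypothetical,
and assuming a machine with 16-bit shorts) that prints the two
obvious byte layouts of a single 16-bit character:

    #include <stdio.h>

    int main(void)
    {
        unsigned short c = 0x0056;  /* U+0056 'V' as one 16-bit code unit */
        unsigned char *p = (unsigned char *) &c;

        /* a little-endian machine prints "56 00", a big-endian one
           prints "00 56"; utf-8 would store the same character as
           the single byte 0x56 */
        printf("%02x %02x\n", p[0], p[1]);
        return 0;
    }

three reasonable layouts, no single "normal, standard" one.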
> Or, to rephrase the question, why do C programmers only
> have to s/char/wchar_t/
because they tend to prefer to quickly get the wrong
result? ;-)
C makes no guarantees about the size or encoding of
wchar_t, so Python's Unicode type doesn't rely on it (it can
use it, though: you can check the HAVE_USABLE_WCHAR_T
macro to see if it's the same thing as Py_UNICODE; see
PyUnicode_FromWideChar for an example).
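for illustration, a minimal sketch of that conversion (the
helper name is made up, error handling is omitted, and the
exact type of the length argument has varied between Python
releases):

    #include <Python.h>
    #include <wchar.h>

    /* hypothetical helper: build a Python unicode object from a
       zero-terminated wchar_t buffer.  PyUnicode_FromWideChar
       copies the data, converting element by element when
       Py_UNICODE and wchar_t don't match */
    static PyObject *
    unicode_from_wide(const wchar_t *buf)
    {
        return PyUnicode_FromWideChar(buf, wcslen(buf));
    }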
in the Mac case, it might be easiest to configure things so
that HAVE_USABLE_WCHAR_T is always true, and assume
that Py_UNICODE is the same thing as wchar_t. (checking
this in the module init function won't hurt, of course; see
the sketch below)
but you cannot rely on that if you're writing truly portable
code.
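for what it's worth, here's roughly what such an init-time
check could look like (a sketch for a hypothetical module
named "example", using the 2.x-style init convention):

    #include <Python.h>

    static PyMethodDef example_methods[] = {
        {NULL, NULL, 0, NULL}  /* sentinel */
    };

    void
    initexample(void)
    {
    #ifndef HAVE_USABLE_WCHAR_T
        /* refuse to load on builds where Py_UNICODE is not
           interchangeable with wchar_t */
        PyErr_SetString(PyExc_SystemError,
            "example assumes Py_UNICODE == wchar_t");
        return;
    #endif
        Py_InitModule("example", example_methods);
    }

that way the import fails with a clear error instead of the
module silently corrupting data.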
</F>