[Python-Dev] unicode/string asymmetries
Thomas Heller
thomas.heller@ion-tof.com
Wed, 9 Jan 2002 08:51:15 +0100
> > I would like to create struct's containing unicode characters
> > (be gentle with me, maybe I mean wide characters, or mbcs, but I'm really
> > not sure)
>
> Well, that is precisely the problem: When putting a Unicode object
> into a C structure, there are too many alternatives to pick a sensible
> default. It is not even clear what a "wide character" is: it might be a
> value of wchar_t, or it might be a value of Py_UNICODE (those differ
> on Unix, in the default installation).
>
> For "MBCS", the most reasonable default might be "utf-8", since it is
> capable of encoding all characters. On Windows, "mbcs" is also a good
> choice, since it uses the encoding that all the character APIs use.
>
> Why are you asking? Do you have a specific implementation in mind, or
> are you just worried that Unicode objects cannot be put into
> structures? Don't worry, file objects cannot be put into structures,
> either :-)
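To illustrate the point that there is no single sensible default, here is a minimal sketch (modern Python 3, not the Python 2 of this thread) that encodes one string three plausible ways. It assumes UTF-16LE for a Windows `WCHAR` and UTF-32LE for a 4-byte `wchar_t` such as is common on Unix; "mbcs" is left out because that codec only exists on Windows.

```python
# One Unicode string, several plausible byte layouts for a C structure.
# Which layout is "right" depends entirely on what the C side expects.
text = u"VS_VERSION_INFO"

layouts = {
    # 2-byte little-endian code units, as used by a Windows WCHAR
    "utf-16-le": text.encode("utf-16-le"),
    # 4-byte little-endian code units (assumption: a 4-byte wchar_t)
    "utf-32-le": text.encode("utf-32-le"),
    # variable-width encoding; ASCII characters take one byte each
    "utf-8": text.encode("utf-8"),
}

for name, data in layouts.items():
    # Same 15 characters, three different sizes
    print(name, len(data), repr(data[:8]))
```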
Hehe, I don't want to put objects into structures; I just want to build
structures containing "Unicode strings".
Actually, in this case I'm trying to build a Win32 VS_VERSIONINFO
structure, which contains a field WCHAR szKey[].
MSDN says:
<quote>
szKey
Contains the Unicode string "VS_VERSION_INFO".
</quote>
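As a rough sketch of where `szKey` sits, the helper below packs the start of a VS_VERSIONINFO block: the three WORD fields `wLength`, `wValueLength` and `wType`, then `szKey` as a null-terminated UTF-16LE string, then zero padding so the following `Value` member begins on a 32-bit boundary. The function name `pack_versioninfo_header` is hypothetical, it is written for Python 3, and the caller is assumed to compute the length fields for the whole block.

```python
import struct

def pack_versioninfo_header(w_length, w_value_length, w_type=0):
    """Hypothetical helper: pack wLength, wValueLength, wType, szKey, Padding1."""
    # Three little-endian WORDs; wType 0 marks binary value data
    header = struct.pack("<HHH", w_length, w_value_length, w_type)
    # szKey: the WCHAR string "VS_VERSION_INFO" including its terminating null
    key = (u"VS_VERSION_INFO" + u"\0").encode("utf-16-le")
    data = header + key
    # Padding1: zero bytes until the offset is a multiple of 4
    data += b"\0" * (-len(data) % 4)
    return data
```

With 6 header bytes and 32 key bytes, two padding bytes bring the result to 40, so a VS_FIXEDFILEINFO value (52 bytes, i.e. thirteen DWORDs) can follow directly, e.g. `pack_versioninfo_header(total_length, 52)`.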
Currently I use something like the following code to access the
raw buffer:
struct.pack("32s", str(buffer(u"VS_VERSION_INFO")))
Looks strange but works:
>>> print repr(struct.pack("32s", str(buffer(u"VS_VERSION_INFO"))))
'V\x00S\x00_\x00V\x00E\x00R\x00S\x00I\x00O\x00N\x00_\x00I\x00N\x00F\x00O\x00\x00\x00'
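A possibly less surprising way to get the same 32 bytes is to encode explicitly to UTF-16LE rather than exposing the interpreter's internal buffer. `buffer()` on a unicode object hands out the raw Py_UNICODE data, whose width can differ between builds (as noted in the quoted reply), so on a build with 4-byte Py_UNICODE the trick above would likely produce different bytes. A minimal sketch in Python 3:

```python
import struct

# Encode explicitly as UTF-16LE (the byte layout of a Windows WCHAR array),
# independent of how the interpreter stores strings internally.
key = u"VS_VERSION_INFO".encode("utf-16-le")

# "32s" pads the 30 encoded bytes with zeros up to 32 bytes,
# which also supplies the terminating null WCHAR.
packed = struct.pack("32s", key)
print(repr(packed))
```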
Thomas