[Python-Dev] unicode/string asymmetries

Thomas Heller thomas.heller@ion-tof.com
Wed, 9 Jan 2002 08:51:15 +0100


> > I would like to create struct's containing unicode characters
> > (be gentle with me, maybe I mean wide characters, or mbcs, but I'm really
> > not sure)
> 
> Well, that is precisely the problem: When putting a Unicode object
> into a C structure, there are too many alternatives to pick a sensible
> default. It is not even clear what a "wide character" is: it might be a
> value of wchar_t, or it might be a value of Py_UNICODE (those differ
> on Unix, in the default installation). 
> 
> For "MBCS", the most reasonable default might be "utf-8", since it is
> capable of encoding all characters. On Windows, "mbcs" is also a good
> choice, since it uses the same encoding as all the char-based (ANSI) APIs.
> 
> Why are you asking? Do you have a specific implementation in mind, or
> are you just worried that Unicode objects cannot be put into
> structures? Don't worry, file objects cannot be put into structures,
> either :-)
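To make the ambiguity concrete, here is a small sketch in modern Python 3 (not the Python 2 of this thread) that encodes the same key three ways. The 2-byte width corresponds to a UTF-16 WCHAR on Windows, the 4-byte width to the 4-byte wchar_t common on Unix. The "mbcs" codec is left out because it exists only on Windows.

```python
# Sketch (Python 3): one string, three plausible "C string" byte layouts.
key = "VS_VERSION_INFO"

layouts = {
    "utf-8": key.encode("utf-8"),          # 1 byte per ASCII character
    "utf-16-le": key.encode("utf-16-le"),  # 2 bytes per character, like a Windows WCHAR
    "utf-32-le": key.encode("utf-32-le"),  # 4 bytes per character, like a 4-byte wchar_t
}

for name, data in layouts.items():
    print(f"{name:10} {len(data):3} bytes")
```

The three layouts are 15, 30 and 60 bytes long; the "-le" codec variants write no byte-order mark, which is what a C structure field expects.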
Hehe, I don't want to put objects into structures; I just want to build
structures containing "Unicode strings".

Actually, in this case I'm trying to build a win32 VS_VERSIONINFO
structure, which contains a field WCHAR szKey[].
MSDN says: 
<quote>
  szKey 
  Contains the Unicode string "VS_VERSION_INFO".
</quote>
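For this particular structure the unambiguous choice is to encode szKey explicitly as UTF-16-LE, which is what WCHAR means on Win32. A simplified Python 3 sketch, assuming only the fixed header is wanted: it packs wLength, wValueLength and wType as little-endian WORDs, appends the null-terminated key, and pads to a 32-bit boundary. The Value and Children members are omitted, and the helper name version_info_header is hypothetical.

```python
import struct

def version_info_header(value_length=0, value_type=0):
    """Hypothetical helper: VS_VERSIONINFO header up to (not including) Value.

    Only wLength, wValueLength, wType, szKey and Padding1 are emitted,
    so wLength here counts just these bytes, not a complete resource.
    """
    # szKey: WCHAR[] means UTF-16 little-endian, terminated by a 2-byte NUL.
    sz_key = "VS_VERSION_INFO\0".encode("utf-16-le")
    fields = struct.pack("<HH", value_length, value_type) + sz_key
    size = 2 + len(fields)                 # +2 for the wLength field itself
    padding = b"\0" * (-size % 4)          # Padding1: align to a 32-bit boundary
    return struct.pack("<H", size + len(padding)) + fields + padding

header = version_info_header()
print(len(header))
```

With the default arguments the header is 40 bytes (6 bytes of WORDs, 32 bytes of key, 2 bytes of padding), so a following VS_FIXEDFILEINFO would start at a 32-bit-aligned offset.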

Currently I use something like the following code to access the
raw buffer:

  struct.pack("32s", str(buffer(u"VS_VERSION_INFO")))

Looks strange but works:

>>> print repr(struct.pack("32s", str(buffer(u"VS_VERSION_INFO"))))
'V\x00S\x00_\x00V\x00E\x00R\x00S\x00I\x00O\x00N\x00_\x00I\x00N\x00F\x00O\x00\x00\x00'
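The str(buffer(...)) trick copies the interpreter's internal Py_UNICODE storage, so it yields 2-byte units only on a build where Py_UNICODE is 2 bytes wide; on a UCS-4 build the same line would produce 4-byte units. A sketch that states the encoding instead, written for Python 3, where struct's "s" format takes bytes; pack_wchar_field is a hypothetical name:

```python
import struct

def pack_wchar_field(text, size=32):
    """Pack text into a fixed-size WCHAR field (hypothetical helper).

    The encoding is explicit, so the result does not depend on how the
    interpreter stores strings internally.
    """
    data = text.encode("utf-16-le")
    if len(data) > size:
        raise ValueError(f"{text!r} needs {len(data)} bytes, field holds {size}")
    # The "s" format NUL-pads values shorter than the field size.
    return struct.pack(f"{size}s", data)

print(repr(pack_wchar_field("VS_VERSION_INFO")))
```

This prints the same 32 bytes as the session above, shown as a bytes literal. Raising on overlong input is a design choice: struct's "s" format would otherwise silently truncate, possibly cutting off the terminating NUL.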

Thomas