[Python-Dev] The C API and wide unicode support
Walter Dörwald
walter@livinglogic.de
Wed, 10 Jul 2002 16:57:16 +0200
Michael Hudson wrote:
> It may be best to allow this particular dead horse to go on being
> dead, but I thought I'd ask here. Beats work, anyway.
>
> Picture the situation: you're wrapping a C library that returns a
> unicode string (let's say encoded as UCS-2). You want to return this
> as a Python object. So you'd think you can write
>
> return PyUnicode_Decode(encstr, "ucs-2", NULL);
There is no "ucs-2" encoding. This should be "utf-16", "utf-16-le"
or "utf-16-be".
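To illustrate the codec names, here is a minimal Python sketch rather than anything definitive. The helper decode_utf16 and the sample string are made up for this example. It decodes the same text from little-endian, big-endian and BOM-prefixed byte arrays. From C, these same names are what you would pass as the encoding argument.

```python
import codecs

def decode_utf16(data, byteorder=None):
    """Decode UTF-16 bytes (hypothetical helper, for illustration only).

    byteorder: "le", "be", or None to rely on a leading BOM.
    """
    if byteorder is None:
        # Plain "utf-16" consumes a BOM if one is present; without a
        # BOM, CPython assumes the machine's native byte order.
        return data.decode("utf-16")
    return data.decode("utf-16-" + byteorder)

sample = "h\u00e9llo"
little = sample.encode("utf-16-le")   # no BOM, low byte first
big = sample.encode("utf-16-be")      # no BOM, high byte first
with_bom = codecs.BOM_UTF16_BE + big  # BOM announces the byte order

print(decode_utf16(little, "le") == sample)
print(decode_utf16(big, "be") == sample)
print(decode_utf16(with_bom) == sample)
```

If the library documents its byte order, naming it explicitly ("utf-16-le" or "utf-16-be") avoids depending on a BOM or on the native order of the machine.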
> (or something close to that). But for reasons that escape me,
> PyUnicode_Decode is included in the API renaming in
> Include/unicodeobject.h, so if you want to provide binaries you have
> to provide two, and you can be sure that users will have no idea which
> they need.
>
> So, questions:
>
> (1) am I correct in thinking that PyUnicode_Decode (and a bunch of
> others) could safely be omitted from the renaming?
No, because the unicode objects generated will consist of either
UCS-2 or UCS-4 "characters", depending on how Python was built. This
has nothing to do with the encoding of the byte array which you use
to create the unicode object. Any C function that uses Unicode objects
in any way needs name mangling, because the storage layout of the
Unicode objects changes between the two builds.
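As a rough sketch of how a user or installer could tell which of the two binaries fits, the build flavour can be read from sys.maxunicode, assuming the interpreter exposes it. The function names and the "ucs2"/"ucs4" labels are invented for this example. Note that from Python 3.3 onward the narrow/wide distinction no longer exists and sys.maxunicode is always 0x10FFFF.

```python
import sys

def unicode_build_width():
    """Return 2 for a narrow (UCS-2) build, 4 for a wide (UCS-4) build.

    Hypothetical helper: it only inspects sys.maxunicode, where 0xFFFF
    indicates a narrow build and any larger value a wide one.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4

def extension_variant():
    """Pick which of two prebuilt binaries matches this interpreter.

    The "ucs2" and "ucs4" labels are made-up names for this sketch.
    """
    return "ucs2" if unicode_build_width() == 2 else "ucs4"

print(unicode_build_width(), extension_variant())
```

This only selects between binaries that already exist. It does not remove the need to compile the extension once per build flavour.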
> (2) if so, is it worth omitting those APIs that could be omitted for 2.3?
>
> This train of thinking came about because the version of 2.2 that
> comes with Redhat 7.3 is compiled with wide unicode support (which
> surprised me), and so the pygame RPMs broke.
I don't know. Probably because sizeof(wchar_t) == 4 there?
Bye,
Walter Dörwald