Re: [Python-ideas] UCS2 vs UCS4 ABIs

Nov. 2, 2009

On Mon, Nov 2, 2009 at 11:57 AM, Guido van Rossum <guido@python.org> wrote:
...
We'd also have to hide the macros that can be used to access the
internals of a PyUnicodeObject, in order for that approach to be safe.
Basically, an extension would have to include a second header file to
use those macros and it would have to somehow indicate to the linker
that it is using UCS2 or UCS4 internals as well.
I don't know of a portable way to indicate that to the linker simply by
including a header file.  I wish I did.

Here is one idea that will cause a linker error if there's a mismatch and
one of the macros is used.  It does cause the macro to execute an extra CPU
instruction or two, though.

In unicodeobject.h:

/* Require the macro to reference a global variable that will only be
present if the Unicode ABI matches correctly.  Arrange for the global
variable to always have the value zero, and add it to the return value of
the macro. */

#if Py_UNICODE_SIZE == 4
extern const int Py_UnicodeZero_UCS4;
#define Py_UNICODE_ZERO (Py_UnicodeZero_UCS4)
#else
extern const int Py_UnicodeZero_UCS2;
#define Py_UNICODE_ZERO (Py_UnicodeZero_UCS2)
#endif

#define PyUnicode_AS_UNICODE(op) \
        (Py_UNICODE_ZERO + (((PyUnicodeObject *)(op))->str))

In unicodeobject.c:

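/* Py_UNICODE_ZERO expands to Py_UnicodeZero_UCS4 or Py_UnicodeZero_UCS2,
so only the symbol matching this build's Unicode ABI is defined.  The
header already declares it extern, so no storage class is needed here. */

const int Py_UNICODE_ZERO = 0;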
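
To make the mechanism concrete, here is a rough single-file sketch of the same trick.  Every Demo_/DEMO_ name and the struct layout are hypothetical stand-ins I made up for illustration, not the real CPython definitions, and the "interpreter" and "extension" halves would normally live in separate binaries.  If the extension half were compiled with DEMO_UNICODE_SIZE set to 2, it would reference Demo_UnicodeZero_UCS2, which this interpreter half never defines, so the link (or the dynamic load) should fail with an undefined symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_UNICODE_SIZE; a real build would set this
   at configure time. */
#define DEMO_UNICODE_SIZE 4

#if DEMO_UNICODE_SIZE == 4
typedef unsigned int demo_unicode;      /* assumed to be 32 bits wide */
extern const int Demo_UnicodeZero_UCS4;
#define DEMO_UNICODE_ZERO (Demo_UnicodeZero_UCS4)
#else
typedef unsigned short demo_unicode;    /* assumed to be 16 bits wide */
extern const int Demo_UnicodeZero_UCS2;
#define DEMO_UNICODE_ZERO (Demo_UnicodeZero_UCS2)
#endif

/* Simplified, hypothetical stand-in for PyUnicodeObject. */
typedef struct {
    size_t length;
    demo_unicode *str;
} DemoUnicodeObject;

/* Adding the zero-valued global leaves the pointer unchanged, but forces
   the caller's object file to reference the ABI-specific symbol. */
#define Demo_AS_UNICODE(op) \
        (DEMO_UNICODE_ZERO + (((DemoUnicodeObject *)(op))->str))

/* "Interpreter" side: defines only the symbol that matches its own ABI. */
const int DEMO_UNICODE_ZERO = 0;

/* "Extension" side: any use of the macro pulls in the symbol reference. */
demo_unicode *demo_first_char(DemoUnicodeObject *obj)
{
    assert(obj != NULL);
    return Demo_AS_UNICODE(obj);
}
```

In the matching case shown here, demo_first_char() simply returns the object's str pointer.  The cost is the extra load and add mentioned above, unless the compiler can see the constant's definition and fold it away.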
...
I would want to err on the safe side here -- if it was at all easy to
create an extension that *seems* to be ABI-neutral but *actually*
relies on knowledge about the UCS2 or UCS4 representation, we'd be
creating a worse problem. Users don't like stuff not working, but they
*really* don't like stuff crashing with random core dumps -- if it has
to be broken, let it break very loudly and explicitly. The current
approach satisfies that requirement -- it probably just errs too far
on the "never assume it might work" side.
Agreed.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>