[capi-sig] Unicode compatibility

Daniel Stutzbach daniel at stutzbachenterprises.com
Fri May 21 16:34:25 CEST 2010


I'm working on http://bugs.python.org/issue8654 and I'd like to get some
feedback from extension-writers, since it will impact them.

Synopsis of the problem:

If you try to load an extension module that:
- uses any of Python's Unicode functions, and
- was compiled by a Python with the opposite Unicode setting (UCS2 vs UCS4)
then you get an ugly "undefined symbol" error from the linker.
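
As a rough illustration of where that error comes from: the Python headers of
this era use macros to map each Unicode API name onto a width-specific symbol
(names of the form PyUnicodeUCS2_... or PyUnicodeUCS4_...), so the extension
ends up referencing a symbol that the other kind of interpreter never exports.
The sketch below imitates the mechanism with made-up names; every demo_ and
DEMO_ identifier is hypothetical and not part of Python.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical build setting, standing in for the interpreter's Unicode width. */
#define DEMO_UNICODE_SIZE 2

/* The public name is rewritten to a width-specific symbol at compile time. */
#if DEMO_UNICODE_SIZE == 2
#define demo_from_string demoUCS2_from_string
#else
#define demo_from_string demoUCS4_from_string
#endif

/* Two-level stringification so the macro is expanded before being quoted. */
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR(x)

/* The symbol name this "extension" would ask the linker to resolve. */
const char *demo_symbol_name(void)
{
    return DEMO_XSTR(demo_from_string);
}

/* Returns 1 if an interpreter exporting `exported` would satisfy the reference. */
int demo_links_against(const char *exported)
{
    assert(exported != NULL);
    return strcmp(demo_symbol_name(), exported) == 0;
}
```

Built with the setting at 2, the reference only resolves against an interpreter
that exports the UCS2-suffixed name, which is the mismatch described above.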

For Python 3, __repr__ must return a Unicode object which means that almost
all extensions will need to call some Unicode functions.  It's basically
fruitless to upload a binary egg for Python 3 to PyPI, since it will
generate link errors for a large fraction of downloaders (as I discovered
the hard way).

Proposed solution:

By default, extensions will compile in a "Unicode-agnostic" mode, where
Py_UNICODE is an incomplete type. The extension's code can pass Py_UNICODE
pointers back and forth between Python API functions, but it cannot
dereference them nor use sizeof(Py_UNICODE).  Unicode-agnostic modules will
load and run in both UCS2 and UCS4 interpreters.  Most extensions fall into
this category.
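
To make the "incomplete type" idea concrete, here is a minimal sketch in plain
C, assuming the type would be exposed as a declared-but-undefined struct (the
proposal above does not say how it would actually be spelled).  All demo_
names are hypothetical stand-ins; nothing here is real Python API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Py_UNICODE in Unicode-agnostic mode: the struct
 * is declared but not yet defined, so the type is incomplete at this point. */
typedef struct demo_unicode_unit demo_unicode;

/* Stand-ins for API functions that hand out and accept such pointers. */
const demo_unicode *demo_get_text(void);
size_t demo_text_length(const demo_unicode *text);

/* "Extension" code: it may hold and pass the pointer, nothing more. */
size_t extension_text_length(void)
{
    const demo_unicode *text = demo_get_text();
    assert(text != NULL);
    /* Writing text[0] or sizeof(demo_unicode) here would not compile. */
    return demo_text_length(text);
}

/* "Interpreter" code: only this side sees the real layout
 * (a 16-bit unit on typical platforms, as in a UCS2 build). */
struct demo_unicode_unit { unsigned short value; };

static const struct demo_unicode_unit demo_text[] = { {'h'}, {'i'}, {0} };

const demo_unicode *demo_get_text(void)
{
    return demo_text;
}

size_t demo_text_length(const demo_unicode *text)
{
    size_t n = 0;
    while (text[n].value != 0)
        n++;
    return n;
}
```

Because the extension half never depends on the unit's size, the same object
code would work unchanged if the interpreter half used a 32-bit unit instead.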

If a module needs to dereference Py_UNICODE, it can define
PY_REAL_PY_UNICODE before including Python.h to make Py_UNICODE a complete
type.  Attempting to load such a module into a mismatched interpreter will
cause an ImportError (instead of an ugly linker error).  If an extension
uses PY_REAL_PY_UNICODE in any .c file, it must also use it in the .c file
that calls PyModule_Create to ensure the Unicode width is stored in the
module's information.
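
One way the load-time check could work, sketched under the assumption that
the module definition carries a width field (0 meaning Unicode-agnostic).  The
struct, field, and function names below are hypothetical; the actual patch may
store and compare the width differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical width tags: 0 = Unicode-agnostic, 2 = UCS2, 4 = UCS4. */
#define DEMO_WIDTH_AGNOSTIC 0

/* Hypothetical stand-in for the information recorded when a module is created. */
struct demo_module_def {
    const char *name;
    int unicode_width;
};

/* Returns 0 if the module may be loaded, -1 where the real interpreter
 * would raise ImportError because the Unicode widths disagree. */
int demo_check_module(const struct demo_module_def *def, int interpreter_width)
{
    assert(def != NULL);
    if (def->unicode_width == DEMO_WIDTH_AGNOSTIC)
        return 0;  /* agnostic modules load in either kind of interpreter */
    return (def->unicode_width == interpreter_width) ? 0 : -1;
}
```

This also shows why the file that creates the module must be compiled with
the define: if it were not, the recorded width would be 0 and a mismatch
elsewhere in the extension would go undetected.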

I have two questions for the greater community:

1) Do you have any fundamental concerns with this design?

2) Would you prefer the default be reversed?  i.e., that Py_UNICODE be a
complete type by default, and an extension must have a #define to compile in
Unicode-agnostic mode?
--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

