I'm working on http://bugs.python.org/issue8654 and I'd like to get some feedback from extension-writers, since it will impact them.
Synopsis of the problem:
If you try to load an extension module that:
- calls any PyUnicode_* functions, and
- was compiled against a Python built with a different Unicode width (UCS2 vs. UCS4) than the interpreter loading it,
then you get an ugly "undefined symbol" error from the linker.
For Python 3, __repr__ must return a Unicode object, which means that almost all extensions will need to call at least some Unicode functions. It's basically fruitless to upload a binary egg for Python 3 to PyPI, since it will generate link errors for a large fraction of downloaders (as I discovered the hard way).
Synopsis of my proposed solution:
By default, extensions will compile in a "Unicode-agnostic" mode, where Py_UNICODE is an incomplete type. The extension's code can pass Py_UNICODE pointers back and forth between Python API functions, but it cannot dereference them nor use sizeof(Py_UNICODE). Unicode-agnostic modules will load and run in both UCS2 and UCS4 interpreters. Most extensions fall into this category.
If a module needs to dereference Py_UNICODE, it can define PY_REAL_PY_UNICODE before including Python.h to make Py_UNICODE a complete type. Attempting to load such a module into a mismatched interpreter will cause an ImportError (instead of an ugly linker error). If an extension uses PY_REAL_PY_UNICODE in any .c file, it must also use it in the .c file that calls PyModule_Create, to ensure the Unicode width is recorded in the module's information.
I have two questions for the greater community:
1. Do you have any fundamental concerns with this design?

2. Would you prefer the default be reversed? i.e., that Py_UNICODE be a complete type by default, and an extension must use a #define to compile in Unicode-agnostic mode?

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com