Greg Ewing wrote:
It so happened that the Unicode support was written to make it very easy to change the compile-time code unit size
What about extension modules that deal with Unicode strings? Will they have to be recompiled too? If so, is there anything to detect an attempt to import an extension module with an incompatible Unicode character width?
That's a good question!
The answer is: yes, extensions which use Unicode will have to be recompiled separately for narrow and wide builds of Python. The question, however, is how to detect cases where the user imports an extension built for a narrow Python into a wide build, or vice versa.
The standard way of looking at the API level won't help. We'd need some form of introspection API at the C level... hmm, perhaps looking at the sys module will do the trick for us?!
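To make the sys module idea a bit more concrete, here is a minimal sketch in C. It assumes the interpreter exposes its largest supported code point through the sys module (later Python releases do this as sys.maxunicode: 0xFFFF on narrow builds, 0x10FFFF on wide ones). The helper names are hypothetical, and the value is passed in as a plain argument rather than fetched through the real C API.

```c
#include <assert.h>

/* Hypothetical helper: map a sys.maxunicode-style value to a code unit size.
   Assumption: 0xFFFF means a narrow (2-byte) build, anything larger a
   wide (4-byte) build. */
int demo_unit_size_from_maxunicode(unsigned long maxunicode)
{
    return (maxunicode > 0xFFFFUL) ? 4 : 2;
}

/* Returns 1 if an extension compiled for ext_unit_size bytes per code unit
   matches the interpreter it is being imported into, 0 otherwise. */
int demo_extension_matches(unsigned long interp_maxunicode, int ext_unit_size)
{
    assert(ext_unit_size == 2 || ext_unit_size == 4);
    return demo_unit_size_from_maxunicode(interp_maxunicode) == ext_unit_size;
}
```

An extension could run a check like this in its init function and raise ImportError on a mismatch, but that only helps if the check runs before any Unicode API is touched.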
In any case, this is certainly going to cause trouble one of these days...
Here are some alternative ways to deal with this:
(1) Use the preprocessor to rename all the Unicode APIs to get "Wide" appended to their name in wide mode. This makes any use of a Unicode API in an extension compiled for the wrong Py_UNICODE_SIZE fail with a link-time error. (Which should cause an ImportError for shared libraries.)
(2) Ditto but only rename the PyModule_Init function. This is much less work but more coarse: a module that doesn't use any Unicode APIs (and I expect these will be a large majority) still would not be accepted.
(3) Change the interpretation of PYTHON_API_VERSION so that a low bit of '1' means wide Unicode. Then you only get a warning (followed by a core dump when actually trying to use Unicode).
I mentioned (1) and (3) in an earlier post.
--Guido van Rossum (home page: http://www.python.org/~guido/)
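Options (1) and (3) above could look roughly like the following sketch. Everything prefixed with Demo or DEMO is a hypothetical stand-in, not a real Python name: DEMO_UNICODE_SIZE plays the role of Py_UNICODE_SIZE, DemoUnicode_Length stands for any Unicode API function, and the base version number 1010 is only a placeholder. The sketch also assumes unsigned int is 32 bits wide.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Py_UNICODE_SIZE: 2 = narrow, 4 = wide. */
#ifndef DEMO_UNICODE_SIZE
#define DEMO_UNICODE_SIZE 4
#endif

/* Option (1): in wide mode the preprocessor renames the API, so the linker
   sees a different symbol than it would in a narrow build. */
#if DEMO_UNICODE_SIZE == 4
#define DemoUnicode_Length DemoUnicode_LengthWide
typedef unsigned int demo_unicode;    /* assumed to be 32 bits */
#else
typedef unsigned short demo_unicode;
#endif

/* Two-level stringification, so the macro is expanded before quoting. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

/* Written as DemoUnicode_Length in the source; in a wide build the compiler
   actually emits DemoUnicode_LengthWide. */
size_t DemoUnicode_Length(const demo_unicode *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* The symbol name an extension compiled with these settings would link to. */
const char *demo_linker_symbol(void)
{
    return DEMO_STR(DemoUnicode_Length);
}

/* Returns 1 if that symbol name carries the "Wide" suffix. */
int demo_symbol_is_wide(void)
{
    const char *name = demo_linker_symbol();
    size_t len = strlen(name);
    return len >= 4 && strcmp(name + len - 4, "Wide") == 0;
}

/* Option (3): keep the base version even and set the low bit for wide. */
#define DEMO_API_VERSION_BASE 1010    /* placeholder value */
#define DEMO_API_VERSION \
    (DEMO_API_VERSION_BASE | (DEMO_UNICODE_SIZE == 4 ? 1 : 0))

int demo_api_version_is_wide(int api_version)
{
    return (api_version & 1) != 0;
}
```

With the renaming in place, a narrow extension asks the linker for DemoUnicode_Length while a wide interpreter only exports DemoUnicode_LengthWide, so the mismatch shows up when the shared library is loaded rather than as a crash later. The version-bit variant is cheaper but, as noted above, only produces a warning.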