Guido van Rossum wrote:
I am not sure whether this is the right way to approach this problem, though, since it affects all extensions -- not only ones using Unicode.
Given that unicodeobject.h defines many macros and size-sensitive types in the public API, I don't see any way around this. If the API always used UCS4 (including in the macros), or defined both UCS2 and UCS4 versions of everything affected, then we could get around it. That seems like a high price to pay.
I think Guido suggested using macros to turn the Unicode APIs into e.g. PyUnicodeUCS4_Encode() vs. PyUnicodeUCS2_Encode().
That would prevent loading of non-compatible extensions using Unicode APIs (it doesn't catch the argument parser usage, though, e.g. "u").
Perhaps that's the way to go ?!
Hm, the "u" argument parser is a nasty one to catch. How likely is this to be the *only* reference to Unicode in a particular extension?
It is not very likely, but IMHO possible, e.g. for extensions which rely on the fact that wchar_t == Py_UNICODE and then interface directly with some other third-party code.
I guess one could argue that extension writers should check for narrow/wide builds in their extensions before using Unicode.
Since the number of Unicode extension writers is much smaller than the number of users, I think that this approach would be reasonable, provided that we document the problem clearly in the NEWS file.
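As a rough illustration of the narrow/wide check mentioned above, here is a minimal Python-level sketch. It relies on sys.maxunicode, which reports 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a wide (UCS4) one. The helper names unicode_build_width and check_expected_width are hypothetical, and an extension written in C would need a compile-time equivalent that is not shown here.

```python
import sys

# Largest code point a narrow (UCS2) build can represent directly;
# a wide (UCS4) build reports 0x10FFFF instead.
NARROW_MAX = 0xFFFF


def unicode_build_width(maxunicode=sys.maxunicode):
    """Return "UCS2" for a narrow build and "UCS4" for a wide one."""
    return "UCS2" if maxunicode <= NARROW_MAX else "UCS4"


def check_expected_width(expected, maxunicode=sys.maxunicode):
    """Hypothetical guard: refuse to continue if the widths disagree.

    `expected` is the width the extension was built for ("UCS2" or "UCS4").
    """
    actual = unicode_build_width(maxunicode)
    if actual != expected:
        raise ImportError(
            "built for %s Unicode, but this interpreter is %s" % (expected, actual)
        )
    return actual
```

A wrapper module could call check_expected_width() before importing its compiled part, so that a mismatch shows up as a clear ImportError rather than as memory corruption later on.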
I'm trying to convince myself that the magic number patch is okay, and here's what I come up with. If someone builds a Python with a non-standard Unicode width and accidentally uses a directory full of extensions built for the standard Unicode width on his platform, he deserves a warning. Since most extensions come with source anyway, people who want to experiment with UCS4 will have to be more adventurous and build all the extensions they need from source. The warnings will remind them. If there's a particular extension that they can only get in binary *and* that extension doesn't use Unicode, they can train themselves to ignore that warning.
Hmm, that would probably not make UCS-4 builds very popular ;-)
These warnings should use the warnings framework, by the way, to make it easier to ignore a specific warning. Currently it's a hard write to stderr.
Using the warnings framework would indeed be a good idea (many older extensions work just fine even with later API levels; the warnings are annoying, though) !
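To show what "easier to ignore a specific warning" could look like, here is a small sketch using the standard warnings module. The load_extension function, the module names, and the message text are all hypothetical stand-ins for whatever the import machinery would actually emit; only the warnings calls are real.

```python
import warnings


def load_extension(name, built_for, running):
    """Hypothetical stand-in for the width check done at import time."""
    if built_for != running:
        warnings.warn(
            "Unicode width mismatch for module %s: built for %s, running %s"
            % (name, built_for, running),
            RuntimeWarning,
            stacklevel=2,
        )
    return name


with warnings.catch_warnings(record=True) as caught:
    # Show every warning by default ...
    warnings.simplefilter("always")
    # ... except the one for a binary-only extension known not to use Unicode.
    # The filter added last is consulted first, so it overrides "always".
    warnings.filterwarnings(
        "ignore",
        message=r"Unicode width mismatch for module legacy_ext\b",
        category=RuntimeWarning,
    )
    load_extension("legacy_ext", "UCS2", "UCS4")  # silenced by the filter
    load_extension("other_ext", "UCS2", "UCS4")   # still reported
```

With a hard write to stderr there is no equivalent of the filterwarnings() call: the user either sees every mismatch message or redirects stderr wholesale.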