Python API version & optional features
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Martin has uploaded a patch which modifies the Python API level number depending on the setting of the compile time option for internal Unicode width (UCS-2/UCS-4): https://sourceforge.net/tracker/?func=detail&aid=445717&group_id=5470&atid=305470 I am not sure whether this is the right way to approach this problem, though, since it affects all extensions -- not only ones using Unicode. If at all possible, I'd prefer some other means to handle this situation (extension developers are certainly not going to start shipping binaries for narrow and wide Python versions if their extension does not happen to use Unicode). Any ideas ? Thanks, -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/0887d/0887d92e8620e0d2e36267115257e0acf53206d2" alt=""
M.-A. Lemburg writes:
Given that unicodeobject.h defines many macros and size-sensitive types in the public API, I don't see any way around this. If the API always used UCS4 (including in the macros), or defined both UCS2 and UCS4 versions of everything affected, then we could get around it. That seems like a high price to pay. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
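A small illustration of Fred's point: a macro from a public header is expanded into the extension's own machine code, so the size of a code unit is fixed when the extension is compiled, not when it is loaded. The C sketch below uses hypothetical names (`wide_unit`, `narrow_unit`, `DEMO_UNICODE_AT`; none of them are part of unicodeobject.h) to simulate a wide (UCS-4) interpreter buffer being read through an accessor compiled for a narrow (UCS-2) build.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: the code unit a "wide" interpreter stores, and
   the code unit an extension compiled for a "narrow" build assumes. */
typedef uint32_t wide_unit;
typedef uint16_t narrow_unit;

/* Hypothetical accessor macro of the kind a public header might export.
   The element stride comes from the pointer type the extension was
   compiled with, not from the interpreter it is loaded into. */
#define DEMO_UNICODE_AT(buf, i) ((buf)[(i)])

/* Reinterpret the raw bytes of a wide buffer as narrow units, mimicking an
   extension that reads interpreter memory with the wrong unit size. */
static void narrow_view_of(const wide_unit *src, size_t n_wide,
                           narrow_unit *dst)
{
    memcpy(dst, src, n_wide * sizeof(wide_unit));
}

/* Return the unit a narrow-built extension "sees" at index i, or 0 if the
   request does not fit this demo's fixed-size view. */
static narrow_unit misread_char(const wide_unit *src, size_t n_wide, size_t i)
{
    narrow_unit view[16];                 /* room for up to 8 wide units */
    if (n_wide * 2 > 16 || i >= n_wide * 2)
        return 0;
    narrow_view_of(src, n_wide, view);
    return DEMO_UNICODE_AT(view, i);
}
```

Reading index 1 of the two-character buffer `{'a', 'b'}` through the narrow view yields half of `'a'` (which half depends on byte order), never `'b'`. Nothing the interpreter does at load time can fix code that was compiled this way, which is why a width change has to show up somewhere in the binary interface.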
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
"Fred L. Drake, Jr." wrote:
I think Guido suggested using macros to turn the Unicode APIs into e.g. PyUnicodeUCS4_Encode() vs. PyUnicodeUCS2_Encode(). That would prevent loading of non-compatible extensions using Unicode APIs (it doesn't catch the argument parser usage, though, e.g. "u"). Perhaps that's the way to go ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
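The renaming idea can be sketched in a few lines of C. The names below (`DEMO_UNICODE_WIDE`, `demo_unicode`, `DemoUnicode_GetLength` and its UCS2/UCS4 variants) are hypothetical stand-ins for the real configure option and `PyUnicode_*` functions; the point is only that the public spelling stays the same in source while the exported symbol name records the width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical build switch standing in for the interpreter's UCS-2/UCS-4
   compile-time option: 0 = narrow (UCS-2), 1 = wide (UCS-4). */
#ifndef DEMO_UNICODE_WIDE
#define DEMO_UNICODE_WIDE 0
#endif

#if DEMO_UNICODE_WIDE
typedef uint32_t demo_unicode;
#define DemoUnicode_GetLength DemoUnicodeUCS4_GetLength
#else
typedef uint16_t demo_unicode;
#define DemoUnicode_GetLength DemoUnicodeUCS2_GetLength
#endif

/* Library and extension source both spell the public name; the
   preprocessor rewrites it, so the compiled symbol encodes the width.
   An extension built for the other width references a symbol the
   interpreter does not export, so it typically fails to load with an
   unresolved-symbol error instead of silently misbehaving. */
size_t DemoUnicode_GetLength(const demo_unicode *s)
{
    size_t n = 0;
    while (s[n] != 0)   /* count code units up to the terminating zero */
        n++;
    return n;
}
```

Building with `-DDEMO_UNICODE_WIDE=1` would export `DemoUnicodeUCS4_GetLength` instead. As noted above, this only catches extensions that call a renamed function; a format code such as "u" in an argument-parsing string is not a symbol, so it slips through.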
data:image/s3,"s3://crabby-images/3ce65/3ce654d3e7cefe0116a594a13f4c84dce2b4ec49" alt=""
Hm, the "u" argument parser is a nasty one to catch. How likely is this to be the *only* reference to Unicode in a particular extension? I'm trying to convince myself that the magic number patch is okay, and here's what I come up with. If someone builds a Python with a non-standard Unicode width and accidentally uses a directory full of extensions built for the standard Unicode width on his platform, he deserves a warning. Since most extensions come with source anyway, people who want to experiment with UCS4 will have to be more adventurous and build all the extensions they need from source. The warnings will remind them. If there's a particular extension that they can only get in binary *and* that extension doesn't use Unicode, they can train themselves to ignore that warning. These warnings should use the warnings framework, by the way, to make it easier to ignore a specific warning. Currently it's a hard write to stderr. --Guido van Rossum (home page: http://www.python.org/~guido/)
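A hedged sketch of the check Guido describes, written so that it produces a message for the warnings machinery instead of writing to stderr. `demo_api_version_mismatch` is a hypothetical helper, not the actual code in Python/modsupport.c; in the interpreter the message would be handed to a warnings call (Python 2.1's C API provides `PyErr_Warn` for this), which is what lets users filter the warning for a specific module.

```c
#include <stdio.h>

/* Hypothetical helper: compare the API version an extension module was
   compiled against with the interpreter's own. On a mismatch, format a
   message into msg and return 1 so the caller can raise a warning
   (rather than printing directly to stderr); otherwise return 0. */
int demo_api_version_mismatch(const char *module_name,
                              int interpreter_api, int module_api,
                              char *msg, size_t msg_size)
{
    if (module_api == interpreter_api)
        return 0;                         /* versions agree: nothing to report */
    snprintf(msg, msg_size,
             "Python C API version mismatch for module %.100s: "
             "this Python has API version %d, module %.100s has version %d",
             module_name, interpreter_api, module_name, module_api);
    return 1;                             /* caller should issue a warning */
}
```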
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Guido van Rossum wrote:
It is not very likely but IMHO possible for e.g. extensions which rely on the fact that wchar_t == Py_UNICODE and then do direct interfacing to some other third party code. I guess one could argue that extension writers should check for narrow/wide builds in their extensions before using Unicode. Since the number of Unicode extension writers is much smaller than the number of users, I think that this apporach would be reasonable, provided that we document the problem clearly in the NEWS file.
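A minimal sketch of the kind of narrow/wide check Marc-Andre suggests for extensions that treat the interpreter's Unicode buffer as a `wchar_t` array. `DEMO_UNICODE_SIZE`, `demo_unicode` and `demo_to_wchar` are hypothetical stand-ins for the interpreter's configured code-unit size, its code-unit type, and an extension helper; the function copies the buffer as-is only when the two layouts match and otherwise converts unit by unit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the interpreter's code-unit size in bytes:
   2 for a narrow (UCS-2) build, 4 for a wide (UCS-4) build. */
#ifndef DEMO_UNICODE_SIZE
#define DEMO_UNICODE_SIZE 2
#endif

#if DEMO_UNICODE_SIZE == 4
typedef uint32_t demo_unicode;
#else
typedef uint16_t demo_unicode;
#endif

/* Copy n code units into a wchar_t buffer for a third-party API. Only when
   wchar_t and the interpreter's unit have the same size may the memory be
   shared or copied verbatim; otherwise each unit is widened or narrowed. */
void demo_to_wchar(const demo_unicode *src, size_t n, wchar_t *dst)
{
    if (sizeof(wchar_t) == sizeof(demo_unicode)) {
        memcpy(dst, src, n * sizeof(demo_unicode));  /* layouts match */
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = (wchar_t)src[i];                /* convert per unit */
    }
}
```

The per-unit path cannot represent code points above U+FFFF when a UCS-4 buffer is narrowed to a 16-bit `wchar_t`; a real extension would have to emit surrogate pairs or reject such input.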
Hmm, that would probably not make UCS-4 builds very popular ;-)
Using the warnings framework would indeed be a good idea (many older extensions work just fine even with later API levels; the warnings are annoying, though)!

--
Marc-Andre Lemburg
data:image/s3,"s3://crabby-images/3ce65/3ce654d3e7cefe0116a594a13f4c84dce2b4ec49" alt=""
OK. I approve.
Hmm, that would probably not make UCS-4 builds very popular ;-)
Do you have any reason to assume that it would be popular otherwise? :-) :-) :-)
Exactly. I'm not going to make the change, but it should be a two-liner in Python/modsupport.c:Py_InitModule4().

--Guido van Rossum
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Guido van Rossum wrote:
Great ! I'll go ahead and fix unicodeobject.h.
Oh, I do hope that people try out the UCS-4 builds. They may not be all that interesting yet, but I believe that for Asian users they do have some advantages.
I'll look into this as well.

--
Marc-Andre Lemburg
data:image/s3,"s3://crabby-images/bb914/bb914b6cebade552b26dde66c6fa6845bf2b639d" alt=""
On Mon, Jul 30, 2001 at 10:27:32AM -0400, Guido van Rossum wrote:
Shouldn't Python automatically Do The Right Thing in that case? That would mean wrapping the UCS4 calls in a conversion layer, which isn't hard. I mean, it's just a question of adding or taking away zeroes, right? :) Simon
data:image/s3,"s3://crabby-images/0887d/0887d92e8620e0d2e36267115257e0acf53206d2" alt=""
M.-A. Lemburg writes:
Given that unicodeobject.h defines many macros and size-sensitive types in the public API, I don't see any way around this. If the API always used UCS4 (including in the macros), or defined both UCS2 and UCS4 versions of everything affected, then we could get around it. That seems like a high price to pay. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
"Fred L. Drake, Jr." wrote:
I think Guido suggested using macros to turn the Unicode APIs into e.g. PyUnicodeUCS4_Encode() vs. PyUnicodeUCS2_Encode(). That would prevent loading of non-compatible extensions using Unicode APIs (it doesn't catch the argument parser usage, though, e.g. "u"). Perhaps that's the way to go ?! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/3ce65/3ce654d3e7cefe0116a594a13f4c84dce2b4ec49" alt=""
Hm, the "u" argument parser is a nasty one to catch. How likely is this to be the *only* reference to Unicode in a particular extension? I'm trying to convince myself that the magic number patch is okay, and here's what I come up with. If someone builds a Python with a non-standard Unicode width and accidentally uses a directory full of extensions built for the standard Unicode width on his platform, he deserves a warning. Since most extensions come with source anyway, people who want to experiment with UCS4 will have to be more adventurous and build all the extensions they need from source. The warnings will remind them. If there's a particular extension that they can only get in binary *and* that extension doesn't use Unicode, they can train themselves to ignore that warning. These warnings should use the warnings framework, by the way, to make it easier to ignore a specific warning. Currently it's a hard write to stderr. --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Guido van Rossum wrote:
It is not very likely but IMHO possible for e.g. extensions which rely on the fact that wchar_t == Py_UNICODE and then do direct interfacing to some other third party code. I guess one could argue that extension writers should check for narrow/wide builds in their extensions before using Unicode. Since the number of Unicode extension writers is much smaller than the number of users, I think that this apporach would be reasonable, provided that we document the problem clearly in the NEWS file.
Hmm, that would probably not make UCS-4 builds very popular ;-)
Using the warnings framework would indeed be a good idea (many older extensions work just fine even with later API levels; the warnings are annoying, though) ! -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/3ce65/3ce654d3e7cefe0116a594a13f4c84dce2b4ec49" alt=""
OK. I approve.
Hmm, that would probably not make UCS-4 builds very popular ;-)
Do you have any reason to assume that it would be popular otherwise? :-) :-) :-)
Exactly. I'm not going to make the change, but it should be a two-liner in Python/modsupport.c:Py_InitModule4(). --Guido van Rossum (home page: http://www.python.org/~guido/)
data:image/s3,"s3://crabby-images/addaf/addaf2247848dea3fd25184608de7f243dd54eca" alt=""
Guido van Rossum wrote:
Great ! I'll go ahead and fix unicodeobject.h.
Oh, I do hope that people try out the UCS-4 builds. They may not be all that interesting yet, but I believe that for Asian users they do have some advantages.
I'll look into this as well. -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/
data:image/s3,"s3://crabby-images/bb914/bb914b6cebade552b26dde66c6fa6845bf2b639d" alt=""
On Mon, Jul 30, 2001 at 10:27:32AM -0400, Guido van Rossum wrote:
Shouldn't Python automatically Do The Right Thing in that case? That would mean wrapping the UCS4 calls in a conversion layer, which isn't hard. I mean, it's just a question of adding or taking away zeroes, right? :) Simon
participants (4)
-
Fred L. Drake, Jr.
-
Guido van Rossum
-
M.-A. Lemburg
-
Simon Cozens