
Alexander Belopolsky wrote:
Two recently reported issues brought to light the fact that the Python language definition is closely tied to character properties maintained by the Unicode Consortium. [1,2] For example, when Python switches to Unicode 6.0.0 (planned for the upcoming 3.2 release), we will gain two additional characters that Python can use in identifiers. [3] [...]
Why do you consider this a problem? It would be a problem if previously valid identifiers *stopped* being valid, but not the other way around.
Of course, the likelihood is low that this change will affect any user, but the change in str.isspace() reported in [1] is likely to cause some trouble:
Looking at the thread here: http://bugs.python.org/issue10567 I interpret it as indicating that Python's isspace() has been buggy for many years, and is only now being fixed. It's always unfortunate when people rely on bugs, but I'm not sure we should be promising to support bug-for-bug compatibility from one version to the next :)
While we have little choice but to follow UCD in defining str.isidentifier(), I think Python can promise users more stability in what it treats as space or as a digit in its builtins. For example, I don't think that supporting
>>> float('١٢٣٤.٥٦')
1234.56
is more important than assuring users that once their program has accepted some text as a number, they can assume that the text is ASCII.
Seems like a pretty foolish assumption, if you ask me, pretty much akin to assuming that if string.isalpha() returns true that string is ASCII. Support for non-Arabic numerals in number strings goes back to at least Python 2.4:

[steve@sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> float(u'١٢٣٤.٥٦')
1234.5599999999999
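For anyone who does want the "accepted as a number implies ASCII" guarantee, it is easy enough to enforce in user code rather than in the builtins. The following is only a rough sketch for Python 3 (str.isascii() needs 3.7 or later); parse_ascii_float is a made-up name for illustration, not anything in the standard library:

```python
import unicodedata


def parse_ascii_float(text):
    """Hypothetical helper: behaves like float(), but rejects non-ASCII input."""
    if not text.isascii():  # str.isascii() is available from Python 3.7
        raise ValueError("non-ASCII characters in number: %r" % (text,))
    return float(text)


arabic_indic = "١٢٣٤.٥٦"

# The builtin accepts any Unicode decimal digits, not only 0-9.
value = float(arabic_indic)

# Each digit carries its numeric value in the Unicode Character Database.
digits = [unicodedata.decimal(ch) for ch in arabic_indic if ch != "."]

# isdigit()/isalpha() returning True says nothing about the text being ASCII.
non_ascii_digits = "١٢٣".isdigit() and not "١٢٣".isascii()
non_ascii_alpha = "é".isalpha() and not "é".isascii()
```

With this, parse_ascii_float("1234.56") returns a float as usual, while parse_ascii_float(arabic_indic) raises ValueError, leaving float() itself free to keep following the Unicode database.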
The fact that this is (apparently) only being raised now means that it isn't actually a problem in real life. I'd even say that it's a feature, and that if Python didn't support non-Arabic numerals, it should.

-- Steven