[Python-Dev] Python and the Unicode Character Database
Steven D'Aprano
steve at pearwood.info
Tue Nov 30 14:23:22 CET 2010
Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
> > *I* think it is more important. In python 3, you can never ever assume
> > anything is ASCII any more.
>
> Sure you can. In Python program text, all keywords will be ASCII
> (English, even, though it may be en_NL.UTF-8<wink>) for the foreseeable
> future.
>
> I see no reason not to make a similar promise for numeric literals. I
> see no good reason to allow compatibility full-width Japanese "ASCII"
> numerals or Arabic cursive numerals in "for i in range(...)" for
> example.
I agree with you that numeric *literals* should be restricted to the
ASCII digits. I don't think anyone here is arguing differently -- if
they are, they should speak up and try to make the case for allowing
numeric literals in arbitrary scripts. Python doesn't currently allow
non-ASCII numeric literals, and even if such a change were desirable, it
would run up against the moratorium. So let's just forget the specter of
code like:

    x = math.sqrt(١٢٣٤.٥٦ ** 一.一)

It ain't gonna happen :)
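
For anyone who wants to check the literal restriction for themselves, here is a minimal sketch, assuming Python 3. The helper name literal_compiles is made up for illustration; it only asks the compiler whether a piece of source text is accepted as an expression.

```python
def literal_compiles(source):
    """Return True if `source` compiles as an expression, else False."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# ASCII digits are accepted as a numeric literal.
ascii_ok = literal_compiles("1234.56")

# ARABIC-INDIC DIGITs ONE..FOUR are rejected in program text.
arabic_ok = literal_compiles("\u0661\u0662\u0663\u0664")

print(ascii_ok, arabic_ok)
```

Note that this only tests program text; it says nothing about what int() or float() accept when handed a string.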
But I think there is a good case for allowing the constructors int,
float and complex to continue to accept numeric *strings* with non-ASCII
digits. The code already exists, there are probably people out there who
rely on it, and in the absence of any convincing demonstration that the
existing behaviour is causing widespread difficulty, we should leave
well enough alone.
Various people have suggested that there should be a function in the
locale module that handles numeric string input in non-ASCII digits.
This is a de facto admission that there are use-cases for taking user
input like the string '٣' and turning it into the int 3. Python can
already do this, and has been able to for many years:
[steve at sylar ~]$ python2.4
Python 2.4.6 (#1, Mar 30 2009, 10:08:01)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> int(u'٣')
3
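
The same behaviour can be sketched for Python 3, where every str is Unicode (this is an assumption about the reader's interpreter; the session above is Python 2.4). The helper accepts_as_int is a made-up name for illustration. The sketch also shows that the constructors take decimal digits only, not every character that carries a numeric value.

```python
import unicodedata

ARABIC_THREE = "\u0663"  # ARABIC-INDIC DIGIT THREE
CJK_ONE = "\u4e00"       # CJK ideograph "one": numeric, but not a decimal digit

# int() and float() accept non-ASCII decimal digits in strings.
as_int = int(ARABIC_THREE)
as_float = float("\u0661\u0662.\u0665")  # digits one, two, five with an ASCII '.'

# unicodedata exposes the per-character decimal value directly.
decimal_value = unicodedata.decimal(ARABIC_THREE)


def accepts_as_int(text):
    """Return True if int() can parse `text`, else False."""
    try:
        int(text)
    except ValueError:
        return False
    return True


print(as_int, as_float, decimal_value)
print(accepts_as_int(ARABIC_THREE), accepts_as_int(CJK_ONE))
print(unicodedata.numeric(CJK_ONE))
```

Only the digits are translated; the float example still needs an ASCII decimal point.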
It seems to me that there's no need to move this functionality into locale.
--
Steven