[Python-Dev] Python and the Unicode Character Database

Thu Dec 2 01:17:51 CET 2010

Martin v. Löwis wrote:
>>> And here, my observation stands: if they wanted to, they currently
>>> couldn't - at least not for real numbers (and also not for integers
>>> if they want to use grouping). So the presumed application of this
>>> feature doesn't actually work, despite the presence of the feature it
>>> was supposedly meant to enable.
>> By that argument, English speakers wanting to enter integers using
>> Arabic numerals can't either!
> 
> That's correct, and the key point here for the argument. It's just not
> *meant* to support localized number forms, but deliberately constrains
> them to a formal grammar which users using it must be aware of in order
> to use it.

You're *agreeing* that English speakers can't enter integers using 
Arabic numerals? What do you think I'm doing when I do this?

 >>> int("1234")
1234

Ah wait... did you think I meant Arabic numerals in the sense of digits 
used by Arabs in Arabia? I meant Arabic numerals as opposed to Roman 
numerals. Sorry for the confusion.

Your argument was that even though Python's int() supports many 
non-ASCII digits, the lack of grouping means that it "doesn't actually 
work". If that argument were correct, then it applies equally to ASCII 
digits as well.

It's clearly nonsense to say that int("1234") "doesn't work" just 
because of the lack of grouping. It's equally nonsense to say that
int("١٢٣٤") "doesn't work" because of the lack of grouping.
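
To make the symmetry concrete, here is a small sketch, assuming a current Python 3 interpreter. The helper name accepts() is made up for illustration, and the grouped strings are my own examples rather than anything from the thread:

```python
# ASCII digits and Arabic-Indic digits parse to the same integer.
ascii_value = int("1234")
arabic_indic_value = int("١٢٣٤")


def accepts(text):
    """Return True if int() can parse text, False if it raises ValueError."""
    try:
        int(text)
    except ValueError:
        return False
    return True


# Grouping separators are rejected whichever digits are used.
ascii_grouped_ok = accepts("1,234")
# \u066c is ARABIC THOUSANDS SEPARATOR, written as an escape for clarity.
arabic_grouped_ok = accepts("١\u066c٢٣٤")

print(ascii_value, arabic_indic_value, ascii_grouped_ok, arabic_grouped_ok)
```

Both ungrouped strings convert, and both grouped strings fail, so the lack of grouping says nothing specific about non-ASCII digits.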

[...]
> I take it that you speak in favor of the float syntax also being used
> for the float() constructor.

I'm sorry, I don't understand what you mean here. I've repeatedly said 
that the syntax for numeric literals should remain constrained to the 
ASCII digits, as it currently is.

n = ١٢٣٤

gives a SyntaxError, and I don't want to see that change.
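
The distinction between literals and constructor arguments can be checked without crashing a script by handing the source text to compile(). This is only a sketch, assuming Python 3; is_valid_source() is a hypothetical helper name:

```python
def is_valid_source(source):
    """Return True if source compiles as Python code, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A bare numeric literal written with Arabic-Indic digits is rejected...
literal_ok = is_valid_source("n = ١٢٣٤")
# ...while the same digits inside a string passed to int() compile fine.
constructor_ok = is_valid_source('n = int("١٢٣٤")')

print(literal_ok, constructor_ok)
```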

But I've also argued that, because the int() and float() constructors 
currently accept strings of non-ASCII digits:

n = int("١٢٣٤")

we should continue to support the existing behaviour. None of the 
arguments against it seem convincing to me, particularly since the 
opponents of the current behaviour admit that there is a use-case for 
it, but they just want it to move elsewhere, such as the locale module.
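
The same behaviour holds for float(), with the caveat raised elsewhere in this thread that the decimal point must still be the ASCII dot. A short sketch, assuming Python 3; the sample values are my own:

```python
# Arabic-Indic digits with an ASCII decimal point are accepted.
with_ascii_point = float("١٢٣٤.٥")

# The native separator (\u066b, ARABIC DECIMAL SEPARATOR) is not.
try:
    float("١٢٣٤\u066b٥")
    native_separator_ok = True
except ValueError:
    native_separator_ok = False

print(with_ascii_point, native_separator_ok)
```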

We've even heard from one person -- I forget who, sorry -- who claimed 
that C++ has the same behaviour, and if you want ASCII-only digits, you 
have to explicitly ask for it.
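
For comparison, explicitly asking for ASCII-only digits in Python might look like the sketch below. This is not an existing API: parse_ascii_int() is a hypothetical helper, and it assumes Python 3.7 or later for str.isascii().

```python
def parse_ascii_int(text):
    """Parse a base-10 integer, rejecting any digit outside 0-9.

    Hypothetical helper: an optional sign and surrounding whitespace
    are allowed, everything else must be an ASCII digit.
    """
    stripped = text.strip()
    # Drop a single leading sign before checking the digits.
    digits = stripped[1:] if stripped[:1] in "+-" else stripped
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("not an ASCII integer: %r" % text)
    return int(stripped)


print(parse_ascii_int(" -1234 "))

try:
    parse_ascii_int("١٢٣٤")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Callers who need the restriction opt in with a check like this, while plain int() keeps accepting any Unicode decimal digits.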

For what it's worth, Microsoft warns developers not to assume users will 
enter numeric data using ASCII digits:

"Number representation can also use non-ASCII native digits, so your 
application may encounter characters other than 0-9 as inputs. Avoid 
filtering on U+0030 through U+0039 to prevent frustration for users who 
are trying to enter data using non-ASCII digits."

http://msdn.microsoft.com/en-us/magazine/cc163506.aspx

There was a similar discussion going on in Perl-land recently:

http://www.nntp.perl.org/group/perl.perl5.porters/2010/07/msg162400.html

although, being Perl, the discussion was dominated by concerns about 
regexes and implicit conversions, rather than an explicit call to 
float() or int() as we are discussing here.

[...]
>> In the same way, if I wanted to enter a number using non-Arabic digits,
>> it works provided I compromise by using the Anglo-American decimal point
>> instead of the European comma or the native decimal point I might prefer.
> 
> Why would you want that, if, what you really wanted, could not be
> done. There certainly *is* a way to convert strings into floats,
> and there would be a way if that restricted itself to the digits 0..9.
> So it can't be the mere desire to convert strings to float that make
> you ask for non-ASCII digits.

Why do Europeans use programming languages that force them to use a dot 
instead of a comma for the decimal point? Why do I misspell 
string.centre as string.center? Because if you want to get something 
done, you use the tools you have and not the tools you'd like to have.

-- 
Steven