[Python-Dev] Python and the Unicode Character Database
steve at pearwood.info
Thu Dec 2 01:17:51 CET 2010
Martin v. Löwis wrote:
>>> And here, my observation stands: if they wanted to, they currently
>>> couldn't - at least not for real numbers (and also not for integers
>>> if they want to use grouping). So the presumed application of this
>>> feature doesn't actually work, despite the presence of the feature it
>>> was supposedly meant to enable.
>> By that argument, English speakers wanting to enter integers using
>> Arabic numerals can't either!
> That's correct, and the key point here for the argument. It's just not
> *meant* to support localized number forms, but deliberately constrains
> them to a formal grammar which users using it must be aware of in order
> to use it.
You're *agreeing* that English speakers can't enter integers using
Arabic numerals? What do you think I'm doing when I do this?
Ah wait... did you think I meant Arabic numerals in the sense of digits
used by Arabs in Arabia? I meant Arabic numerals as opposed to Roman
numerals. Sorry for the confusion.
Your argument was that even though Python's int() supports many
non-ASCII digits, the lack of grouping means that it "doesn't actually
work". If that argument were correct, then it applies equally to ASCII
digits as well.
It's clearly nonsense to say that int("1234") "doesn't work" just
because of the lack of grouping. It's equally nonsense to say that
int("١٢٣٤") "doesn't work" because of the lack of grouping.
> I take it that you speak in favor of the float syntax also being used
> for the float() constructor.
I'm sorry, I don't understand what you mean here. I've repeatedly said
that the syntax for numeric literals should remain constrained to the
ASCII digits, as it currently is.
n = ١٢٣٤
gives a SyntaxError, and I don't want to see that change.
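The distinction between source-code literals and runtime string
conversion can be demonstrated directly (a sketch of CPython 3's
behaviour):

```python
# Non-ASCII digits are rejected in source code (numeric literals)...
try:
    compile("n = ١٢٣٤", "<example>", "exec")
except SyntaxError:
    print("literal form: SyntaxError")

# ...but accepted at runtime by the int() constructor.
print(int("١٢٣٤"))  # 1234
```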
But I've also argued that, since the int() and float() constructors
currently accept non-ASCII digits:
n = int("١٢٣٤")
we should continue to support the existing behaviour. None of the
arguments against it seem convincing to me, particularly since the
opponents of the current behaviour admit that there is a use-case for
it, but they just want it to move elsewhere, such as the locale module.
We've even heard from one person -- I forget who, sorry -- who claimed
that C++ has the same behaviour, and if you want ASCII-only digits, you
have to explicitly ask for it.
For what it's worth, Microsoft warns developers not to assume users will
enter numeric data using ASCII digits:
"Number representation can also use non-ASCII native digits, so your
application may encounter characters other than 0-9 as inputs. Avoid
filtering on U+0030 through U+0039 to prevent frustration for users who
are trying to enter data using non-ASCII digits."
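In Python, that advice maps naturally onto str.isdigit() (or the
unicodedata module) rather than an ASCII-only filter; a sketch of the
difference:

```python
import re

s = "١٢٣٤"  # Arabic-Indic digits: perfectly valid numeric user input

# An ASCII-only filter of the kind Microsoft warns against rejects it...
print(bool(re.fullmatch(r"[0-9]+", s)))  # False

# ...while Unicode-aware checks and int() both accept it.
print(s.isdigit())  # True
print(int(s))       # 1234
```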
There was a similar discussion going on in Perl-land recently, although,
being Perl, the discussion was dominated by concerns about regexes and
implicit conversions, rather than an explicit call to float() or int()
as we are discussing here.
>> In the same way, if I wanted to enter a number using non-Arabic digits,
>> it works provided I compromise by using the Anglo-American decimal point
>> instead of the European comma or the native decimal point I might prefer.
> Why would you want that, if, what you really wanted, could not be
> done. There certainly *is* a way to convert strings into floats,
> and there would be a way if that restricted itself to the digits 0..9.
> So it can't be the mere desire to convert strings to float that make
> you ask for non-ASCII digits.
Why do Europeans use programming languages that force them to use a dot
instead of a comma for the decimal place? Why do I misspell
string.centre as string.center? Because if you want to get something
done, you use the tools you have and not the tools you'd like to have.
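That compromise is visible in the float() constructor itself. A sketch
of CPython 3's behaviour as I understand it: the digits may be
non-ASCII, but the decimal point must be the ASCII dot:

```python
# float() maps any Unicode decimal digits to ASCII before parsing...
print(float("١٢٣٤.٥"))  # Arabic-Indic digits, ASCII decimal point

# ...but neither the European comma nor a native separator is accepted.
for s in ("1234,5", "١٢٣٤٫٥"):  # U+066B is the Arabic decimal separator
    try:
        float(s)
    except ValueError:
        print("float(%r) raises ValueError" % s)
```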