On Sun, Nov 28, 2010 at 6:43 PM, Steven D'Aprano email@example.com wrote: ..
is more important than to assure users that once their program accepted some text as a number, they can assume that the text is ASCII.
Seems like a pretty foolish assumption, if you ask me, pretty much akin to assuming that if string.isalpha() returns true, then the string is ASCII.
It is not foolish to the 99.9% of Python users whose code is written for 2.x. Their strings are byte strings, and string.isdigit() does imply ASCII, even if string.isalpha() does not in many locales.
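For comparison, here is a minimal sketch of how the same check behaves on 3.x, where str is Unicode. The sample string of Arabic-Indic digits is an illustrative choice, not taken from the bug report.

```python
# Python 3: str is Unicode, so isdigit() no longer implies ASCII.
text = "\u0661\u0662\u0663\u0664"  # ARABIC-INDIC DIGITS ONE..FOUR

is_digits = text.isdigit()                      # accepted as digits
is_ascii = all(ord(ch) < 128 for ch in text)    # but not ASCII
value = float(text)                             # float() parses them too

print(is_digits, is_ascii, value)
```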
The fact that this is (apparently) only being raised now means that it isn't actually a problem in real life. I'd even say that it's a feature, and that if Python didn't support non-Arabic numerals, it should.
I raised this problem because I found a bug that is related to this feature. The bug is also a regression from 2.x.
.. ValueError: invalid literal for float(): 1234?
The last character is lost, but the error message is still meaningful.
In 3.x, however:
While investigating this issue, I found that by the time the string gets to the number parser (_Py_dg_strtod), all non-ASCII characters have been dropped by PyUnicode_EncodeDecimal(), so the parser cannot produce a meaningful diagnostic.
Of course, PyUnicode_EncodeDecimal() can be fixed by making it pass non-ASCII chars through as UTF-8 bytes, but I was wondering whether preserving the ability to parse exotic numerals is worth the effort.
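A rough Python model of the pass-through idea, for illustration only. It is not the C implementation of PyUnicode_EncodeDecimal(); the helper name encode_decimal_sketch is hypothetical, and it returns str rather than UTF-8 bytes to keep the sketch short.

```python
import unicodedata


def encode_decimal_sketch(text):
    """Hypothetical model: map Unicode decimal digits to ASCII digits,
    whitespace to a plain space, and pass everything else through."""
    out = []
    for ch in text:
        digit = unicodedata.decimal(ch, None)
        if digit is not None:
            out.append(str(digit))   # any Unicode decimal digit -> '0'..'9'
        elif ch.isspace():
            out.append(" ")          # normalize whitespace
        else:
            out.append(ch)           # kept, so a later error can show it
    return "".join(out)


converted = encode_decimal_sketch("\u0661\u0662\u0663\u0664\u00e9")
print(converted)
```

With the offending character kept, the code that reports the parse failure has something concrete to quote in the ValueError message.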