Devanagari int literals [was Re: Should non-security 2.7 bugs be fixed?]
python at mrabarnett.plus.com
Mon Jul 20 00:13:48 CEST 2015
On 2015-07-19 22:16, Chris Angelico wrote:
> On Mon, Jul 20, 2015 at 5:55 AM, Tim Chase
> <python.list at tim.thechases.com> wrote:
>> On 2015-07-20 04:07, Chris Angelico wrote:
>>> The int() and float() functions accept, if I'm not mistaken,
>>> anything with Unicode category "Nd" (Number, decimal digit). In
>>> your examples, the fraction (U+215B) is No, and the Roman numerals
>>> (U+2168, U+2182) are Nl, so they're not supported. Adding support
>>> for these forms might be accepted as a feature request, but it's
>>> not a bug.
>> Ah, that makes sense. Some simple testing (thanks, unicodedata
>> module) supports your conjecture.
>> It's not a particularly big deal so not really worth the brain-cycles
>> to add support for them. Just upon hearing "Python's int() does
>> smart things with Unicode characters", those were some of my first
>> characters to try. The failure struck me as odd until you explained
>> the simple difference.
> The other part of the problem is: What should float("2⅛3") be? Should
> it be equal to 21.0/83.0? Should the first part be parsed as a classic
> mixed number (2 + 1/8), and then what should the 3 mean? While it's
> easy to see what an individual character should represent (just check
> unicodedata.numeric(ch) - for ⅛ it's 0.125), the true meaning of a
> string of such characters is less than clear. Similarly, Roman
> numerals aren't meant to be used after the decimal point, so "Ⅸ.Ⅴ"
> does not normally mean nine and a half... not to mention the confusing
> situation that "ⅠⅤ" would naively parse as 15 but "Ⅳ" is definitely 4.
> Since these kinds of complexities exist, it's safest to reserve this
> level of parsing for a special-purpose function. If someone can come
> up with a really strong argument for the float() and int()
> constructors interpreting these, I'd expect to see it deployed as a
> third-party module first, before being pointed out as "see, you can
> use float() for all these, but if you want to use those, you should
> use Float() instead". (Incidentally, I fully expect to see, some day,
> pytz.localize() semantics brought into the standard library
> datetime.datetime class, for precisely this reason.)
> Unicode is awesome, but it's not a panacea :)
What's the result of, say, float('1e.3')?
It raises an exception (ValueError), because the string isn't a
well-formed number.
By the same reasoning, float("2⅛3") should also raise an exception.
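For anyone who wants to check the behaviour discussed in this thread, here is a minimal sketch, assuming Python 3. The try_float() helper is purely illustrative and not part of any library; it only turns the ValueError into None so the results print side by side.

```python
import unicodedata

# Devanagari digits are category Nd, so int() and float() accept them.
devanagari = "\u0967\u0968\u0969"  # DEVANAGARI DIGIT ONE, TWO, THREE
print([unicodedata.category(ch) for ch in devanagari])  # ['Nd', 'Nd', 'Nd']
print(int(devanagari))    # 123
print(float(devanagari))  # 123.0


def try_float(text):
    """Hypothetical helper: return float(text), or None on ValueError."""
    try:
        return float(text)
    except ValueError:
        return None


eighth = "\u215b"      # VULGAR FRACTION ONE EIGHTH
roman_nine = "\u2168"  # ROMAN NUMERAL NINE

# Each character has a numeric value on its own...
print(unicodedata.category(eighth), unicodedata.numeric(eighth))          # No 0.125
print(unicodedata.category(roman_nine), unicodedata.numeric(roman_nine))  # Nl 9.0

# ...but float() rejects them, just as it rejects a malformed ASCII literal.
print(try_float("1e.3"))      # None
print(try_float("2\u215b3"))  # None
print(try_float(roman_nine))  # None
```

So the existing behaviour already matches the reasoning above: anything that isn't a well-formed number made of Nd digits is refused rather than guessed at.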