[Python-ideas] float('∞')=float('inf')

Stephen J. Turnbull stephen at xemacs.org
Sun Jul 14 10:10:39 CEST 2013


Chris Angelico writes:

 > That one would be more plausible, in the same way that many of the
 > other Unicode digits are accepted.

The analogy doesn't hold.  Unicode *digit* and Unicode *numeric* are
separate properties.

Digits are intended to form numerals according to a positional rule,
so parsing a string of digits in logical order always means the same
thing, regardless of the character set (or Unicode block, if you
prefer).
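
A small sketch of what that means in practice (my own illustration,
not code from this thread; the sample strings are arbitrary choices):
the same three digits written in four different blocks all go through
int() and come out as the same number.

```python
# Sketch: decimal digits from different Unicode blocks parse positionally.
# The sample strings are illustrative choices, written as escapes so the
# code survives a non-Unicode mail client.
samples = {
    "ASCII": "123",
    "Arabic-Indic": "\u0661\u0662\u0663",
    "Devanagari": "\u0967\u0968\u0969",
    "Fullwidth": "\uff11\uff12\uff13",
}

# int() reads each string left to right as base-10 digits.
parsed = {name: int(text) for name, text in samples.items()}

for name, value in parsed.items():
    print(name, value)
```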

Numeric characters are simply characters that have a numeric
interpretation; they need not combine positionally.  So in Japanese
"1x1" can mean 11, 101, 1001, 10001, 100000001, and a few others,
depending on the numeric character used for x (which is the multiplier
for the "1" on its left).  Or it might be a parse error: conventions
for writing checks often use powers of 10000 as separators rather than
multipliers, in which case three digits are missing on the right-hand
side.  It's possible the same conventions apply to Chinese.  Anyway, in
Japanese many numeric characters make no sense in positional notation,
and require localized parsing methods.
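
To make the "1x1" case concrete, here is a rough sketch using the
standard unicodedata module.  The helper names read_1x1 and
positional_parse_fails are hypothetical, made up for this example, and
the multiplier rule is only the simple reading described above, not a
real Japanese numeral parser.

```python
import unicodedata

ONE = "\u4e00"  # the kanji for one; its Unicode numeric value is 1

def read_1x1(x):
    """Read ONE + x + ONE, treating x as a multiplier on the left-hand 1.

    Hypothetical helper: covers only the simple multiplier reading.
    """
    one = unicodedata.numeric(ONE)
    return int(one * unicodedata.numeric(x) + one)

def positional_parse_fails(text):
    """Return True if int() rejects the text (hypothetical helper)."""
    try:
        int(text)
    except ValueError:
        return True
    return False

# Kanji for ten, hundred, thousand, ten thousand, hundred million.
for x in "\u5341\u767e\u5343\u4e07\u5104":
    print(ONE + x + ONE, read_1x1(x))
```

These characters report True for str.isnumeric() but False for
str.isdigit(), and int() refuses the whole string, which is the
distinction being drawn here.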

Personally, I think it was a mistake to allow non-ASCII digits to be
parsed directly by int() and float().  Not even language nationalists
like the French, Russians, and Japanese publish statistics using
non-ASCII digits.  OTOH, people who need to read numbers out of text
or whatever probably should be using localization facilities anyway
(there are a few cases of "confusables" among the digits where digits
whose glyphs are similar have different values as digits).  But that
ship has sailed, apparently.



