[Python-ideas] [Python-Dev] Unicode minus sign in numeric conversions

Guido van Rossum guido at python.org
Sun Jun 9 00:52:41 CEST 2013


Apologies, Python 3 does actually have limited support for the other
Unicode digits (actually only the ones marked "Decimal" IIUC). I'd
totally forgotten about that (since I still live primarily in an ASCII
world :-). E.g.

>>> a = '\uff14\uff12'
>>> int(a)
42
>>>
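
For anyone who wants to poke at this, here is a minimal sketch of the "Decimal" distinction mentioned above. The variable names and the accepts() helper are made up for illustration. As far as I can tell, int() takes characters in category Nd but rejects things like SUPERSCRIPT TWO, which is only a "Digit" (category No):

```python
import unicodedata

fullwidth = '\uff14\uff12'      # FULLWIDTH DIGIT FOUR, FULLWIDTH DIGIT TWO
arabic_indic = '\u0664\u0662'   # ARABIC-INDIC DIGIT FOUR, ARABIC-INDIC DIGIT TWO
superscript_two = '\u00b2'      # SUPERSCRIPT TWO: a "Digit", not a "Decimal"


def accepts(s):
    """Hypothetical helper: report whether int() can parse s."""
    try:
        int(s)
    except ValueError:
        return False
    return True


# Every character in the two accepted strings is in category Nd.
categories = [unicodedata.category(c) for c in fullwidth + arabic_indic]

print(int(fullwidth), int(arabic_indic), categories)
print(unicodedata.category(superscript_two), accepts(superscript_two))
```

str.isdecimal() appears to draw the same line, so it may serve as a cheap pre-check before calling int().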

Still I'd like to understand your use case better. Is there a
character property to identify the minus sign? How many are there? A
little investigation reveals a lot of minus signs just in the basic
plane:

>>> import unicodedata
>>> for i in range(2**16):
...  c = chr(i)
...  if 'MINUS' in unicodedata.name(c, ''): print(i, c, unicodedata.name(c, ''))
...
45 - HYPHEN-MINUS
177 ± PLUS-MINUS SIGN
727 ˗ MODIFIER LETTER MINUS SIGN
800 ̠ COMBINING MINUS SIGN BELOW
8274 ⁒ COMMERCIAL MINUS SIGN
8315 ⁻ SUPERSCRIPT MINUS
8331 ₋ SUBSCRIPT MINUS
8722 − MINUS SIGN
8723 ∓ MINUS-OR-PLUS SIGN
8726 ∖ SET MINUS
8760 ∸ DOT MINUS
8770 ≂ MINUS TILDE
8854 ⊖ CIRCLED MINUS
8863 ⊟ SQUARED MINUS
10070 ❖ BLACK DIAMOND MINUS WHITE X
10134 ➖ HEAVY MINUS SIGN
10556 ⤼ TOP ARC CLOCKWISE ARROW WITH MINUS
10793 ⨩ MINUS SIGN WITH COMMA ABOVE
10794 ⨪ MINUS SIGN WITH DOT BELOW
10795 ⨫ MINUS SIGN WITH FALLING DOTS
10796 ⨬ MINUS SIGN WITH RISING DOTS
10810 ⨺ MINUS SIGN IN TRIANGLE
10817 ⩁ UNION WITH MINUS SIGN
10860 ⩬ SIMILAR MINUS SIMILAR
65123 ﹣ SMALL HYPHEN-MINUS
65293 - FULLWIDTH HYPHEN-MINUS
>>>

There are also a lot of plus signs (including INVISIBLE PLUS :-).
Again, maybe the Unicode consortium has a standard we could implement?
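
In the meantime, a caller who wants the behavior from the original report can get it in user code. A rough sketch, where to_float() and _SIGN_TABLE are hypothetical names and the choice to map only U+2212 is an assumption, not a standard. Note that the general category does not single out minus signs: HYPHEN-MINUS is Pd while MINUS SIGN is Sm.

```python
import unicodedata

# Hypothetical policy: treat only U+2212 MINUS SIGN as an ASCII minus.
# Which of the characters listed above should count is an open question.
_SIGN_TABLE = str.maketrans({'\N{MINUS SIGN}': '-'})


def to_float(text):
    """Parse a float, accepting U+2212 wherever ASCII '-' is accepted."""
    return float(text.translate(_SIGN_TABLE))


print(to_float('\N{MINUS SIGN}12.34'))

# The two characters do not share a general category.
print(unicodedata.category('-'), unicodedata.category('\N{MINUS SIGN}'))
```

Keeping the mapping explicit avoids silently accepting things like SET MINUS or CIRCLED MINUS, which merely have MINUS in their names.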

--Guido

On Sat, Jun 8, 2013 at 3:30 PM, Guido van Rossum <guido at python.org> wrote:
> [Diverting to python-ideas, since this isn't as clear-cut as you think.]
>
> Why exactly is that expected behavior? What's the use case? (Surely
> you don't have a keyboard that generates \u2212 when you hit the minus
> key? :-)
>
> Is there a Unicode standard for parsing numbers? IIRC there are a
> variety of other things marked as "digits" in the Unicode standard --
> do we do anything with those? If we do anything we should be
> consistent. For now, I think we *are* consistent -- we only support
> the ASCII representation of numbers. (And that's the only
> representation we generate as output as well -- think about symmetry
> too.)
>
> This page scares me: http://en.wikipedia.org/wiki/Numerals_in_Unicode
>
> --Guido
>
> On Sat, Jun 8, 2013 at 2:49 PM, Łukasz Langa <lukasz at langa.pl> wrote:
>> Expected behaviour:
>> >>> float('\N{MINUS SIGN}12.34')
>> -12.34
>>
>>
>> Current behaviour:
>> Traceback (most recent call last):
>> ...
>> ValueError: could not convert string to float: '−12.34'
>>
>>
>> Please note: '\N{MINUS SIGN}' == '\u2212'
>>
>> --
>> Best regards,
>> Łukasz Langa
>>
>> WWW: http://lukasz.langa.pl/
>> Twitter: @llanga
>> IRC: ambv on #python-dev
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)



-- 
--Guido van Rossum (python.org/~guido)