[Python-Dev] Python and the Unicode Character Database

Tue Nov 30 15:18:13 CET 2010

On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano <steve at pearwood.info> wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters 一.一 or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current
behavior do it without checking what it actually is:

>>> float('一.一')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature.  It does not fit
users' expectations.  Even the original author of the decimal "codec"
expected the above to work. [1]

> Python can already do this, and has been able to for many years:
> >>> int(u'٣')
> 3

But you can do this without support from int() as well:

>>> import unicodedata
>>> unicodedata.digit('٣')
3
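
For strings longer than one character, the same kind of lookup can be
applied character by character.  This is only a sketch:
parse_decimal_digits is a hypothetical helper, not something in
unicodedata, and it assumes unicodedata.decimal() rather than digit() so
that characters such as superscript digits are rejected.

```python
import unicodedata


def parse_decimal_digits(text):
    """Hypothetical helper: turn a string of Unicode decimal digits into an int.

    Raises ValueError on an empty string or on any character that has no
    decimal digit value in the Unicode Character Database.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # unicodedata.decimal() raises ValueError when ch is not a decimal digit.
        value = value * 10 + unicodedata.decimal(ch)
    return value


# ARABIC-INDIC DIGITS THREE, FOUR, FIVE, written as escapes to avoid
# bidirectional display confusion.
print(parse_decimal_digits('\u0663\u0664\u0665'))  # prints 345
```

Unlike int(), this sketch handles no sign, whitespace or base prefix.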

and for Unihan numbers, you can do

>>> unicodedata.numeric('一')
1.0

and

>>> unicodedata.numeric('ⅷ')
8.0

and if you are so inclined,

>>> [unicodedata.numeric(c) for c in "ↂ ↁ ⅗ ⅞ 𐄳".split()]
[10000.0, 5000.0, 0.6, 0.875, 90000.0]

Do you want to see all these supported by float()?
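
Part of what makes that question awkward is that the database keeps
three separate numeric properties, and a character may have only some of
them.  A small sketch for inspecting them (numeric_properties is a
hypothetical name; the default-value argument to decimal(), digit() and
numeric() is standard unicodedata API):

```python
import unicodedata


def numeric_properties(ch):
    """Hypothetical helper: return (decimal, digit, numeric) values for ch.

    Each entry is None when the character lacks that property.
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# ASCII digit, superscript two, vulgar fraction three fifths, a plain letter.
for ch in ('3', '\u00b2', '\u2157', 'x'):
    print(repr(ch), numeric_properties(ch))
```

Here '3' has all three values, superscript two has a digit and a numeric
value but no decimal value, and the fraction has only a numeric value.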

[1] "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575