[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Nov 30 15:18:13 CET 2010


On Tue, Nov 30, 2010 at 7:59 AM, Steven D'Aprano <steve at pearwood.info> wrote:
..
> But you should be able to write:
>
> text = input("Enter a number using your preferred digits: ")
> num = float(text)
>
> without caring whether the user enters 一.一 or 1.1 or something else.
>

I find it ironic that people who argue for preservation of the current
behavior do it without checking what it actually is:

>>> float('一.一')
..
UnicodeEncodeError: 'decimal' codec can't encode character '\u4e00' ..

This is one of the biggest problems with this feature: it does not fit
users' expectations.  Even the original author of the decimal "codec"
expected the above to work. [1]

> Python can already do this, and has been able to for many years:
> >>> int(u'٣')
> 3

but you can do this without support from int() as well:

>>> import unicodedata
>>> unicodedata.digit('٣')
3
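
To make the point concrete for more than one character, here is a
minimal sketch of a digit-string parser built only on unicodedata.
The name parse_decimal_digits is hypothetical (it is not part of any
library), and it uses unicodedata.decimal() rather than digit() so
that only positional decimal digits are accepted:

```python
import unicodedata

def parse_decimal_digits(text):
    """Hypothetical helper: turn a string of Unicode decimal digits into an int.

    Raises ValueError for an empty string or for any character that has
    no decimal value in the Unicode Character Database.
    """
    if not text:
        raise ValueError("empty string")
    value = 0
    for ch in text:
        # unicodedata.decimal() raises ValueError when ch is not a decimal digit
        value = value * 10 + unicodedata.decimal(ch)
    return value

# ARABIC-INDIC DIGIT THREE followed by ARABIC-INDIC DIGIT FOUR
print(parse_decimal_digits('\u0663\u0664'))
```

Note that this sketch rejects '一': that character has a numeric value
in the database but no decimal value, so it cannot be treated as a
positional digit this way.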

and for Unihan numbers, you can do

>>> unicodedata.numeric('一')
1.0

and

>>> unicodedata.numeric('ⅷ')
8.0

and if you are so inclined,

>>> [unicodedata.numeric(c) for c in "ↂ ↁ ⅗ ⅞ 𐄳".split()]
[10000.0, 5000.0, 0.6, 0.875, 90000.0]

Do you want to see all these supported by float()?
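
The lookups above rely on three different properties in the database
(decimal, digit and numeric), and a character may have a numeric value
without having a decimal one.  A rough sketch of how to tell them
apart follows; numeric_kind is a hypothetical name, and the
second argument to each unicodedata call is the default returned when
the property is missing:

```python
import unicodedata

def numeric_kind(ch):
    """Hypothetical helper: report the most specific numeric property of ch."""
    if unicodedata.decimal(ch, None) is not None:
        return 'decimal'   # positional decimal digit, e.g. ARABIC-INDIC DIGIT THREE
    if unicodedata.digit(ch, None) is not None:
        return 'digit'     # digit value but not decimal, e.g. SUPERSCRIPT TWO
    if unicodedata.numeric(ch, None) is not None:
        return 'numeric'   # numeric value only, e.g. CJK or Roman numerals
    return 'none'

for ch in ['\u0663', '\u00b2', '\u4e00', '\u2177', 'a']:
    print(repr(ch), numeric_kind(ch))
```

As far as I know, only characters in the first group are candidates
for int() and float(); the other groups are where the question above
becomes relevant.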

[1] "makeunicodedata.py does not support Unihan digit data"
http://bugs.python.org/issue10575

