[issue10575] makeunicodedata.py does not support Unihan digit data

Marc-Andre Lemburg report at bugs.python.org
Mon Nov 29 12:10:56 CET 2010


New submission from Marc-Andre Lemburg <mal at egenix.com>:

The script only patches the Unihan numeric data into the table (field 8); it does not update the digit field (field 7).
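For reference, the field numbers above are counted from zero within a semicolon-separated UnicodeData.txt record: field 6 is the decimal value, field 7 the digit value and field 8 the numeric value. The following is only a sketch of that layout, not code taken from makeunicodedata.py; the helper name number_fields and the constant names are made up for illustration.

```python
# Sketch only: zero-based field positions in a UnicodeData.txt record.
# These names are hypothetical and do not come from makeunicodedata.py.
DECIMAL_FIELD, DIGIT_FIELD, NUMERIC_FIELD = 6, 7, 8


def number_fields(record):
    """Return the (decimal, digit, numeric) fields of one record as strings."""
    fields = record.split(";")
    return fields[DECIMAL_FIELD], fields[DIGIT_FIELD], fields[NUMERIC_FIELD]


# Two records from UnicodeData.txt: an ASCII digit, which fills all three
# fields, and a superscript, which has digit and numeric values but no
# decimal value.
SAMPLE_DIGIT_THREE = "0033;DIGIT THREE;Nd;0;EN;;3;3;3;N;;;;;"
SAMPLE_SUPERSCRIPT_TWO = (
    "00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;"
)

if __name__ == "__main__":
    print(number_fields(SAMPLE_DIGIT_THREE))
    print(number_fields(SAMPLE_SUPERSCRIPT_TWO))
```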

As a result, ideographs used for Chinese digits are not recognized as digits and cannot be evaluated by int(), long(), or float():

    http://en.wikipedia.org/wiki/Numbers_in_Chinese_culture

>>> unicode('三', 'utf-8')
u'\u4e09'

>>> int(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u4e09' in position 0: invalid decimal Unicode string
> <stdin>(1)<module>()

>>> import unicodedata
>>> unicodedata.digit(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a digit

The code point U+4E09 represents the digit 3.
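The asymmetry between the two fields can be seen by querying both for the same character. The sketch below uses Python 3 syntax (str instead of unicode()) and assumes an interpreter whose database carries the Unihan numeric value but no digit value for U+4E09, as described above. The helpers describe and cjk_digits_to_int are hypothetical names; the second is only a possible workaround built on unicodedata.numeric(), not a proposed fix for the script.

```python
import unicodedata


def describe(ch):
    """Look up the three number-related properties, using None when unset."""
    return {
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }


def cjk_digits_to_int(text):
    """Hypothetical workaround: read a string of single-digit characters
    positionally, taking each value from unicodedata.numeric()."""
    value = 0
    for ch in text:
        n = unicodedata.numeric(ch)  # raises ValueError if ch has no numeric value
        if n != int(n) or not 0 <= n <= 9:
            raise ValueError("not a single digit: %r" % ch)
        value = value * 10 + int(n)
    return value


if __name__ == "__main__":
    print(describe("\u4e09"))
    print(cjk_digits_to_int("\u4e09"))
```

The workaround deliberately rejects characters whose numeric value is not a whole number between 0 and 9, so ideographs such as those for ten or one hundred are not silently misread as positional digits.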

----------
components: Unicode
messages: 122786
nosy: lemburg
priority: normal
severity: normal
status: open
title: makeunicodedata.py does not support Unihan digit data
versions: Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10575>
_______________________________________

