[issue10575] makeunicodedata.py does not support Unihan digit data

Marc-Andre Lemburg report at bugs.python.org
Mon Nov 29 16:15:40 CET 2010

Marc-Andre Lemburg <mal at egenix.com> added the comment:

The code point is also not listed as a decimal digit (relevant for decimal parsing in int()):

>>> unicodedata.decimal(unicode('三', 'utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: not a decimal

This is the relevant part of the script:

        for line in open(unihan):
            if not line.startswith('U+'):
                continue
            code, tag, value = line.split(None, 3)[:3]
            if tag not in ('kAccountingNumeric', 'kPrimaryNumeric',
                           'kOtherNumeric'):
                continue
            value = value.strip().replace(',', '')
            i = int(code[2:], 16)
            # Patch the numeric field
            if table[i] is not None:
                table[i][8] = value

The script only patches table[i][8], the numeric database entry. The decimal column is
left unset for code points that have a kPrimaryNumeric value. The numeric lookup
correctly gives:

>>> unicodedata.numeric(unicode('三', 'utf-8'))
3.0
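
To make the difference between the two columns concrete, here is a minimal sketch of the loop above with an optional extra step that also fills the decimal and digit columns. It is an illustration only, not a proposed patch: the function name patch_numeric_fields, the set_decimal flag, the dict-based table and the sample record are all assumptions made for the example, and the column indexes 6, 7 and 8 are taken from the UnicodeData.txt field order (decimal, digit, numeric). Whether the database should set these columns at all is a separate question that the sketch does not settle.

```python
# Sketch only. Column indexes follow the UnicodeData.txt field order;
# the table layout and sample data below are made up for the demo.
DECIMAL, DIGIT, NUMERIC = 6, 7, 8

NUMERIC_TAGS = ('kAccountingNumeric', 'kPrimaryNumeric', 'kOtherNumeric')


def patch_numeric_fields(table, unihan_lines, set_decimal=False):
    """Patch numeric values from Unihan-style lines into table records.

    table maps code point -> list of UnicodeData-style fields (hypothetical
    stand-in for the list used by the real script).
    """
    for line in unihan_lines:
        if not line.startswith('U+'):
            continue
        code, tag, value = line.split(None, 3)[:3]
        if tag not in NUMERIC_TAGS:
            continue
        value = value.strip().replace(',', '')
        i = int(code[2:], 16)
        record = table.get(i)
        if record is None:
            continue
        # Same as the original loop: patch the numeric field.
        record[NUMERIC] = value
        # Hypothetical extension: also fill decimal/digit when the
        # kPrimaryNumeric value is a single ASCII digit.
        if (set_decimal and tag == 'kPrimaryNumeric'
                and value in tuple('0123456789')):
            record[DECIMAL] = value
            record[DIGIT] = value
    return table


# U+4E09 is the ideograph for "three"; a 15-field record with empty
# decimal, digit and numeric columns.
sample_table = {0x4E09: ['4E09', '<CJK Ideograph>', 'Lo', '0', 'L', ''] + [''] * 9}
sample_lines = ['# comment line', 'U+4E09\tkPrimaryNumeric\t3']

patched = patch_numeric_fields(sample_table, sample_lines, set_decimal=True)
print(patched[0x4E09][DECIMAL], patched[0x4E09][DIGIT], patched[0x4E09][NUMERIC])
```

With set_decimal left at its default of False the function behaves like the loop quoted above and touches only the numeric column.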

