[issue10575] makeunicodedata.py does not support Unihan digit data

Marc-Andre Lemburg report at bugs.python.org
Mon Nov 29 21:42:32 CET 2010


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Martin v. Löwis wrote:
> 
> Martin v. Löwis <martin at v.loewis.de> added the comment:
> 
> This is not a bug, see
> 
> http://www.unicode.org/reports/tr44/#Numeric_Value
> 
> Characters have a Numeric_Type property of either null, Decimal, Digit, or Numeric. For non-Unihan characters, this is denoted by filling out either no column, or (6, 7, and 8), or (7 and 8), or (8), respectively, as implemented by makeunicodedata.py. Unihan characters have only null or Numeric as their Numeric_Type property, never Decimal or Digit; see
> 
>  http://www.unicode.org/reports/tr44/#Numeric_Type_Han
> 
> Therefore, it is correct that digit() raises a ValueError for U+4e09.

You're right. I guess this is a bug in the UCD or TR44/TR38 itself.

It looks like the numeric properties are not separated in the
Unihan database in the same way they are for the standard UCD.

Unihan separates based on usage context, whereas the UCD takes
a parsing approach.
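
For illustration, here is a small sketch of how the three numeric
properties surface through Python 3's unicodedata module. The helper
name numeric_profile is made up for this example and is not part of
the module; the lookups pass None as the default so that a missing
property does not raise ValueError.

```python
import unicodedata


def numeric_profile(ch):
    """Return (decimal, digit, numeric) for ch, with None where undefined.

    Hypothetical helper for illustration only; it just wraps the three
    unicodedata lookups and supplies a default instead of raising.
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# ASCII digit: Numeric_Type=Decimal, so all three lookups succeed.
print(numeric_profile("3"))
# SUPERSCRIPT TWO: Numeric_Type=Digit, so decimal is undefined.
print(numeric_profile("\u00b2"))
# VULGAR FRACTION ONE HALF: Numeric_Type=Numeric only.
print(numeric_profile("\u00bd"))
# U+4E09 (Han "three"): Numeric only, so digit() without a default
# would raise ValueError, as discussed above.
print(numeric_profile("\u4e09"))
```

Assuming the Unihan numeric values are loaded into the database, only
the last element of the tuple is set for U+4E09.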

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10575>
_______________________________________
