[docs] [issue36417] unicode.isdecimal bug in online Python 2 documentation

10 Apr 2019

      zheng <zheng0.tao@gmail.com> added the comment:

I propose we copy over the exact changes made to the Python 3 documentation.

I looked through the code mentioned in the other thread, namely `Objects/unicodeobject.c` and `Tools/unicode/makeunicodedata.py`. The implementation is identical between Python 2 and Python 3. The only difference appears to be the Unicode version used.

    # decimal digit, integer digit
    decimal = 0
    if record[6]:
        flags |= DECIMAL_MASK
        decimal = int(record[6])
    digit = 0
    if record[7]:
        flags |= DIGIT_MASK
        digit = int(record[7])
    if record[8]:
        flags |= NUMERIC_MASK
        numeric.setdefault(record[8], []).append(char)
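
To make the three fields concrete, here is a minimal sketch for Python 3 that queries the same properties through the standard `unicodedata` module rather than through the generator script. The helper name `classify` is my own and does not appear in CPython. As far as I can tell, the `decimal`, `digit`, and `numeric` lookups correspond to fields 6, 7, and 8 of the record above.

```python
import unicodedata

def classify(ch):
    """Return the decimal, digit, and numeric values recorded for ch.

    A value of None means the corresponding field is empty for ch.
    """
    return {
        "decimal": unicodedata.decimal(ch, None),
        "digit": unicodedata.digit(ch, None),
        "numeric": unicodedata.numeric(ch, None),
    }

# DIGIT SEVEN, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF
for ch in ("7", "\u00b2", "\u00bd"):
    print(unicodedata.name(ch), classify(ch),
          ch.isdecimal(), ch.isdigit(), ch.isnumeric())
```

SUPERSCRIPT TWO has a digit value but no decimal value, so `isdigit()` is true for it while `isdecimal()` is false. VULGAR FRACTION ONE HALF has only a numeric value, so only `isnumeric()` is true.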

As another form of validation, I enumerated all the digits and decimals and compared them between versions. The general change appears to be that a number of new Unicode characters were introduced in Python 3. The exception is NEW TAI LUE THAM DIGIT ONE, which gets recategorized as a digit.

Python 2, compiled with UCS4:
import unicodedata
for u in map(unichr, xrange(0x110000)):
    if unicode.isdigit(u):
        print(unicodedata.name(u))

Python 3:
import unicodedata
for u in map(chr, range(0x110000)):
    if str.isdigit(u):
        print(unicodedata.name(u))
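
For comparing the digit and decimal sets directly, a sketch along these lines may be easier to work with than reading the printed names. It is Python 3 only, and the helper `names_matching` is a hypothetical name of mine, not something from the tracker thread or from CPython.

```python
import sys
import unicodedata

def names_matching(predicate):
    """Collect the names of all code points for which predicate(char) is true."""
    return [
        # Fall back to a U+XXXX label in case a character has no name.
        unicodedata.name(ch, "U+%04X" % ord(ch))
        for ch in map(chr, range(sys.maxunicode + 1))
        if predicate(ch)
    ]

digits = names_matching(str.isdigit)
decimals = names_matching(str.isdecimal)

# Characters that isdigit() accepts but isdecimal() rejects.
digit_only = sorted(set(digits) - set(decimals))

print("Unicode version:", unicodedata.unidata_version)
print(len(digits), "digits,", len(decimals), "decimals,",
      len(digit_only), "digit-only")
```

Writing `digits` and `decimals` to a file per interpreter and diffing the files is one way to reproduce the cross-version comparison described above.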

----------
nosy: +zheng

_______________________________________
Python tracker <report@bugs.python.org>
<https://bugs.python.org/issue36417>
_______________________________________