[docs] [issue26483] docs unclear on difference between str.isdigit() and str.isdecimal()

Julien Palard report at bugs.python.org
Thu Dec 1 04:28:49 EST 2016


Julien Palard added the comment:

“digits which do not form decimal radix forms”

> “forming a form” seems a long way of saying very little. The difference seems a bit vague

> I gather that digits not in the Unicode “decimal digit” category are often (always?) still decimal digits

I expected them not to be, but they often do represent a base 10 value:

>>> import sys
>>> import unicodedata
>>> chars = ''.join(map(chr, range(sys.maxunicode+1)))
>>> decimals = ''.join(filter(str.isdecimal, chars))
>>> digits = ''.join(filter(str.isdigit, chars))
>>> non_decimal_digits = set(digits) - set(decimals)
>>> from collections import Counter
>>> Counter([unicodedata.digit(char) for char in non_decimal_digits])
Counter({1: 15, 2: 14, 3: 14, 4: 14, 5: 13, 6: 13, 7: 13, 8: 13, 9: 13, 0: 6})

But note that there's an extra entry for each value in the range [1,4]: these are the [Kharosthi](https://en.wikipedia.org/wiki/Kharosthi) numbers, which do not use base 10 but a notation reminiscent of Roman numerals.

So, clearly, not all digits are a notation for a base 10 value.
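
A quick way to look at the Kharosthi case on its own, as a minimal sketch. It assumes the four digits sit at U+10A40..U+10A43 and that the interpreter's Unicode database includes them; the variable names are only for illustration:

```python
import unicodedata

# KHAROSHTHI DIGIT ONE..FOUR, assumed to occupy U+10A40..U+10A43.
kharosthi = [chr(cp) for cp in range(0x10A40, 0x10A44)]

for ch in kharosthi:
    # Each one carries a digit value, yet is not a decimal character.
    print(unicodedata.name(ch), unicodedata.digit(ch),
          ch.isdigit(), ch.isdecimal())

kharosthi_values = [unicodedata.digit(ch) for ch in kharosthi]
```

Each of them should report isdigit() as true and isdecimal() as false, which is the gap the two methods are meant to expose.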
 
> but primarily used for a symbolic or typographical meaning more than in a plain number, e.g. superscripts, subscripts and other fonts, added circles and other decorations.

Those also can't be used to form a base 10 number.
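
To make that concrete, here is a small sketch comparing what int() accepts with what isdecimal() and isdigit() report. The sample strings and the try_int helper are my own illustration, not anything from the docs:

```python
samples = {
    "ascii": "42",
    "arabic_indic": "\u0664\u0662",  # ARABIC-INDIC DIGIT FOUR, TWO
    "superscript": "\u2074\u00b2",   # SUPERSCRIPT FOUR, SUPERSCRIPT TWO
    "circled": "\u2463\u2461",       # CIRCLED DIGIT FOUR, CIRCLED DIGIT TWO
}


def try_int(text):
    """Return int(text), or None when int() rejects the string."""
    try:
        return int(text)
    except ValueError:
        return None


# Map each sample name to (isdecimal, isdigit, parsed value or None).
results = {
    name: (s.isdecimal(), s.isdigit(), try_int(s))
    for name, s in samples.items()
}

for name, row in results.items():
    print(name, row)
```

The decimal strings parse as base 10 numbers, while the superscript and circled ones pass isdigit() but are rejected by int().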

So here is another proposal for isdecimal, probably more human friendly:

    Return true if all characters in the string are decimal
    characters and there is at least one character, false
    otherwise. Decimal characters are those that can be used to form
    numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT
    ZERO. Formally a decimal character is a character in the Unicode
    General Category "Nd".

And here is another proposal for isdigit, probably friendlier too:

    Return true if all characters in the string are digits and there
    is at least one character, false otherwise. Digits include
    decimal characters and digits that need special handling, such as
    the compatibility superscript digits. This covers digits which
    cannot be used to form numbers in base 10, like the Kharosthi
    numbers. Formally, a digit is a character that has the property
    value Numeric_Type=Digit or Numeric_Type=Decimal.
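
As a rough cross-check of the two formal definitions, the sketch below approximates Numeric_Type from the unicodedata lookups. The numeric_type helper is hypothetical (the module does not expose Numeric_Type directly), and the sample characters are my own choice:

```python
import unicodedata


def numeric_type(ch):
    """Approximate Numeric_Type for one character (hypothetical helper)."""
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"


# "7", ARABIC-INDIC DIGIT ZERO, SUPERSCRIPT TWO, KHAROSHTHI DIGIT ONE,
# VULGAR FRACTION ONE HALF, and a plain letter.
sample_chars = ["7", "\u0660", "\u00b2", "\U00010A40", "\u00bd", "x"]

for ch in sample_chars:
    kind = numeric_type(ch)
    # isdecimal() should line up with "Decimal" (and category Nd),
    # isdigit() with "Decimal" or "Digit".
    print(repr(ch), kind, unicodedata.category(ch),
          ch.isdecimal(), ch.isdigit())
```

For these samples, isdecimal() is true exactly where the helper says "Decimal", and isdigit() is true for "Decimal" or "Digit", which matches the wording proposed above.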

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26483>
_______________________________________

