[docs] [issue25275] Documentation v/s behaviour mismatch wrt integer literals containing non-ASCII characters

Shreevatsa R report at bugs.python.org
Wed Sep 30 22:45:10 CEST 2015


Shreevatsa R added the comment:

Minor difference, but the relevant function for int() is not quite isdigit(), e.g.:

    >>> import unicodedata
    >>> s = u'\u2460'
    >>> unicodedata.name(s)
    'CIRCLED DIGIT ONE'
    >>> print s
    ①
    >>> s.isdigit()
    True
    >>> s.isdecimal()
    False
    >>> int(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'decimal' codec can't encode character u'\u2460' in position 0: invalid decimal Unicode string
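
The digit/decimal split can also be inspected one character at a time with unicodedata. This is only a sketch, written for Python 3 (an assumption: the session above is Python 2, and in Python 3 the int() failure surfaces as a ValueError instead). `classify` is a hypothetical helper name, not part of any library:

```python
import unicodedata

def classify(ch):
    """Return (decimal value, digit value) for one character.

    Either element is None when Unicode does not define that property
    for the character.
    """
    return unicodedata.decimal(ch, None), unicodedata.digit(ch, None)

print(classify('\u2460'))  # CIRCLED DIGIT ONE: has a digit value but no decimal value
print(classify('\u0661'))  # ARABIC-INDIC DIGIT ONE: has both
print(classify('7'))       # ASCII digit: has both
```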

The behaviour seems to match isdecimal(), except that, when the string also contains digits, many leading and trailing space-like characters are allowed as well (e.g. 5760 OGHAM SPACE MARK, 8195 EM SPACE, or 12288 IDEOGRAPHIC SPACE):

    >>> 987 == int(u'\u3000\n 987\u1680\t')
    True
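
One rough way to check this guess is to compare int() against "strip whitespace, then isdecimal()" on a few strings. The sketch below assumes Python 3, where str.strip() removes Unicode whitespace and int() raises ValueError on failure. `int_accepts` and `approx_rule` are hypothetical names, and the rule deliberately ignores signs and any other syntax int() may allow:

```python
def int_accepts(s):
    """Return True if int(s) succeeds, False if it raises ValueError."""
    try:
        int(s)
    except ValueError:
        return False
    return True

def approx_rule(s):
    """Guessed rule: strip surrounding whitespace, then require decimal characters only."""
    return s.strip().isdecimal()

samples = [
    '\u2460',                  # CIRCLED DIGIT ONE: a digit, but not a decimal
    '\u3000\n 987\u1680\t',    # decimals wrapped in assorted Unicode spaces
    '\u0661\u0662\u0663',      # ARABIC-INDIC DIGITS ONE, TWO, THREE
    ' \t',                     # whitespace only, no digits
]
for s in samples:
    print(repr(s), int_accepts(s), approx_rule(s))
```

On these samples the two functions agree, which is consistent with the description above but does not prove the rule in general.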

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25275>
_______________________________________

