[Tutor] three numbers for one

Oscar Benjamin oscar.j.benjamin at gmail.com
Sun Jun 9 16:26:32 CEST 2013


On 8 June 2013 06:49, eryksun <eryksun at gmail.com> wrote:
> On Fri, Jun 7, 2013 at 11:11 PM, Jim Mooney <cybervigilante at gmail.com> wrote:
>> I'm puzzling out the difference between isdigit, isdecimal, and
>> isnumeric. But at this point, for simple practice programs, which is
>> the best to use for plain old 0123456789, without special characters?
>
> The isnumeric, isdigit, and isdecimal predicates use Unicode character
> properties that are defined in UnicodeData.txt:
>
> http://www.unicode.org/Public/6.1.0/ucd
>
> The most restrictive of the 3 is isdecimal. If a string isdecimal(),
> you can convert it with int() -- even if you're mixing scripts:
>
>     >>> import unicodedata
>     >>> unicodedata.name('\u06f0')
>     'EXTENDED ARABIC-INDIC DIGIT ZERO'
>     >>> unicodedata.decimal('\u06f0')
>     0
>     >>> '1234\u06f0'.isdecimal()
>     True
>     >>> int('1234\u06f0')
>     12340

I didn't know about this. Since this thread started, a parallel
thread has emerged on python-ideas, and it seems that Guido was
unaware of these changes in Python 3:

http://mail.python.org/pipermail/python-ideas/2013-June/021216.html

I don't think I like this behaviour: I don't mind the isdigit,
isdecimal and isnumeric methods, but I don't want int() to accept
non-ASCII characters. Converting such digits is a reasonable job for
the unicodedata module, but it should not happen when simply calling
int().

To answer Jim's original question, there doesn't seem to be a built-in
string method that checks for only plain old 0-9, but you can make
your own easily enough:

>>> def is_ascii_digit(string):
...     # True only for a non-empty string made up solely of ASCII 0-9
...     return bool(string) and not (set(string) - set('0123456789'))
...
>>> is_ascii_digit('qwe')
False
>>> is_ascii_digit('0123')
True
>>> is_ascii_digit('0123f')
False
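
The same check can also be sketched with a regular expression. This is
only an illustration: the name is_ascii_decimal is made up here, and it
assumes Python 3.4 or later for fullmatch(). A literal [0-9] class
matches only the ten ASCII digits (unlike \d, which for str patterns
also matches other Unicode decimal digits), and the + means the empty
string is rejected, as it is by str.isdigit().

```python
import re

# Literal character class: ASCII 0-9 only, one or more characters.
_ASCII_DIGITS = re.compile(r'[0-9]+')


def is_ascii_decimal(string):
    """Hypothetical helper: True if string is non-empty and all ASCII 0-9."""
    return _ASCII_DIGITS.fullmatch(string) is not None


print(is_ascii_decimal('0123'))        # plain ASCII digits
print(is_ascii_decimal('1234\u06f0'))  # ends in EXTENDED ARABIC-INDIC DIGIT ZERO
print(is_ascii_decimal(''))            # empty string
```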

An alternative, depending on where your strings actually come from,
would be to use byte strings or the ascii codec. I may consider doing
this in future: in my own applications, if I pass a non-ASCII digit to
int() then I definitely have data corruption. Then again, it's unlikely
that the corruption would manifest itself in precisely this way, since
only a small proportion of non-ASCII Unicode characters would be
accepted by int().
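
As a rough sketch of the ascii codec idea (parse_ascii_int is a
hypothetical name, not an existing function): encoding the text to
ASCII first makes any non-ASCII character raise UnicodeEncodeError,
which is a subclass of ValueError, before int() ever sees it. Note
that this is not a pure digit check, since int() still accepts things
like a leading sign or surrounding whitespace.

```python
def parse_ascii_int(text):
    """Hypothetical helper: int() that refuses non-ASCII input."""
    # str.encode('ascii') raises UnicodeEncodeError (a ValueError
    # subclass) if text contains any non-ASCII character.
    return int(text.encode('ascii'))


print(parse_ascii_int('12340'))

try:
    parse_ascii_int('1234\u06f0')
except ValueError as exc:
    print('rejected:', type(exc).__name__)
```

Catching ValueError at the call site then covers both non-ASCII input
and input that is ASCII but not a valid integer.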


Oscar

