[Tutor] finding digit in string
eryksun
eryksun at gmail.com
Tue Oct 9 02:01:26 CEST 2012
On Mon, Oct 8, 2012 at 4:11 PM, Prasad, Ramit <ramit.prasad at jpmorgan.com> wrote:
>
>> for ch in text:
>>     if '0' <= ch <= '9':
>>         doSomething(ch)
>
> I am not sure that will work very well with Unicode numbers. I would
> assume (you know what they say about assuming) that str.isdigit()
> works better with international characters/numbers.
In my tests below, isdigit() matches both decimal digits ('Nd') and
other digits ('No'). None of the 'No' category digits works with
int().
Python 2.7.3
>>> import sys
>>> from unicodedata import category
>>> chars = [unichr(i) for i in xrange(sys.maxunicode + 1)]
>>> digits = [c for c in chars if c.isdigit()]
>>> digits_d = [d for d in digits if category(d) == 'Nd']
>>> digits_o = [d for d in digits if category(d) == 'No']
>>> len(digits), len(digits_d), len(digits_o)
(529, 411, 118)
Decimal
>>> nums = [int(d) for d in digits_d]
>>> [nums.count(i) for i in range(10)]
[41, 42, 41, 41, 41, 41, 41, 41, 41, 41]
Other
>>> print u''.join(digits_o[:3] + digits_o[12:56])
²³¹⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐
>>> print u''.join(digits_o[67:94])
❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒
>>> print u''.join(digits_o[3:12])
፩፪፫፬፭፮፯፰፱
>>> int(digits_o[67])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'decimal' codec can't encode character u'\u2776' in position 0: invalid decimal Unicode string
Python 3.2.3
>>> import sys
>>> from unicodedata import category
>>> chars = [chr(i) for i in range(sys.maxunicode + 1)]
>>> digits = [c for c in chars if c.isdigit()]
>>> digits_d = [d for d in digits if category(d) == 'Nd']
>>> digits_o = [d for d in digits if category(d) == 'No']
>>> len(digits), len(digits_d), len(digits_o)
(548, 420, 128)
Decimal
>>> nums = [int(d) for d in digits_d]
>>> [nums.count(i) for i in range(10)]
[42, 42, 42, 42, 42, 42, 42, 42, 42, 42]
Other
>>> print(*(digits_o[:3] + digits_o[13:57]), sep='')
²³¹⁰⁴⁵⁶⁷⁸⁹₀₁₂₃₄₅₆₇₈₉①②③④⑤⑥⑦⑧⑨⑴⑵⑶⑷⑸⑹⑺⑻⑼⒈⒉⒊⒋⒌⒍⒎⒏⒐
>>> print(*digits_o[68:95], sep='')
❶❷❸❹❺❻❼❽❾➀➁➂➃➄➅➆➇➈➊➋➌➍➎➏➐➑➒
>>> print(*digits_o[3:12], sep='')
፩፪፫፬፭፮፯፰፱
>>> int(digits_o[68])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '❶'
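
Given the results above, a rough Python 3 sketch of one way to pick out digits safely might look like the following. The function names and the sample string are made up for illustration. It assumes that str.isdecimal() lines up with the 'Nd' characters that int() accepts, which is my understanding but is not something the tests above checked directly. unicodedata.digit() is used to get a value for the 'No' digits that int() rejects.

```python
import unicodedata

def decimal_digits(text):
    """Return the int value of each character that int() should accept."""
    # Assumption: isdecimal() is true only for decimal ('Nd') characters,
    # unlike isdigit(), which also matches 'No' digits such as superscripts.
    return [int(ch) for ch in text if ch.isdecimal()]

def all_digit_values(text):
    """Return digit values, including 'No' digits that int() rejects."""
    # unicodedata.digit() returns the supplied default (None here) when
    # the character has no digit value.
    values = (unicodedata.digit(ch, None) for ch in text)
    return [v for v in values if v is not None]

# Hypothetical sample: 'a', ASCII 1, SUPERSCRIPT TWO, ARABIC-INDIC DIGIT
# THREE, DINGBAT NEGATIVE CIRCLED DIGIT ONE, 'b'.
sample = 'a1\u00b2\u0663\u2776b'
print(decimal_digits(sample))    # [1, 3]
print(all_digit_values(sample))  # [1, 2, 3, 1]
```

If only values that int() can convert are needed, the first function is probably enough; the second is for cases where superscripts or circled digits should count too.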