[Tutor] unicode: alpha, whitespaces and digits
Steven D'Aprano
steve at pearwood.info
Sun Dec 29 23:58:10 CET 2013
On Sun, Dec 29, 2013 at 02:36:32PM +0100, Ulrich Goebel wrote:
> Hallo,
>
> I have a unicode string s, for example u"abc", u"äöü", u"123" or
> something else, and I have to find out whether
>
> 1. s is not empty and contains only digits (as in u"123", but not in
> u"3.1415")
>
> or
>
> 2. s is empty or contains only whitespaces
>
> For all other cases I would assume a "normal" unicode string in s,
> whatever that may be.
>
> For the first case it could be s.isdigit(), s.isnumeric() or
> s.isdecimal() - but which one is the best?
Depends what you are trying to do. Only you can decide which is best.
The three methods do slightly different things:
- isdigit() tests for digit characters. That includes the decimal
digits described under isdecimal(), plus digit-like characters
such as superscript ² which cannot form a decimal number.
- isdecimal() tests for decimal characters. That includes the
so-called "Arabic numerals" 0...9 (although the Arabs don't
use them!) as well as other decimal digits like ٠١٢...
(The above three are ARABIC-INDIC DIGIT ZERO through TWO.)
- isnumeric() tests for characters which have the Unicode
numeric value property. That includes decimal digits, as well
as non-digit numbers such as ½ and ¾.
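The differences are easiest to see by running all three methods over a few characters. The following is only a sketch: the sample characters and the helper name digit_tests are picked for illustration, and the u"" prefix assumes Python 2 or Python 3.3+.

```python
from __future__ import print_function

def digit_tests(s):
    """Return (isdecimal, isdigit, isnumeric) for the unicode string s."""
    return (s.isdecimal(), s.isdigit(), s.isnumeric())

# Illustrative samples, written as escapes so the source stays pure ASCII.
samples = [
    (u"5", "DIGIT FIVE"),
    (u"\u0661", "ARABIC-INDIC DIGIT ONE"),
    (u"\u00b2", "SUPERSCRIPT TWO"),
    (u"\u00bd", "VULGAR FRACTION ONE HALF"),
    (u"3.1415", "string containing a full stop"),
]

for text, label in samples:
    print(label, digit_tests(text))
```

Each method accepts everything the one before it accepts, so isdecimal() is the strictest and isnumeric() the most permissive. All three return False for u"3.1415" because of the full stop.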
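If you want to test for something that a human reader will recognise as
a "whole number", s.isdigit() is probably the best one to use. If you
intend to pass the string on to int(), s.isdecimal() is the safer test,
since int() rejects digit characters such as superscript ².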
> For the second case it should be s.isspace(), but it works only on
> strings, not on unicode strings?
What gives you that impression? isspace works on Unicode strings too.
py> u' x'.isspace()
False
py> u' '.isspace()
True
For the second case, you also need to check for empty strings, so you
should use:
not s or s.isspace()
which will return True if s is empty or all whitespace, otherwise False.
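Putting the two tests together, a minimal sketch of the three-way check from the original question might look like this. The function name kind_of and the labels it returns are made up for the example, and isdigit() can be swapped for isdecimal() or isnumeric() depending on which characters you want to accept.

```python
def kind_of(s):
    """Classify the unicode string s as 'digits', 'blank' or 'other'."""
    # isdigit() is False for the empty string, so case 1 ("not empty
    # and only digits") needs no separate emptiness check.
    if s.isdigit():
        return "digits"
    # Case 2: empty, or nothing but whitespace.
    if not s or s.isspace():
        return "blank"
    # Everything else is a "normal" string.
    return "other"
```

For example, kind_of(u"123") gives "digits", kind_of(u"") and kind_of(u" \t\n") give "blank", and kind_of(u"3.1415") gives "other".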
--
Steven