[Python-Dev] Unicode character property methods
Guido van Rossum
guido@python.org
Mon, 06 Mar 2000 18:12:33 -0500
[MAL]
> > > As you may have noticed, the Unicode objects provide
> > > new methods .islower(), .isupper() and .istitle(). Finn Bock
> > > mentioned that Java also provides .isdigit() and .isspace().
> > >
> > > Question: should Unicode also provide these character
> > > property methods: .isdigit(), .isnumeric(), .isdecimal()
> > > and .isspace() ? Plus maybe .digit(), .numeric() and
> > > .decimal() for the corresponding decoding ?
[Guido]
> > What would be the difference between isdigit, isnumeric, isdecimal?
> > I'd say don't do more than Java. I don't understand what the
> > "corresponding decoding" refers to. What would "3".decimal() return?
[MAL]
> These originate in the Unicode database; see
>
> ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html
>
> Here are the descriptions:
>
> """
> Field 6: Decimal digit value (normative)
>   This is a numeric field. If the character has the decimal digit
>   property, as specified in Chapter 4 of the Unicode Standard, the
>   value of that digit is represented with an integer value in this
>   field.
>
> Field 7: Digit value (normative)
>   This is a numeric field. If the character represents a digit, not
>   necessarily a decimal digit, the value is here. This covers digits
>   which do not form decimal radix forms, such as the compatibility
>   superscript digits.
>
> Field 8: Numeric value (normative)
>   This is a numeric field. If the character has the numeric
>   property, as specified in Chapter 4 of the Unicode Standard, the
>   value of that character is represented with an integer or rational
>   number in this field. This includes fractions as, e.g., "1/5" for
>   U+2155 VULGAR FRACTION ONE FIFTH. Also included are numerical
>   values for compatibility characters such as circled numbers.
> """
>
> u"3".decimal() would return 3. u"\u2155".
>
> Some more examples from the unicodedata module (which makes
> all fields of the database available in Python):
>
> >>> unicodedata.decimal(u"3")
> 3
> >>> unicodedata.decimal(u"²")
> 2
> >>> unicodedata.digit(u"²")
> 2
> >>> unicodedata.numeric(u"²")
> 2.0
> >>> unicodedata.numeric(u"\u2155")
> 0.2
> >>> unicodedata.numeric(u'\u215b')
> 0.125
Hm, very Unicode centric. Probably best left out of the general
string methods. isspace() seems useful, and an isdigit() that is only
true for ASCII '0' - '9' also makes sense.
What about "123".isdigit()? What does Java say? Or do these only
apply to single chars there? I think "123".isdigit() should be true
if "abc".islower() is true.
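
For illustration only, here is a minimal sketch of how the three properties discussed above can be told apart. It assumes a current Python 3 interpreter, where str already has isdecimal(), isdigit() and isnumeric() and where unicodedata accepts a default as its second argument; that is not the API as it stood when this thread was written. The helper names classify() and values() are hypothetical.

```python
import unicodedata


def classify(ch):
    """Return (isdecimal, isdigit, isnumeric) for a single character."""
    return (ch.isdecimal(), ch.isdigit(), ch.isnumeric())


def values(ch):
    """Return the (decimal, digit, numeric) fields, None where a field is empty."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# ASCII three, SUPERSCRIPT TWO, VULGAR FRACTION ONE FIFTH
for ch in ("3", "\u00b2", "\u2155"):
    print(ascii(ch), classify(ch), values(ch))

# The predicates apply to whole strings, just as islower() does.
print("123".isdigit(), "abc".islower())
```

Note that in this sketch a current interpreter reports no decimal value for U+00B2, only digit and numeric values, which differs from the session quoted above.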
> > > Similar APIs are already available through the unicodedata
> > > module, but could easily be moved to the Unicode object
> > > (they cause the builtin interpreter to grow a bit in size
> > > due to the new mapping tables).
> > >
> > > BTW, string.atoi et al. are currently not mapped to
> > > string methods... should they be ?
> >
> > They are mapped to int() and friends.
>
> Hmm, I just noticed that int() and friends don't like
> Unicode... shouldn't they use the "t" parser marker
> instead of requiring a string or tp_int compatible
> type ?
Good catch. Go ahead.
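
As a rough sketch of what "int() accepting Unicode" looks like in practice, assuming a current Python 3 interpreter rather than the code base under discussion here: int() parses any characters that carry a decimal digit value, and rejects characters that only have a digit or numeric value. The helper parse_decimal() is a hypothetical name used for this example.

```python
def parse_decimal(text):
    """Return int(text), or None if text is not a valid decimal integer."""
    try:
        return int(text)
    except ValueError:
        return None


# ARABIC-INDIC DIGITS ONE, TWO, THREE have decimal values, so they parse.
print(parse_decimal("\u0661\u0662\u0663"))

# SUPERSCRIPT TWO and VULGAR FRACTION ONE FIFTH have no decimal value.
print(parse_decimal("\u00b2"), parse_decimal("\u2155"))
```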
--Guido van Rossum (home page: http://www.python.org/~guido/)