Re: [Python-Dev] unicode alphanumerics

3 Jul 2000

[M.-A. Lemburg]
> ...
> "M.-A. Lemburg" wrote:
> > ...
> > Fredrik Lundh wrote:
> > > ...
> > > how about this plan:
> > >
> > > -- you add a Py_UNICODE_ALPHA to unicodeobject.h asap,
> > >    which does exactly that (or I can do that, if you prefer).
> > >    (and maybe even a Py_UNICODE_ALNUM)
> >
> > Ok, I'll add Py_UNICODE_ISALPHA and Py_UNICODE_ISALNUM
> > (first with approximations of the sort you give above and
> > later with true implementations using tables in unicodectype.c)
> > on Monday... gotta run now.
> > ...
> > > -- I change SRE to use that asap.
> > >
> > > -- you, I, or someone else add a better implementation,
> > >    some other day.
>
> I've just looked into this... the problem here is what to
> consider as being "alpha" and what "numeric".
>
> I could add two new tables for the characters with category 'Lo'
> (other letters, not cased) and 'Lm' (letter modifiers)
> to match all letters in the Unicode database, but those
> tables have some 5200 entries (note that there are only 804 lower
> case letters and 686 upper case ones).

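
A rough way to see how the letter categories compare in size is to count
code points per general category. This is only a sketch, assuming a Python
with the standard unicodedata module; the helper name count_by_category is
hypothetical, and the numbers depend on the Unicode version bundled with the
interpreter, so they are not expected to match the figures quoted above
(which may also count table entries rather than individual characters).

```python
import unicodedata
from collections import Counter

# Hypothetical helper, not part of any Python or CPython API.
def count_by_category(categories=("Lu", "Ll", "Lt", "Lm", "Lo"), limit=0x110000):
    """Count code points below `limit` in each requested general category."""
    counts = Counter()
    for cp in range(limit):
        cat = unicodedata.category(chr(cp))
        if cat in categories:
            counts[cat] += 1
    return counts

if __name__ == "__main__":
    # Print one line per letter category, e.g. "Ll <count>".
    for cat, n in sorted(count_by_category().items()):
        print(cat, n)
```

On any recent Unicode version 'Lo' should dominate the other letter
categories by a wide margin, which is the table-size concern raised above.
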
In JDK1.3, Character.isLetter(..) and Character.isDigit(..) are 
documented as:

  http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isLetter(char)
  http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isDigit(char)
  http://java.sun.com/j2se/1.3/docs/api/java/lang/Character.html#isLetterOrDig...)

I guess that Java uses the extra huge tables.
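
For comparison, the Java documentation defines isLetter in terms of the
general categories UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER,
MODIFIER_LETTER and OTHER_LETTER, and isDigit as DECIMAL_DIGIT_NUMBER. A
minimal sketch of the same definitions in Python, assuming the standard
unicodedata module; the function names are hypothetical, and this is not
meant as the Py_UNICODE_ISALPHA implementation:

```python
import unicodedata

# Two-letter general category codes for the five letter categories
# (upper, lower, title case, modifier, other).
LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})

def is_letter(ch):
    """True if ch falls in one of the five letter categories."""
    return unicodedata.category(ch) in LETTER_CATEGORIES

def is_digit(ch):
    """True if ch is a decimal digit (category Nd)."""
    return unicodedata.category(ch) == "Nd"

def is_letter_or_digit(ch):
    """True if ch is a letter or a decimal digit as defined above."""
    return is_letter(ch) or is_digit(ch)
```

With these definitions the uncased 'Lo' and 'Lm' characters count as
letters, so a table-free approximation based only on upper and lower case
would disagree with them for those characters.
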

regards,
finn

bckfnn＠worldonline.dk