[Python-3000] string module trimming

Tue Apr 17 23:28:00 CEST 2007

On 4/17/07, Christian Heimes <lists at cheimes.de> wrote:
> Neal Norwitz schrieb:
> > I don't have any plans, just considering options.  Move them
> > somewhere?  Perhaps, trim the ones that are unused.  In a unicode
> > world, I'm not sure how much some of these make sense.  letters stands
> > out more than others.  I don't know enough about unicode to know if
> > digits or whitespace can differ.
>
> What do you think about replacing the definitions with information from
> the unicode character properties database? The information is available
> somewhere in Python:
>
> http://docs.python.org/lib/re-syntax.html
>
> \w ... With LOCALE, it will match the set [0-9_] plus whatever
> characters are defined as alphanumeric for the current locale. If
> UNICODE is set, this will match the characters [0-9_] plus whatever is
> classified as alphanumeric in the Unicode character properties database.

Yes, unicode.islower() and friends have this information.
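
As a rough illustration (a sketch in Python 3 syntax, where str is the
unicode type; the helper name is_letter_only is made up for this
example), the per-character predicates answer the question without any
precomputed table:

```python
# Sketch only: in Python 3, str carries the Unicode predicates
# (islower, isalpha, isdigit, isspace, ...).

def is_letter_only(text):
    """Hypothetical helper: True if text is non-empty and all letters."""
    return text.isalpha()

print("a".islower())                 # ASCII lowercase letter
print("\u00e9".islower())            # LATIN SMALL LETTER E WITH ACUTE
print("\u0663".isdigit())            # ARABIC-INDIC DIGIT THREE
print(is_letter_only("na\u00efve"))  # contains a non-ASCII letter
```

Code that today tests membership in string.letters could presumably
call a predicate like this instead.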

It would be silly to set e.g. letters to a string of all Unicode
letters -- that would be a string of 46618 characters! Similarly, there
are 304 Unicode digits. (And this is in a narrow Unicode build, which
only supports the Basic Multilingual Plane, code points 0 through
2**16 - 1!)
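
Counts like these can be reproduced with a short loop. This is a
sketch in Python 3 syntax, not necessarily how the figures above were
obtained; count_matching and BMP_LIMIT are names invented for the
example, and the exact totals depend on the Unicode database version
shipped with the interpreter, so they may differ from the numbers
quoted here.

```python
# Sketch: count code points in the range a narrow build covers
# that a given str predicate accepts.

BMP_LIMIT = 2 ** 16  # code points 0 .. 2**16 - 1

def count_matching(predicate, limit=BMP_LIMIT):
    """Count code points below `limit` for which predicate(char) is true."""
    return sum(1 for cp in range(limit) if predicate(chr(cp)))

letters = count_matching(str.isalpha)
digits = count_matching(str.isdigit)

print("letters:", letters)
print("digits:", digits)
```

Restricting the limit to 128 gives the familiar ASCII figures (52
letters, 10 digits), which is a quick sanity check on the loop.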

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)