[Python-3000] string module trimming
Guido van Rossum
guido at python.org
Tue Apr 17 23:28:00 CEST 2007
On 4/17/07, Christian Heimes <lists at cheimes.de> wrote:
> Neal Norwitz schrieb:
> > I don't have any plans, just considering options. Move them
> > somewhere? Perhaps, trim the ones that are unused. In a unicode
> > world, I'm not sure how much some of these make sense. letters stands
> > out more than others. I don't know enough about unicode to know if
> > digits or whitespace can be different.
>
> What do you think about replacing the definitions with information from
> the Unicode character properties database? The information is available
> somewhere in Python:
>
> http://docs.python.org/lib/re-syntax.html
>
> \w ... With LOCALE, it will match the set [0-9_] plus whatever
> characters are defined as alphanumeric for the current locale. If
> UNICODE is set, this will match the characters [0-9_] plus whatever is
> classified as alphanumeric in the Unicode character properties database.

Yes, unicode.islower() and friends have this information.

It would be silly to set e.g. letters to a string of all Unicode
letters -- that would be a string of 46618 characters! Similarly, there
are 304 Unicode digits. (And this is in a narrow Unicode build, only
supporting the basic Unicode plane, 0--2**16!)
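
For anyone who wants to reproduce counts of this kind, here is a rough
sketch. It assumes Python 3, where str is the Unicode type; the helper
name count_matching is made up for illustration, and using isalpha() and
isdigit() as the predicates is an assumption, since the message does not
say how the figures were obtained. The exact numbers depend on the
Unicode database version bundled with the interpreter, so they will
likely differ from the figures quoted above.

```python
import sys


def count_matching(predicate, limit=0x10000):
    """Count code points below `limit` whose one-character string
    satisfies `predicate` (e.g. str.isalpha or str.isdigit)."""
    return sum(1 for cp in range(limit) if predicate(chr(cp)))


# Basic plane only (code points 0 .. 2**16 - 1), as in a narrow build.
bmp_letters = count_matching(str.isalpha)
bmp_digits = count_matching(str.isdigit)

# Every code point the interpreter supports.
all_letters = count_matching(str.isalpha, sys.maxunicode + 1)

print("letters in the basic plane:", bmp_letters)
print("digits in the basic plane: ", bmp_digits)
print("letters in all planes:     ", all_letters)
```

Whatever the exact numbers turn out to be, they are far too large for a
module-level string constant, which is why asking the character itself
(ch.isalpha(), ch.isdigit()) is the more practical approach.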
--
--Guido van Rossum (home page: http://www.python.org/~guido/)