[Python-3000] string module trimming

Jim Jewett jimjjewett at gmail.com
Wed Apr 18 00:55:46 CEST 2007


On 4/17/07, Christian Heimes <lists at cheimes.de> wrote:
> Neal Norwitz schrieb:
> > I don't have any plans, just considering options.  Move them
> > somewhere?  Perhaps, trim the ones that are unused.  In a unicode
> > world, I'm not sure how much some of these make sense.  letters stands
> > out more than others.  I don't know enough about unicode to know if
> > digits or whitespace can be different.

Unicode adds several characters to both the digit and whitespace sets,
and there are plenty of reasons a given program might want to use a
restricted set.  (Probably the sets already in string, or else a
letters grouping set by the locale.)
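
A minimal sketch of the difference, assuming a Python 3 interpreter
where str is Unicode.  The helper names is_ascii_digit and
is_ascii_whitespace are hypothetical, made up for this illustration;
only string.digits, string.whitespace, str.isdigit and str.isspace are
real.

```python
import string

# Two characters outside ASCII that Unicode still classifies as a
# digit and as whitespace, respectively.
ARABIC_INDIC_THREE = "\u0663"  # ARABIC-INDIC DIGIT THREE
NO_BREAK_SPACE = "\u00a0"      # NO-BREAK SPACE


def is_ascii_digit(ch):
    """True only for one of the ten characters in string.digits."""
    return len(ch) == 1 and ch in string.digits


def is_ascii_whitespace(ch):
    """True only for one of the characters in string.whitespace."""
    return len(ch) == 1 and ch in string.whitespace


# The Unicode-aware str methods accept both characters ...
print(ARABIC_INDIC_THREE.isdigit(), NO_BREAK_SPACE.isspace())
# ... while the restricted ASCII sets reject them.
print(is_ascii_digit(ARABIC_INDIC_THREE), is_ascii_whitespace(NO_BREAK_SPACE))
```

A program that parses, say, a wire protocol would likely want the
restricted helpers; one that validates user-typed text might prefer
the Unicode-aware methods.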

> What do you think about replacing the definitions with information from
> the unicode character properties database?  The information is available
> somewhere in Python:

> http://docs.python.org/lib/re-syntax.html

> \w ... With LOCALE, it will match the set [0-9_] plus whatever
> characters are defined as alphanumeric for the current locale. If
> UNICODE is set, this will match the characters [0-9_] plus whatever is
> classified as alphanumeric in the Unicode character properties database.

There are reasons to want exactly ASCII.
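
As a rough illustration of how the two interpretations of \w diverge:
this assumes a Python 3 release in which str patterns are Unicode by
default and the re.ASCII flag is available, neither of which is
settled in this thread.  The sample string is invented.

```python
import re

# "café", three Greek letters, and a plain ASCII identifier.
sample = "caf\u00e9 \u03b1\u03b2\u03b3 x_1"

# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

# Without it, \w follows the Unicode character database for str patterns.
unicode_words = re.findall(r"\w+", sample)

print(ascii_words)
print(unicode_words)
```

The ASCII match cuts "café" short at the accented letter and skips the
Greek run entirely, which is exactly what a parser for ASCII-only
identifiers would want.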

There are also reasons to want only "local" letters.  For example, in
a French interface, I might want to include the extra French letters,
but not the Greek.
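
One way this could look, as a sketch only: the FRENCH_LETTERS set and
the uses_only_french_letters helper are hypothetical, and the list of
accented letters is my own assumption rather than anything taken from
a locale database, so it may be incomplete.

```python
import string
import unicodedata

# Assumed (possibly incomplete) list of extra lowercase French letters.
_FRENCH_EXTRA_LOWER = unicodedata.normalize("NFC", "àâæçéèêëîïôœùûüÿ")

# ASCII letters plus the extra letters in both cases; no Greek.
FRENCH_LETTERS = frozenset(
    string.ascii_letters + _FRENCH_EXTRA_LOWER + _FRENCH_EXTRA_LOWER.upper()
)


def uses_only_french_letters(text):
    """True if text is non-empty and every character is in FRENCH_LETTERS."""
    # Normalize so that "e" + combining accent compares equal to "é".
    text = unicodedata.normalize("NFC", text)
    return bool(text) and all(ch in FRENCH_LETTERS for ch in text)


print(uses_only_french_letters("garçon"))
print(uses_only_french_letters("αβγ"))
```

A real interface would presumably derive the set from the locale
rather than hard-coding it.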

Also note that regular expressions aren't the only use of those letter
groupings.
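
For instance, the constants also feed str.translate.  A small sketch,
assuming Python 3's str.maketrans; the function name
strip_ascii_digits is hypothetical.

```python
import string

# Translation table that deletes exactly the ten ASCII digits.
_DELETE_ASCII_DIGITS = str.maketrans("", "", string.digits)


def strip_ascii_digits(text):
    """Remove ASCII digits from text, leaving other Unicode digits alone."""
    return text.translate(_DELETE_ASCII_DIGITS)


# The trailing ARABIC-INDIC DIGIT THREE is not in string.digits.
print(strip_ascii_digits("abc123\u0663"))
```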

-jJ

