[Python-3000] string module trimming

Jason Orendorff jason.orendorff at gmail.com
Thu Apr 19 20:51:24 CEST 2007

On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> > On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> > > Today, string.letters works most easily with ASCII supersets, and is
> > > effectively limited to 8-bit encodings.  Once everything is unicode, I
> > > don't think that 8-bit restriction should apply any more.
> > But we already went over this. There are over 40K letters in Unicode.
> > It simply makes no sense to have a string.letters approaching that
> > size.
> Agreed.  But there aren't 40K (alphabetic) letters in any particular
> locale.  Most individual languages will have less than 100.

But isn't the hip thing these days to use UTF-8 as the character set,
as in "LC_ALL=en_US.UTF-8"?  I picked a random Linux machine
and this happened to be the default LANG.  A quick C test revealed
that iswalpha() thinks there are 45,974 letters in that locale.

Anyway, let's discuss your use cases.

One: UIs that involve displaying all the letters.

> There are also reasons to want only "local" letters.  For example, in
> a French interface, I might want to include the extra French letters,
> but not the Greek.

But POSIX locales don't claim to provide this information,
as far as I can tell.

It seems like for a quick throwaway program, you're better off
ignoring languages.  And for even a slightly serious program,
string.letters is wrong in a dozen ways.

Seriously, a table of alphabets that's saner than string.letters
is pretty trivial to write:

  alphabets = {
      'es': ("A B C Ch D E F G H I J K L Ll M "
          + "N \u00d1 O P Q R S T U V W X Y Z").split(),

...and you can go from there.

Two: Collation.

Collation can be done right: provide a function text.sort_key()
that converts a str into an opaque thing that has the desired
ordering.  You would use it like so:

  records.sort(key=lambda rec: text.sort_key(rec.title))

Jython can implement this using java.text.Collator.  IronPython can
use CultureInfo.CompareInfo.GetSortKey().  CPython can call
LCMapString() on Windows and wcsxfrm() on POSIX, falling back on just
returning the string itself.


More information about the Python-3000 mailing list