[Python-3000] string module trimming
Jason Orendorff
jason.orendorff at gmail.com
Thu Apr 19 20:51:24 CEST 2007
On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> > On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:
>
> > > Today, string.letters works most easily with ASCII supersets, and is
> > > effectively limited to 8-bit encodings. Once everything is unicode, I
> > > don't think that 8-bit restriction should apply any more.
>
> > But we already went over this. There are over 40K letters in Unicode.
> > It simply makes no sense to have a string.letters approaching that
> > size.
>
> Agreed. But there aren't 40K (alphabetic) letters in any particular
> locale. Most individual languages will have less than 100.
But isn't the hip thing these days to use UTF-8 as the character set,
as in "LC_ALL=en_US.UTF-8"? I picked a random Linux machine
and this happened to be the default LANG. A quick C test revealed
that iswalpha() thinks there are 45,974 letters in that locale.
Anyway, let's discuss your use cases.
One: UIs that involve displaying all the letters.
> There are also reasons to want only "local" letters. For example, in
> a French interface, I might want to include the extra French letters,
> but not the Greek.
But POSIX locales don't claim to provide this information,
as far as I can tell.
It seems like for a quick throwaway program, you're better off
ignoring languages. And for even a slightly serious program,
string.letters is wrong in a dozen ways.
Seriously, a table of alphabets that's saner than string.letters
is pretty trivial to write:
alphabets = {
'en': list("ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
'es': ("A B C Ch D E F G H I J K L Ll M "
+ "N \u00d1 O P Q R S T U V W X Y Z").split(),
...
}
...and you can go from there.
Two: Collation.
Collation can be done right: provide a function text.sort_key()
that converts a str into an opaque thing that has the desired
ordering. You would use it like so:
names.sort(key=text.sort_key)
records.sort(key=lambda rec: text.sort_key(rec.title))
Jython can implement this using java.text.Collator. IronPython can
use CultureInfo.CompareInfo.GetSortKey(). CPython can call
LCMapString() on Windows and wcsxfrm() on POSIX, falling back on just
returning the string itself.
-j
More information about the Python-3000
mailing list