[Python-3000] string module trimming

Jim Jewett jimjjewett at gmail.com
Thu Apr 19 01:08:59 CEST 2007


On 4/18/07, Guido van Rossum <guido at python.org> wrote:
> On 4/18/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> > Today, string.letters works most easily with ASCII supersets, and is
> > effectively limited to 8-bit encodings.  Once everything is unicode, I
> > don't think that 8-bit restriction should apply any more.

> But we already went over this. There are over 40K letters in Unicode.
> It simply makes no sense to have a string.letters approaching that
> size.
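To put Guido's figure in perspective, here is a short Python 3 sketch. It assumes only the standard `unicodedata` module and counts the code points whose general category starts with "L" (Lu, Ll, Lt, Lm, Lo). The exact total depends on the Unicode version bundled with the interpreter, so no specific number is claimed here. For comparison, Python 3 later dropped `string.letters` in favour of the ASCII-only `string.ascii_letters`.

```python
import string
import sys
import unicodedata


def count_unicode_letters() -> int:
    """Count code points whose Unicode general category is a letter (L*)."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)).startswith("L")
    )


if __name__ == "__main__":
    print("Unicode version:", unicodedata.unidata_version)
    print("letters in Unicode:", count_unicode_letters())
    # string.ascii_letters holds only the 52 ASCII letters.
    print("string.ascii_letters:", len(string.ascii_letters))
```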

Agreed.  But there aren't 40K (alphabetic) letters in any particular
locale.  Most individual languages will have fewer than 100.
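A per-language letter set of that size can be sketched in Python 3. The table below is hypothetical and purely illustrative (for example, the German entry is just the ASCII letters plus umlauts and ß); it shows how small such sets are and that set inclusion against them is cheap. Note that `str.isalpha()` cannot serve this role, since it consults the whole Unicode database rather than any one language.

```python
import string

# Hypothetical per-language letter sets, typed by hand for illustration.
# A real table would be derived from locale data rather than hard-coded.
LANGUAGE_LETTERS = {
    "en": frozenset(string.ascii_letters),
    "de": frozenset(string.ascii_letters + "äöüÄÖÜß"),
    "sv": frozenset(string.ascii_letters + "åäöÅÄÖ"),
}


def is_word_in(language: str, word: str) -> bool:
    """Return True if every character of word is a letter of the given language."""
    letters = LANGUAGE_LETTERS[language]
    return all(ch in letters for ch in word)


if __name__ == "__main__":
    for lang, letters in LANGUAGE_LETTERS.items():
        print(lang, len(letters))        # each set is well under 100
    print(is_word_in("de", "Straße"))    # 'ß' is in the German set
    print(is_word_in("en", "Straße"))    # but not in the English one
    print("Straße".isalpha())            # True regardless of language
```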

As a proxy for measuring "local" characters, I'll note that during
some optimization work on Pango (e.g.,
http://primates.ximian.com/~federico/news-2005-11.html#04 ) it turned
out that only two non-CJK languages needed more than 256 cache
positions in their character glyph tables.

> > Unless I missed it (and I may have), Unicode itself sort of ducks the
> > question about how to sort strings.  Python really needs to provide
> > *an* answer, but I'm not sure it is possible to provide the (single)
> > correct answer.

> The Unicode standard certainly has a solution, but it is complicated
> and I don't believe it is currently implemented in core Python.

I guess you're right; I saw too many alternatives the last time I
looked, and must have stopped reading http://unicode.org/reports/tr10/
after section 1, where it becomes obvious that there is no
context-free right answer.
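The problem is easy to see in Python 3, where `sorted()` on strings simply compares code points. The sketch below contrasts that with a crude two-level key: a primary key that strips accents (NFD decomposition, then dropping combining marks) and casefolds, with the original string as a tie-breaker. It only loosely imitates the idea of collation levels in UTS #10 and is not the Unicode Collation Algorithm; it would still be wrong for Swedish, for instance, where "ä" sorts after "z", which is exactly the lack of a context-free answer.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Decompose to NFD and drop combining marks, so 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def crude_key(s: str) -> tuple:
    # Primary level: base letters, compared case-insensitively.
    # Tie-breaker: the original string, so accents and case still matter.
    return (strip_marks(s).casefold(), s)


words = ["zebra", "éclair", "apple", "Eclair"]

print(sorted(words))                 # code-point order
print(sorted(words, key=crude_key))  # accent- and case-insensitive first
```

With code-point order, "Eclair" comes first (uppercase "E" precedes all lowercase ASCII) and "éclair" comes last (U+00E9 is above "z"); the crude key places both spellings between "apple" and "zebra".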

> > string.letters is one workaround, and I don't think we should remove
> > it until a better solution (or workaround) is available.

> I disagree. The correct solution is to implement the Unicode support
> for locale-specific sorting.

And for set inclusion (i.e., which characters count as letters).

I'm not convinced that waiting for such a heavyweight solution is
really the best choice, particularly since the spec itself warns
against using the strictest forms as too inefficient.

> Remember that the locale module supports only a single, global locale
> at a time. This renders it totally useless in many apps requiring
> locale support (such as web servers).
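The global-locale limitation shows up directly in the standard `locale` API: `locale.setlocale()` changes a process-wide setting, so a helper that temporarily switches `LC_COLLATE` to sort for one request affects every other thread while it runs. A minimal sketch follows; the helper name `sort_in_locale` is hypothetical, and only the portable "C" locale is assumed to be installed.

```python
import locale


def sort_in_locale(words, locale_name):
    """Sort words using the collation rules of locale_name (hypothetical helper).

    Caution: setlocale() mutates global, process-wide state. Another thread
    sorting at the same moment would observe locale_name too, which is why
    this approach does not suit multi-user servers.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, locale_name)
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # Restore the previous global setting even if sorting fails.
        locale.setlocale(locale.LC_COLLATE, previous)


print(sort_in_locale(["pear", "apple", "fig"], "C"))
```

Saving and restoring the setting keeps a single-threaded program tidy, but it does nothing to isolate concurrent callers.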

Fair enough.

-jJ

