[Python-3000] locale-aware strings ?

Tue Sep 5 18:35:35 CEST 2006

On 9/5/06, Paul Prescod <paul at prescod.net> wrote:
> On 9/4/06, Guido van Rossum <guido at python.org> wrote:

> > In this particular case I don't care what's simpler to implement, but
> > what's most likely to do what the user expects.

Good.

> But now Europeans are just as likely to use UTF-8 as a national encoding

Fine; then that will be the locale's encoding.

> and Asians each have MANY different encodings to select from (some defined by
> Unicode, some national).

And the one they typically use will be the locale's encoding.

If Notepad (or vi/emacs/less/cat) agree on what a text file is, and
Python doesn't, it is Python that will lose.
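To make that concrete, here is a minimal sketch of "use what the locale says" for text files. The helper names locale_text_encoding and read_text are made up for illustration, and the fallback to ASCII when the locale names an unknown codec is my assumption, not anything proposed in this thread.

```python
import codecs
import locale


def locale_text_encoding():
    """Return the canonical codec name for the locale's preferred encoding."""
    # False = do not call setlocale(); just report the current preference.
    name = locale.getpreferredencoding(False)
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Assumption: fall back to ASCII if the locale names an unknown codec.
        return "ascii"


def read_text(path, encoding=None):
    """Read a text file, defaulting to the locale's encoding like other tools."""
    with open(path, "r", encoding=encoding or locale_text_encoding()) as f:
        return f.read()
```

An explicit encoding argument still wins, so a caller who knows better than the locale can say so.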

> The direction over
> the lifetime of Python 3000 will be AWAY from national, local,
> locale-predictable encodings and TOWARDS global, standard encodings.

Ruby is not wedding itself to Unicode precisely because they have seen
the opposite in Japan.  It sounded like the "Unicode doesn't quite
work" problem will be permanent, because there are fundamental
differences over which glyphs should be unified when.  It isn't just a
matter of using a larger set; there are glyphs which should be unified
in some contexts but not others.

> Also, only a portion of the text data on a computer is in "documents" where
> the end-user has control over the encoding. There are also many, many
> configuration files, emails, saved web pages, chat logs etc. where the
> encoding was selected by someone else with a potentially different
> nationality.

Typically, these either list the encoding explicitly, or stick to
something close to ASCII, which is included in most national
encodings.
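A quick way to check the "close to ASCII" point is to decode the same bytes under several encodings and see whether the results agree. This is only a sketch: the sample line and the list of codecs are my own picks, and the function name is hypothetical.

```python
# A config-style line that uses only ASCII letters, digits, space and "=".
SAMPLE = b"timeout = 30"

# A few national and global encodings shipped with Python (illustrative list).
ENCODINGS = [
    "ascii", "utf-8", "latin-1", "cp1252", "koi8_r",
    "euc_jp", "shift_jis", "gb2312", "big5",
]


def decodes_identically(data, encodings):
    """True if every listed encoding decodes data to the same text."""
    results = {data.decode(enc) for enc in encodings}
    return len(results) == 1


print(decodes_identically(SAMPLE, ENCODINGS))
```

Once a byte above 0x7F shows up, the encodings stop agreeing, which is where an explicit declaration becomes necessary.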

> Beyond all of that: It just seems wrong to me that I could send someone a
> bunch of files and a Python program and their results processing them would
> be different from mine, despite the fact that we run the same version of
> Python on the same operating system.

So include the charset header.
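For data that travels in a mail-style envelope, one reading of "include the charset header" looks like the sketch below. The message bytes are invented for the example, and falling back to ASCII when no charset is declared is my assumption.

```python
from email import message_from_bytes

# Hypothetical message: the sender declares Latin-1 in the Content-Type header.
RAW = (
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def decode_body(raw):
    """Decode a message body using the charset the sender declared."""
    msg = message_from_bytes(raw)
    # Assumption: treat a missing charset as ASCII rather than guessing.
    charset = msg.get_content_charset() or "ascii"
    return msg.get_payload(decode=True).decode(charset)


text = decode_body(RAW)
```

The result then depends on what the sender declared, not on the locale of whoever runs the program.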

-jJ