Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

Ka-Ping Yee ping@lfw.org
Mon, 22 May 2000 09:17:01 -0700 (PDT)


On Mon, 22 May 2000, Guido van Rossum wrote:
> > note that this leaves us with four string flavours in 1.6:
> > 
> > - 8-bit binary arrays.  may contain binary goop, or text in some strange
> >   encoding.  upper, strip, etc should not be used.
> 
> These are not strings.

Indeed -- but at the moment, we're letting people continue to
use strings this way, since they already do it.

> > - 8-bit text strings using the system encoding.  upper, strip, etc. work
> >   as long as the locale is properly configured.
> > 
> > - 8-bit unicode text strings.  upper, strip, etc may work, as long as the
> >   system encoding is a subset of unicode -- which means US ASCII or
> >   ISO Latin 1.
> 
> This is a figment of your imagination.  You can use 8-bit text strings
> to contain Latin-1, but you have to set your locale to match.

I would like it to be only the latter, as Fred, i, and others
have previously suggested, and as corresponds to your ASCII
proposal for treatment of 8-bit strings.

But doesn't the current locale-dependent behaviour of upper()
etc. mean that strings are getting interpreted in the first way?
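
To make the question concrete, here is a minimal sketch in present-day Python 3, not the 1.6-era 8-bit string type under discussion. It assumes the bytes type as a stand-in for an 8-bit string and uses made-up variable names. It only illustrates that the result of upper() depends on whether the bytes are treated as ASCII-only data or as Latin-1 text:

```python
# Illustration only: modern Python 3 bytes/str, not the Python 1.6 string type.
raw = b"caf\xe9"  # 0xE9 is e-acute if the bytes are read as Latin-1

# bytes.upper() changes ASCII letters only, so the byte 0xE9 is left alone.
ascii_only = raw.upper()

# Latin-1 maps each byte to the Unicode code point with the same number,
# which is why it can be treated as a subset of Unicode.
text = raw.decode("latin-1")
code_point = ord(text[3])

# Uppercasing the decoded text does change the accented letter.
as_latin1 = text.upper().encode("latin-1")

print(ascii_only, as_latin1, hex(code_point))
```

The two results differ in the last byte, which is the kind of divergence a locale-dependent upper() introduces for the same 8-bit data.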

> > is this complexity really worth it?
> 
> From a backwards compatibility point of view, yes.  Basically,
> programs that don't use Unicode should see no change in semantics.

I'm afraid i have to agree with this, because i don't see any
other option that lets us escape from any of these four ways
of using strings...


-- ?!ng