Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)
Ka-Ping Yee
ping@lfw.org
Mon, 22 May 2000 09:17:01 -0700 (PDT)

On Mon, 22 May 2000, Guido van Rossum wrote:
> > note that this leaves us with four string flavours in 1.6:
> >
> > - 8-bit binary arrays. may contain binary goop, or text in some strange
> > encoding. upper, strip, etc should not be used.
>
> These are not strings.

Indeed -- but at the moment, we're letting people continue to
use strings this way, since they already do it.

> > - 8-bit text strings using the system encoding. upper, strip, etc. work
> > as long as the locale is properly configured.
> >
> > - 8-bit unicode text strings. upper, strip, etc may work, as long as the
> > system encoding is a subset of unicode -- which means US ASCII or
> > ISO Latin 1.
>
> This is a figment of your imagination. You can use 8-bit text strings
> to contain Latin-1, but you have to set your locale to match.

I would like it to be only the latter, as Fred, i, and others
have previously suggested, and as corresponds to your ASCII
proposal for treatment of 8-bit strings.

But doesn't the current locale-dependent behaviour of upper()
etc. mean that strings are getting interpreted in the first way?
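
A minimal sketch of why the assumed encoding matters for upper(). It is
written for present-day Python 3, not the 1.6 code under discussion, and
it swaps in explicit decoding for the locale mechanism, so treat it only
as an illustration. The variable names are hypothetical. The same 8-bit
string gives different results depending on whether it is read as
ASCII-only text or as ISO Latin 1 text:

```python
# Illustration only: Python 3 semantics, not the Python 1.6 behaviour
# discussed in this thread. Explicit decoding stands in for the locale.

# 0xE9 is a lowercase e-acute in ISO Latin 1; in plain ASCII it is undefined.
raw = b"caf\xe9"

# Read as ASCII-only text: bytes.upper() maps only a-z, so 0xE9 is untouched.
ascii_upper = raw.upper()

# Read as Latin 1 text: decode, uppercase by Unicode rules, encode back.
latin1_upper = raw.decode("latin-1").upper().encode("latin-1")

print(ascii_upper)   # b'CAF\xe9'
print(latin1_upper)  # b'CAF\xc9'
```

The two results differ only in the last byte, which is exactly the byte
whose meaning depends on the encoding the string is assumed to hold.
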
> > is this complexity really worth it?
>
> From a backwards compatibility point of view, yes. Basically,
> programs that don't use Unicode should see no change in semantics.

I'm afraid i have to agree with this, because i don't see any
other option that lets us escape from any of these four ways
of using strings...

-- ?!ng