Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

Guido van Rossum guido@python.org
Mon, 22 May 2000 15:38:17 -0500


[Fredrik]
> > > - 8-bit binary arrays.  may contain binary goop, or text in some strange
> > >   encoding.  upper, strip, etc should not be used.

[Guido]
> > These are not strings.

[Ping]
> Indeed -- but at the moment, we're letting people continue to
> use strings this way, since they already do it.

Oops, my mistake.  I thought that Fredrik (not Fred! that's another
person in this context!) meant the array module, but on re-reading
I see that he didn't.

> > > - 8-bit text strings using the system encoding.  upper, strip, etc work
> > >   as long as the locale is properly configured.
> > > 
> > > - 8-bit unicode text strings.  upper, strip, etc may work, as long as the
> > >   system encoding is a subset of unicode -- which means US ASCII or
> > >   ISO Latin 1.
> > 
> > This is a figment of your imagination.  You can use 8-bit text strings
> > to contain Latin-1, but you have to set your locale to match.
> 
> I would like it to be only the latter, as Fred, i, and others
Fredrik, right?
> have previously suggested, and as corresponds to your ASCII
> proposal for treatment of 8-bit strings.
> 
> But doesn't the current locale-dependent behaviour of upper()
> etc. mean that strings are getting interpreted in the first way?

That's what I meant to say -- 8-bit strings are interpreted in the
system encoding, as selected by the locale.
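
To make the point concrete, here is a minimal sketch of why the result
of upper() on 8-bit data depends on which encoding you assume.  It is
written against a modern Python 3 interpreter, not the interpreter
under discussion in this thread, and the sample bytes and the choice
of Latin-1 and cp1251 are assumptions picked purely for illustration.

```python
# Illustrative sketch only: modern Python 3, sample data chosen for the demo.
raw = b"caf\xe9"  # 8-bit data; what byte 0xE9 "means" depends on the encoding

# With no encoding assumed, bytes.upper() only maps ASCII letters,
# so the byte 0xE9 is left untouched.
ascii_only = raw.upper()

# Interpreted as Latin-1, 0xE9 is U+00E9 (e with acute accent),
# whose uppercase form is U+00C9.
latin1_upper = raw.decode("latin-1").upper()

# Interpreted as cp1251, the very same byte is U+0439 (Cyrillic short i),
# whose uppercase form is U+0419.
cp1251_upper = raw.decode("cp1251").upper()

print(ascii_only)
print(ascii(latin1_upper))
print(ascii(cp1251_upper))
```

The same four bytes give three different answers, which is the sense
in which upper() on an 8-bit string only works if the assumed encoding
matches the data.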

> > > is this complexity really worth it?
> > 
> > From a backwards compatibility point of view, yes.  Basically,
> > programs that don't use Unicode should see no change in semantics.
> 
> I'm afraid i have to agree with this, because i don't see any
> other option that lets us escape from any of these four ways
> of using strings...

Which is why I find Fredrik's attitude unproductive.

And where's the SRE release?

--Guido van Rossum (home page: http://www.python.org/~guido/)