Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

M.-A. Lemburg mal@lemburg.com
Mon, 22 May 2000 22:53:55 +0200


Fredrik Lundh wrote:
> 
> > > - 8-bit text strings using the system encoding.  upper, strip, etc works
> > >   as long as the locale is properly configured.
> > >
> > > - 8-bit unicode text strings.  upper, strip, etc may work, as long as the
> > >   system encoding is a subset of unicode -- which means US ASCII or
> > >   ISO Latin 1.
> >
> > This is a figment of your imagination.  You can use 8-bit text strings
> > to contain Latin-1, but you have to set your locale to match.
> 
> if that's a supported feature (instead of being deprecated in favour
> of unicode), maybe we should base the default unicode/string
> conversions on the locale too?

This was proposed by Guido some time ago... the discussion
ended with the problem of extracting the encoding definition
from the locale names. There are some ways to solve this
problem (static mappings, fancy LANG variables etc.), but
AFAIK, there is no widely used standard on this yet, so
in the end you're stuck with defining the encoding by hand...
e.g.
	setenv LANG de_DE:latin-1

Perhaps we should help out a little and provide Python with
a parser for the LANG variable with some added magic
to provide useful defaults ?!
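
As a rough illustration of what such a parser might look like, here is
a minimal sketch. It assumes the POSIX-style layout
language[_TERRITORY][.codeset][@modifier] and also accepts the ":"
separator from the example above. The names parse_lang and
DEFAULT_ENCODINGS are hypothetical, and the defaults table is only a
placeholder for the "added magic":

```python
import re

# Illustrative defaults only (an assumption, not an agreed mapping);
# a real table would need to be much larger.
DEFAULT_ENCODINGS = {
    "C": "ascii",
    "POSIX": "ascii",
    "de_DE": "latin-1",
    "fr_FR": "latin-1",
}

_LANG_RE = re.compile(
    r"^(?P<lang>[A-Za-z]+(?:_[A-Za-z]+)?)"  # language, optional _TERRITORY
    r"(?:[.:](?P<enc>[A-Za-z0-9_-]+))?"     # optional .encoding or :encoding
    r"(?:@(?P<mod>[A-Za-z0-9]+))?$"         # optional @modifier (ignored)
)


def parse_lang(value, fallback="ascii"):
    """Return (language, encoding) for a LANG-style string.

    An explicit encoding in the string wins; otherwise the defaults
    table is consulted, and finally the fallback is used.
    """
    match = _LANG_RE.match(value or "")
    if match is None:
        return None, fallback
    lang = match.group("lang")
    enc = match.group("enc") or DEFAULT_ENCODINGS.get(lang, fallback)
    return lang, enc
```

With this sketch, parse_lang("de_DE:latin-1") and parse_lang("de_DE")
both yield ("de_DE", "latin-1"); the second relies on the defaults table.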

> [...]
> 
> this also solves the default conversion problem: use the locale
> environment variables to determine the default encoding, and call
> sys.set_string_encoding from site.py (see my earlier post for details).
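
A sketch of how the site.py hook could look, under some assumptions:
the encoding is taken from the part after "." in LC_ALL, LC_CTYPE or
LANG (checked in that order), unknown names fall back to ASCII, and
sys.set_string_encoding is only called if the interpreter actually
provides it, since it is experimental. The helper names
locale_encoding and install_default_encoding are hypothetical:

```python
import codecs
import os
import sys


def locale_encoding(environ=None, fallback="ascii"):
    """Guess the string encoding from the locale environment variables."""
    if environ is None:
        environ = os.environ
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(name)
        if value:
            break
    else:
        return fallback
    # Drop any @modifier, then take what follows the "." as the encoding.
    value = value.split("@", 1)[0]
    if "." not in value:
        return fallback
    encoding = value.split(".", 1)[1]
    try:
        # Only accept encodings for which a codec is registered.
        return codecs.lookup(encoding).name
    except LookupError:
        return fallback


def install_default_encoding(environ=None):
    """Apply the locale's encoding if the experimental setter exists."""
    encoding = locale_encoding(environ)
    # Assumption: the setter may be absent, so look it up defensively.
    setter = getattr(sys, "set_string_encoding", None)
    if setter is not None:
        setter(encoding)
    return encoding
```

Calling install_default_encoding() from site.py would then be the only
change needed there; everything else stays in the helper.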

Right, that would indeed open up a path to consensus...

> </F>
> 
> PS. shouldn't sys.set_string_encoding be sys.setstringencoding?

Perhaps... these were really only added as an experimental feature
to test the various possibilities (and a possible implementation).

My original intention was to remove these once consensus was
reached -- perhaps we should keep the functionality (expanded
to a per-thread setting; the global is a temporary hack) ?!
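
To make the per-thread idea concrete, here is a pure-Python sketch
built on threading.local. It is only an illustration of the semantics,
not the interpreter-level implementation; the module-level functions
and the _PROCESS_DEFAULT name are assumptions:

```python
import threading

# Each thread sees its own attributes on this object.
_state = threading.local()

# Assumed process-wide fallback for threads that never set an encoding.
_PROCESS_DEFAULT = "ascii"


def set_string_encoding(encoding):
    """Set the default string encoding for the calling thread only."""
    _state.encoding = encoding


def get_string_encoding():
    """Return the calling thread's encoding, or the process default."""
    return getattr(_state, "encoding", _PROCESS_DEFAULT)
```

A thread that calls set_string_encoding("latin-1") changes only its own
view; other threads keep seeing the process default.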
 
> >>> sys
> ... 'set_string_encoding', 'setcheckinterval', 'setprofile', 'settrace', ...
> 
> looks a little strange...

True; see above for the reason why ;-)

PS: What do you think about the current internal design of
sys.set_string_encoding() ? Note that hash() and the "st"
parser markers still use UTF-8.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/