Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

Fredrik Lundh <effbot@telia.com>
Mon, 22 May 2000 09:20:50 +0200


Peter Funk wrote:
> AFAIK locale and friends conform to POSIX.1.  Calling this obsolescent...
> hmmm... may offend a *LOT* of people.  Try this on comp.os.linux.advocacy ;-)

you're missing the point -- now that we've added unicode support to
Python, the old 8-bit locale *ctype* stuff no longer works.  while some
platforms implement a wctype interface, it's not widely available, and it's
not always unicode.

so in order to provide platform-independent unicode support, Python 1.6
comes with unicode-aware and fully portable replacements for the ctype
functions.

the code is already in there...
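
a minimal sketch of what "unicode-aware and portable" means in practice.
it assumes a modern Python 3 interpreter, where str is always unicode, not
the 1.6 code base discussed here.  the helper name describe is made up for
illustration; str.upper and unicodedata.category are standard library calls.
no setlocale call is involved, so the result does not depend on LC_CTYPE.

```python
import unicodedata


def describe(ch):
    """Return (uppercase form, Unicode general category) for one character."""
    # str.upper() consults the Unicode character database, not the C locale
    return ch.upper(), unicodedata.category(ch)


# ASCII letter, Latin-1 a-umlaut, Greek alpha: all handled the same way
for ch in ("a", "\u00e4", "\u03b1"):
    print(repr(ch), describe(ch))
```

the same call gives the same answer on every platform, which is the whole
point of not going through the platform's ctype or wctype functions.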

> On POSIX systems there are several environment variables used to
> control the default locale settings for a user's session.  For example,
> on my SuSE Linux system, currently running in the German locale, the
> environment variable LC_CTYPE=de_DE is set automatically by the file
> /etc/profile during login, which automatically causes the C-library
> function toupper('ä') to return an 'Ä' ---you should see
> a lower case a-umlaut as argument and an upper case umlaut as return
> value--- without all applications having to call 'setlocale' explicitly.
>
> So this simply works well as intended without having to add calls
> to 'setlocale' to every application program using these C-library functions.

note that this leaves us with four string flavours in 1.6:

- 8-bit binary arrays.  may contain binary goop, or text in some strange
  encoding.  upper, strip, etc should not be used.

- 8-bit text strings using the system encoding.  upper, strip, etc work
  as long as the locale is properly configured.

- 8-bit unicode text strings.  upper, strip, etc may work, as long as the
  system encoding is a subset of unicode -- which means US ASCII or
  ISO Latin 1.

- wide unicode text strings.  upper, strip, etc always work.
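
to make the difference between the 8-bit and wide flavours concrete, here
is a small sketch.  it assumes a modern Python 3 interpreter (bytes for
8-bit data, str for wide unicode text), which is not the 1.6 type layout
described above, and the variable names are only for illustration.

```python
# 8-bit data: a lower case a-umlaut encoded as ISO Latin 1 (one byte, 0xE4)
raw = "\u00e4".encode("latin-1")

# bytes.upper() only maps ASCII letters, so the umlaut byte is left alone
upper_bytes = raw.upper()

# wide unicode text: decode with the right encoding first, then upper()
# uses the Unicode character database and maps the umlaut correctly
upper_text = raw.decode("latin-1").upper()

print(upper_bytes == raw)        # the 8-bit value is unchanged
print(upper_text == "\u00c4")    # the unicode value is an upper case umlaut
```

the 8-bit case only gives the right answer if something outside the string
(the locale, or the programmer) knows which encoding the bytes are in.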

is this complexity really worth it?

</F>