Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

Peter Funk pf@artcom-gmbh.de
Mon, 22 May 2000 15:02:18 +0200 (MEST)


Hi!

[...]
[me]:
> > So this simply works as intended without having to add calls
> > to 'setlocale' to every application program using these C library functions.

[Guido van Rossum]:
> I don't believe that.  According to the ANSI standard, a C program
> *must* call setlocale(LC_..., "") if it wants the environment
> variables to be honored; without this call, the locale is always the
> "C" locale, which should *not* honor the environment variables.

pf@pefunbk> python 
Python 1.5.2 (#1, Jul 23 1999, 06:38:16)  [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> print string.upper("ä")
Ä
>>> 

This was the vanilla Python 1.5.2 as originally delivered by SuSE Linux.
But yes, you are right. :-(  My memory was misled by this practical
experience.  Let me quote from the man pages here:

man toupper:
[...]
BUGS
       The details of what constitutes an uppercase or lowercase
       letter depend on the current locale.  For example, the
       default "C" locale does not know about umlauts, so no
       conversion is done for them.

       In some non-English locales, there are lowercase letters
       with no corresponding uppercase equivalent; the German
       sharp s is one example.

man setlocale:
[...]
       A program may be made portable to all locales by calling
       setlocale(LC_ALL, "") after program initialization, by
       using the values returned from a localeconv() call for
       locale-dependent information and by using strcoll() or
       strxfrm() to compare strings.
[...]
   CONFORMING TO
       ANSI C, POSIX.1

       Linux (that is, libc) supports the portable locales "C"
       and "POSIX".  In the good old days there used to be
       support for the European Latin-1 "ISO-8859-1" locale (e.g. in
       libc-4.5.21 and libc-4.6.27), and the Russian "KOI-8"
       (more precisely, "koi-8r") locale (e.g. in libc-4.6.27),
       so that having an environment variable LC_CTYPE=ISO-8859-1
       sufficed to make isprint() return the right answer.  These
       days non-English speaking Europeans have to work a bit
       harder, and must install actual locale files.
[...]

In recent Linux distributions almost every C program seems to
contain this obligatory 'setlocale(LC_ALL, "");' line, so it's easy
to forget about it.  However, the core Python interpreter does not.
Since the session above still upper-cased the umlaut, it seems the
Linux C library is not fully ANSI compliant in this case: it appears
to honour the setting of $LANG regardless of whether a program
calls 'setlocale' or not.
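
For anyone who wants to check this on their own system, here is a
minimal sketch using the standard 'locale' module.  It assumes a
reasonably current Python rather than the 1.5.2 shown above, and the
two helper names are made up for illustration.  It reports which locale
is in effect before and after the explicit equivalent of
'setlocale(LC_ALL, "")'.  I query LC_NUMERIC because, as far as I know,
newer interpreters may already adjust LC_CTYPE at startup.

```python
import locale


def locale_in_effect(category=locale.LC_NUMERIC):
    """Return the name of the locale currently active for `category`.

    Calling setlocale() without a second argument only queries the
    current setting; it does not change anything.
    """
    return locale.setlocale(category)


def adopt_environment_locale():
    """Python-level counterpart of the C call setlocale(LC_ALL, "").

    Returns the new locale name, or None if the locale requested via
    $LANG / $LC_* is not installed on this system.
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return None


if __name__ == "__main__":
    # What the program runs under if it never asks for the user's locale.
    print("before:", locale_in_effect())
    # What it runs under once the environment variables are honoured.
    print("after: ", adopt_environment_locale())
```

If the "before" line already shows something other than "C", then the
environment is being honoured without an explicit call, which is the
behaviour I described above.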

Regards, Peter