Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)

Peter Funk
Mon, 22 May 2000 15:02:18 +0200 (MEST)


> > So this simply works well as intended without having to add calls
> > to 'setlocale' to all application program using this C-library functions.

[Guido van Rossum]:
> I don't believe that.  According to the ANSI standard, a C program
> *must* call setlocale(LC_..., "") if it wants the environment
> variables to be honored; without this call, the locale is always the
> "C" locale, which should *not* honor the environment variables.

pf@pefunbk> python 
Python 1.5.2 (#1, Jul 23 1999, 06:38:16)  [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> print string.upper("ä")

This was the vanilla Python 1.5.2 as originally shipped by SuSE Linux.
But yes, you are right. :-(  My memory was misled by this practical
experience.  Now I'd like to quote from the man pages here:

man toupper:
       The details of what constitutes an uppercase or lowercase
       letter depend on the current locale.  For example, the
       default "C" locale does not know about umlauts, so no
       conversion is done for them.

       In some non-English locales, there are lowercase letters
       with no corresponding uppercase equivalent; the German
       sharp s is one example.

man setlocale:
       A program may be made portable to all locales by calling
       setlocale(LC_ALL, "") after program initialization, by
       using the values returned from a localeconv() call for
       locale-dependent information and by using strcoll() or
       strxfrm() to compare strings.
       ANSI C, POSIX.1

       Linux (that is, libc) supports the portable locales "C"
       and "POSIX".  In the good old days there used to be
       support for the European Latin-1 "ISO-8859-1" locale (e.g. in
       libc-4.5.21 and libc-4.6.27), and the Russian "KOI-8"
       (more precisely, "koi-8r") locale (e.g. in libc-4.6.27),
       so that having an environment variable LC_CTYPE=ISO-8859-1
       sufficed to make isprint() return the right answer.  These
       days non-English speaking Europeans have to work a bit
       harder, and must install actual locale files.
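In Python terms, the portable recipe from the setlocale man page maps onto the standard `locale` module roughly as follows.  This is only a minimal sketch, assuming a Python 3 interpreter; the helper names `sorted_for_user` and `decimal_point` are hypothetical, and what you actually get depends on which locales are installed on the system.

```python
import locale


def sorted_for_user(words):
    """Sort words by the user's collation rules (hypothetical helper).

    Follows the man page advice: call setlocale(LC_ALL, "") once after
    initialization, then compare strings via strxfrm() / strcoll().
    """
    try:
        # Adopt the locale selected by LC_ALL / LC_* / LANG in the environment.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # The environment names a locale that is not installed; use "C".
        locale.setlocale(locale.LC_ALL, "C")
    # strxfrm() maps each string to a key that compares like strcoll() would.
    return sorted(words, key=locale.strxfrm)


def decimal_point():
    """Return the current locale's decimal separator via localeconv()."""
    return locale.localeconv()["decimal_point"]


if __name__ == "__main__":
    print(sorted_for_user(["b", "a", "c"]))
    print(repr(decimal_point()))
```

Using `strxfrm` as a sort key rather than calling `strcoll` pairwise transforms each string once, which is the usual choice when sorting a whole list.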

In recent Linux distributions almost every C program seems to
contain this obligatory 'setlocale(LC_ALL, "");' line, so it's easy
to forget about it.  The core Python interpreter, however, does not
call it.  It seems the Linux C library is not fully ANSI compliant
in this case: it appears to honour the setting of $LANG regardless
of whether a program calls 'setlocale' or not.
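One way to see what a given interpreter does is to query the locale before and after the explicit call; passing no second argument to `locale.setlocale()` only queries the current setting.  This sketch assumes Python 3, whose documentation says it already adjusts `LC_CTYPE` at startup but leaves the other categories in the "C" locale until the program calls `setlocale(LC_ALL, "")`, so `LC_NUMERIC` is used as the probe.  The function name `numeric_locale_before_and_after` is hypothetical.

```python
import locale


def numeric_locale_before_and_after():
    """Return the LC_NUMERIC setting before and after setlocale(LC_ALL, "")."""
    # With no locale argument, setlocale() only queries the current value.
    before = locale.setlocale(locale.LC_NUMERIC)
    try:
        # Honour LC_ALL / LC_NUMERIC / LANG from the environment.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # requested locale not installed; the setting is left unchanged
    after = locale.setlocale(locale.LC_NUMERIC)
    return before, after


if __name__ == "__main__":
    before, after = numeric_locale_before_and_after()
    print("before:", before)  # normally "C" if nothing called setlocale yet
    print("after: ", after)   # whatever the environment selects
```

If "before" already reflects $LANG in a fresh interpreter, something has called setlocale implicitly, which is the kind of behaviour described above.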

Regards, Peter