Some information about locale (was Re: [Python-Dev] repr vs. str and locales again)
Peter Funk
pf@artcom-gmbh.de
Mon, 22 May 2000 15:02:18 +0200 (MEST)
Hi!
[...]
[me]:
> > So this simply works well as intended without having to add calls
> > to 'setlocale' to all application programs using these C-library functions.
[Guido van Rossum]:
> I don't believe that. According to the ANSI standard, a C program
> *must* call setlocale(LC_..., "") if it wants the environment
> variables to be honored; without this call, the locale is always the
> "C" locale, which should *not* honor the environment variables.
pf@pefunbk> python
Python 1.5.2 (#1, Jul 23 1999, 06:38:16) [GCC egcs-2.91.66 19990314/Linux (egcs- on linux2
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> import string
>>> print string.upper("ä")
Ä
>>>
This was the vanilla Python 1.5.2 as originally delivered by SuSE Linux.
But yes, you are right. :-( My memory was misled by this practical
experience. Let me quote from the man pages here:
man toupper:
[...]
BUGS
The details of what constitutes an uppercase or lowercase
letter depend on the current locale. For example, the
default "C" locale does not know about umlauts, so no
conversion is done for them.
In some non-English locales, there are lowercase letters
with no corresponding uppercase equivalent; the German
sharp s is one example.
man setlocale:
[...]
A program may be made portable to all locales by calling
setlocale(LC_ALL, "") after program initialization, by
using the values returned from a localeconv() call for
locale-dependent information and by using strcoll() or
strxfrm() to compare strings.
[...]
CONFORMING TO
ANSI C, POSIX.1
Linux (that is, libc) supports the portable locales "C"
and "POSIX". In the good old days there used to be support
for the European Latin-1 "ISO-8859-1" locale (e.g. in
libc-4.5.21 and libc-4.6.27), and the Russian "KOI-8"
(more precisely, "koi-8r") locale (e.g. in libc-4.6.27),
so that having an environment variable LC_CTYPE=ISO-8859-1
sufficed to make isprint() return the right answer. These
days non-English speaking Europeans have to work a bit
harder, and must install actual locale files.
[...]
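The recipe from the setlocale man page can be sketched with Python's standard 'locale' module, which wraps setlocale(), localeconv(), strcoll() and strxfrm(). This is only an illustration under stated assumptions: the helper names init_locale, decimal_point and sort_for_locale are made up for this example, and the fallback to the "C" locale is one possible policy, not something the man page prescribes.

```python
import locale


def init_locale():
    """Adopt the locale named by the environment (LANG, LC_*).

    Falls back to the portable "C" locale when the requested locale
    files are not installed (setlocale raises locale.Error then).
    """
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return locale.setlocale(locale.LC_ALL, "C")


def decimal_point():
    """Return the decimal separator reported by localeconv()."""
    return locale.localeconv()["decimal_point"]


def sort_for_locale(words):
    """Sort strings by the current collation rules, using strxfrm() keys."""
    return sorted(words, key=locale.strxfrm)
```

A program would call init_locale() once after startup and use the other two helpers afterwards. Without that call, the helpers simply report whatever locale is currently in effect.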
In recent Linux distributions almost every C program seems to
contain the obligatory 'setlocale(LC_ALL, "");' line, so it is easy
to forget about it. However, the core Python interpreter does not.
The Linux C library therefore appears not to be fully ANSI compliant
in this case: it seems to honour the setting of $LANG regardless of
whether a program calls 'setlocale' or not.
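One way to check such a suspicion is to ask which locale is in effect before the program has requested one. The following is a rough sketch using Python's 'locale' module; the name report_locale and the choice of categories are assumptions made for this example. Calling setlocale() with only a category queries the current setting without changing it. Note that newer interpreters may already adjust some categories at startup, so the result reflects the interpreter as well as the C library.

```python
import locale

# Categories that the 'locale' module defines on all platforms.
CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_NUMERIC", "LC_TIME", "LC_MONETARY")


def report_locale():
    """Return {category name: current setting} without changing anything."""
    return {name: locale.setlocale(getattr(locale, name)) for name in CATEGORIES}


if __name__ == "__main__":
    # Run with different values of $LANG to compare the results.
    for name, value in report_locale().items():
        print(name, "=", value)
```

If every category reports "C" although $LANG names another locale, the environment has not been applied yet.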
Regards, Peter