[I18n-sig] Determine encoding from $LANG

Markus Kuhn mkuhn@suse.de
Thu, 28 Jun 2001 10:03:59 +0200 (CEST)

On Tue, 26 Jun 2001, Bruno Haible wrote:
>      A program cannot be considered properly internationalized
>      until it obeys the current locale (LC_ALL || LC_CTYPE || LANG).
> The programs we are waiting for are:
> [...]

Add to that list many of the programming languages that use Unicode
internally but do not yet automatically set the default I/O encoding
based on LC_ALL || LC_CTYPE || LANG.
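
The LC_ALL || LC_CTYPE || LANG precedence can be sketched in a few lines of
Python. This is only an illustration: the function name
effective_ctype_locale is hypothetical, and it assumes the POSIX rule that an
empty value counts the same as an unset variable. A real program would
normally let setlocale(LC_CTYPE, "") do this lookup rather than reimplement it.

```python
def effective_ctype_locale(env):
    """Return the locale name governing LC_CTYPE, or None if nothing is set.

    Hypothetical helper: checks LC_ALL, then LC_CTYPE, then LANG, and
    skips variables that are unset or empty.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # an empty string is treated like an unset variable
            return value
    return None  # caller falls back to the implementation default locale
```

Called as effective_ctype_locale(os.environ), it returns the raw locale name
only. The encoding itself should still come from nl_langinfo(CODESET), not
from parsing that string.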

For example, Tcl currently uses some primitive LANG substring matching,
which basically gets only a few Japanese and Russian encodings right. The
Tcl function unix/tclUnixInit.c:TclpSetInitialEncodings really should call
libcharset or nl_langinfo(CODESET) instead:


I suspect that Perl and Python are not much better and call neither
nl_langinfo(CODESET) nor the portable libcharset wrapper around it
to properly determine the locale-dependent external encoding.
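
For illustration, here is roughly what such a call looks like from Python,
using the standard locale module. This is a minimal sketch, not a statement
about what any interpreter does at startup: the function name
locale_encoding is hypothetical, locale.nl_langinfo is only available on
Unix-like platforms, and the returned name is passed through unmodified, so
it may still need mapping to a codec name the language knows.

```python
import locale


def locale_encoding():
    """Return the codeset name of the current LC_CTYPE locale.

    Hypothetical helper: adopts the user's locale from the environment,
    then asks the C library for the codeset instead of parsing $LANG.
    """
    try:
        # "" means: take the locale from LC_ALL / LC_CTYPE / LANG
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Locale named in the environment is not installed; the process
        # stays in its previous locale (normally "C").
        pass
    return locale.nl_langinfo(locale.CODESET)
```

The result depends entirely on the environment and the C library, so the
same program can report different codeset names on different systems.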

References on how to determine the character encoding from the locale in a
safe and portable manner:



Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>