[I18n-sig] Determine encoding from $LANG
Markus Kuhn
mkuhn@suse.de
Thu, 28 Jun 2001 10:03:59 +0200 (CEST)
On Tue, 26 Jun 2001, Bruno Haible wrote:
>
> A program cannot be considered properly internationalized
> until it obeys the current locale (LC_ALL || LC_CTYPE || LANG).
>
> The programs we are waiting for are:
> [...]
Add to that list the many programming languages that use Unicode
internally but do not yet automatically set the default I/O encoding
from LC_ALL || LC_CTYPE || LANG.
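As a rough illustration of the LC_ALL || LC_CTYPE || LANG precedence, here
is a minimal Python sketch. The helper name effective_ctype_locale is
hypothetical, and falling back to "C" when nothing is set is an assumption
(POSIX leaves that default implementation-defined). It only picks the
locale name; it does not derive an encoding from it.

```python
import os


def effective_ctype_locale(env=None):
    """Return the locale name that governs LC_CTYPE.

    Hypothetical helper for illustration only.
    """
    if env is None:
        env = os.environ
    # Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    # A variable that is set to the empty string is treated as unset.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    # Assumption: fall back to the "C" locale when nothing is set.
    return "C"
```

Parsing the returned name by hand to guess an encoding is exactly the
fragile approach to avoid; the name is only useful for handing to the C
library's locale machinery.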
For example, Tcl currently uses primitive substring matching on LANG,
which gets only a few Japanese and Russian encodings right. The Tcl
function unix/tclUnixInit.c:TclpSetInitialEncodings really should call
libcharset or nl_langinfo(CODESET) instead:
https://sourceforge.net/tracker/?func=detail&aid=418645&group_id=10894&atid=110894
I suspect that Perl and Python are not much better: they probably don't
call nl_langinfo(CODESET), or the portable libcharset wrapper around it,
to determine the locale-dependent external encoding either.
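For comparison, this is roughly what asking the C library for the codeset
looks like from Python, sketched under the assumption of a Unix system
where Python's locale module exposes nl_langinfo and CODESET. The function
name locale_encoding is hypothetical.

```python
import locale


def locale_encoding():
    """Return the codeset name of the current LC_CTYPE locale.

    Hypothetical helper; relies on locale.nl_langinfo, which is only
    available on Unix-like platforms.
    """
    try:
        # Adopt the user's locale from the environment
        # (LC_ALL, LC_CTYPE, LANG) instead of the startup "C" locale.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # the process keeps whatever locale it already had.
        pass
    # Let the C library report the encoding rather than parsing LANG.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(locale_encoding())
```

The exact string returned depends on the C library and on which locales
are installed, so callers should expect platform-specific spellings of the
same encoding. Smoothing over those differences is the kind of job the
libcharset wrapper is meant for.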
References on how to determine the character encoding from the locale in a
safe and portable manner:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
http://clisp.cons.org/~haible/packages-libcharset.html
http://www.opengroup.org/onlinepubs/7908799/xsh/langinfo.h.html
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>