[Python-Dev] fun with unicode, part 1
Fredrik Lundh <effbot@telia.com>
Tue, 2 May 2000 09:55:49 +0200
Tim Peters wrote:
> [Guido asks good questions about how Windows deals w/ Unicode filenames,
> last Thursday, but gets no answers]
you missed Finn Bock's post on how Java does it.
here's another data point:
Tcl uses a system encoding to convert from unicode to a suitable
system API encoding, and uses the following approach to figure out
what that one is:
    windows NT/2000:
        unicode (use wide api)

    windows 95/98:
        "cp%d" % GetACP()
        (note that this is "cp1252" in us and western europe,
        not "iso-8859-1")

    macintosh:
        determine encoding for fontId 0 based on (script,
        smScriptLanguage) tuple. if that fails, assume
        "macroman"

    unix:
        figure out the locale from LC_ALL, LC_CTYPE, or LANG.
        use heuristics to map from the locale to an encoding
        (see unix/tclUnixInit). if that fails, assume "iso-8859-1"
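
to make the lookup order above concrete, here's a rough python sketch. it is not Tcl's code: the function names are made up for illustration, the unix heuristic (take whatever follows the "." in the locale name and see if python knows a codec by that name) is a much simpler stand-in for the table in unix/tclUnixInit, and the macintosh script lookup is left out, so only the "macroman" fallback is shown. "unicode" is returned as a marker meaning "use the wide api", not as a codec name.

```python
import codecs


def win9x_encoding(acp):
    # acp is the integer that GetACP() would return, e.g. 1252
    return "cp%d" % acp


def unix_encoding(environ):
    # use the first of LC_ALL, LC_CTYPE, LANG that is set and non-empty
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = environ.get(var)
        if value:
            # assumed heuristic: "de_DE.ISO8859-15@euro" -> "ISO8859-15"
            codeset = value.partition(".")[2].partition("@")[0]
            if codeset:
                try:
                    return codecs.lookup(codeset).name
                except LookupError:
                    pass  # unknown codeset, fall through to the default
            break
    return "iso-8859-1"


def system_encoding(platform, environ, acp=None):
    # platform is a hypothetical label chosen by the caller
    if platform == "winnt":
        return "unicode"  # marker only: use the wide api
    if platform == "win9x":
        return win9x_encoding(acp)
    if platform == "mac":
        return "macroman"  # fallback only, script lookup omitted
    return unix_encoding(environ)
```

on a unix box you would call it as system_encoding("unix", os.environ).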
I propose adding a similar mechanism to Python, along these lines:
    sys.getdefaultencoding() returns the right thing for windows
    and macintosh, "iso-8859-1" for other platforms.

    sys.setencoding(codec) changes the system encoding. it's
    used from site.py to set things up properly on unix and other
    non-unicode platforms.
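
as a rough illustration of the getter/setter pair, here's a pure-python mock. it only models the proposed behaviour with module-level functions; it does not touch the real sys module, and checking the codec name with codecs.lookup is my assumption, not part of the proposal.

```python
import codecs

# fallback named in the proposal for platforms without a better answer
_default_encoding = "iso-8859-1"


def getdefaultencoding():
    # stand-in for the proposed sys.getdefaultencoding()
    return _default_encoding


def setencoding(codec):
    # stand-in for the proposed sys.setencoding(codec)
    global _default_encoding
    codecs.lookup(codec)  # assumed check: raises LookupError if unknown
    _default_encoding = codec
```

a site.py-style startup hook would then call setencoding() once with whatever the platform lookup returned.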
</F>