[Python-Dev] Internationalization Toolkit

Fri, 12 Nov 1999 01:02:01 -0500

[MAL]
> Note that if we make UTF-8 the standard encoding, nearly all
> special Latin-1 characters will produce UTF-8 errors on input
> and unreadable garbage on output. That will probably be unacceptable
> in Europe. To remedy this, one would *always* have to use
> u.encode('latin-1') to get readable output for Latin-1 strings
> represented in Unicode.

I think it's time for the Europeans to pronounce on what's acceptable in
Europe.  To the limited extent that I can pretend I'm European, I'm happy
with Guido's rebind-stdin/stdout-in-PYTHONSTARTUP idea.
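
For concreteness, here is a minimal sketch of what such a rebinding might
look like.  It assumes a modern Python 3 and the standard codecs module,
not the API under discussion in this thread; the helper name rebind and
the in-memory stream are made up for illustration (a real startup file
would wrap sys.stdout.buffer instead).

```python
import codecs
import io


def rebind(stream, encoding):
    """Wrap a byte stream so text written to it is encoded on the way out.

    Hypothetical helper: in a startup file one might write
    sys.stdout = rebind(sys.stdout.buffer, "latin-1").
    """
    return codecs.getwriter(encoding)(stream)


raw = io.BytesIO()            # stands in for sys.stdout.buffer
out = rebind(raw, "latin-1")  # the site's preferred output encoding
out.write("caf\u00e9")        # text goes in ...
latin1_bytes = raw.getvalue() # ... Latin-1 bytes come out
```

The encoding choice lives entirely in user code here; nothing in the core
has to know about it.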

> I'd rather see this happen the other way around: *always* explicitly
> state the encoding you want in case you rely on it, e.g. write
>
> file.write(u.encode('utf-8'))
>
> instead of
>
> file.write(u) # let's hope this goes out as UTF-8...

By the same argument, those pesky Europeans who are relying on Latin-1
should write

file.write(u.encode('latin-1'))

instead of

file.write(u)  # let's hope this goes out as Latin-1
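
To see why the explicit spelling matters on either side, here is a small
sketch (assuming Python 3, where str is Unicode; the sample string is
arbitrary) showing that the same text yields different bytes under the
two encodings, and what happens when a reader guesses wrong.

```python
u = "na\u00efve"  # contains one non-ASCII Latin-1 character

as_utf8 = u.encode("utf-8")      # explicit: two bytes for the accented letter
as_latin1 = u.encode("latin-1")  # explicit: one byte for the accented letter

# Latin-1 bytes read as UTF-8: an error on input.
try:
    as_latin1.decode("utf-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True

# UTF-8 bytes read as Latin-1: no error, just garbage on output.
garbled = as_utf8.decode("latin-1")
```

Neither failure can occur when both writer and reader name the encoding
they mean.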

> Using the <default encoding> as site dependent setting is useful
> for convenience in those cases where the output format should be
> readable rather than parseable.

Well, "convenience" is always the argument advanced in favor of modes.
Conflicts and nasty intermittent bugs are always the result.  The latter
will happen under Guido's idea too, as various careless modules rebind stdin
& stdout to their own ideas of what "the proper" encoding should be.  But at
least the blame doesn't fall on the core language then <0.3 wink>.
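
A sketch of the kind of conflict meant here, assuming Python 3 and the
standard codecs module; the two "modules" are hypothetical and the shared
in-memory stream stands in for the process-wide stdout.

```python
import codecs
import io

shared = io.BytesIO()  # stands in for the one underlying stdout byte stream

# Hypothetical module A decides Latin-1 is "the proper" encoding ...
writer_a = codecs.getwriter("latin-1")(shared)
writer_a.write("\u00e9")

# ... and hypothetical module B later rebinds the same stream to UTF-8.
writer_b = codecs.getwriter("utf-8")(shared)
writer_b.write("\u00e9")

# The same character now appears in two different byte forms in one stream.
mixed = shared.getvalue()

# The result is not valid UTF-8, so a UTF-8 consumer chokes on it.
try:
    mixed.decode("utf-8")
    consistent = True
except UnicodeDecodeError:
    consistent = False
```

Which module "wins" depends on import and call order, which is what makes
such bugs intermittent.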

Since there doesn't appear to be anything (either good or bad) you can do
(or avoid) by using Guido's scheme instead of magical core thread state,
there's no *need* for the latter.  That is, it can be done with a user-level
API without involving the core.