[Python-Dev] Internationalization Toolkit

M.-A. Lemburg mal@lemburg.com
Thu, 11 Nov 1999 18:31:34 +0100

Andy Robinson wrote:
> > See my other post on the subject...
> >
> > Note that if we make UTF-8 the standard encoding, nearly all
> > special Latin-1 characters will produce UTF-8 errors on input
> > and unreadable garbage on output. That will probably be
> > unacceptable in Europe. To remedy this, one would *always* have
> > to use u.encode('latin-1') to get readable output for Latin-1
> > strings represented in Unicode.
> You beat me to it - a colleague and I were just
> discussing this verbally.  Specifically we Brits will
> get annoyed as soon as we read in a text file with
> pound (sterling) signs.
> We concluded that the only reasonable default (if you
> have one at all) is pure ASCII.  At least that way I
> will get a clear and intelligible warning when I load
> in such a file, and will remember to specify
> ISO-Latin-1.
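To make the trade-off concrete, here is a small sketch written against today's Python 3 `bytes.decode()` rather than the 1999 prototype (the helper name `try_decode` is made up for illustration). It reads a Latin-1 encoded pound sign under the three candidate encodings. UTF-8 fails outright on input, ASCII gives exactly the clear error Andy asks for, and only an explicit Latin-1 reads it correctly. The last line shows the output-side problem: UTF-8 bytes viewed as Latin-1 come out as garbage.

```python
# Sketch only: modern Python 3 str/bytes, not the proposed 1999 API.
raw = "Price: \u00a3 10".encode("latin-1")  # bytes as stored in a Latin-1 file

def try_decode(data, encoding):
    """Hypothetical helper: return decoded text or a short error description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason} at byte {exc.start}"

for enc in ("utf-8", "ascii", "latin-1"):
    print(enc, "->", try_decode(raw, enc))

# Output side: UTF-8 bytes shown by a Latin-1 terminal or editor.
garbled = "Price: \u00a3 10".encode("utf-8").decode("latin-1")
print(garbled)
```

With UTF-8 the stray 0xA3 byte is rejected as an invalid start byte, while ASCII reports an ordinal out of range; both point at byte 7, where the pound sign sits.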

Well, Guido's post made me rethink the approach...

1. Setting <default encoding> to any non-UTF encoding
   will result in data loss due to the limits imposed by
   the other encodings. This is dangerous and will cause
   errors (some of which may not even be noticed, because
   the interpreter ignores them) whenever your strings
   contain characters that the encoding cannot represent.

2. You basically only want to set <default encoding> to
   anything other than UTF-8 for stream input and output.
   This can be done using the unicodec stream wrapper without
   too much inconvenience. (We'll have to extend the wrapper a little,
   though, because it currently only accepts Unicode objects for
   writing and always returns Unicode objects when reading.)

3. We should leave the issue open until some code is there
   to be tested... I have a feeling that there will be quite
   a few strange effects when APIs expecting strings are fed
   Unicode objects that convert to UTF-8.
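Points 1 and 2 can be illustrated with a rough sketch. It assumes Python 3's standard `codecs` module (`codecs.getwriter` / `codecs.getreader`) as a stand-in for the proposed unicodec stream wrapper, whose 1999 interface differed. The first part shows that a non-UTF target such as Latin-1 either raises or, with `errors="replace"`, silently loses characters like the euro sign. The second part keeps text as Unicode internally and applies Latin-1 only at the stream boundary.

```python
import codecs
import io

# e-acute fits in Latin-1; the euro sign does not.
text = "caf\u00e9 costs \u20ac5"

# Point 1: a non-UTF encoding cannot represent every character.
try:
    text.encode("latin-1")                        # strict mode raises
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)
lossy = text.encode("latin-1", errors="replace")  # euro silently becomes '?'
print("replace:", lossy)

# Point 2: keep Unicode internally; encode/decode only at the stream edge.
buf = io.BytesIO()
writer = codecs.getwriter("latin-1")(buf)  # wraps a byte stream, accepts str
writer.write("caf\u00e9\n")
print(buf.getvalue())

buf.seek(0)
reader = codecs.getreader("latin-1")(buf)  # decodes bytes back to str
decoded = reader.read()
print(decoded)
```

The strict path reports "ordinal not in range(256)", while the `replace` path hands back `b'caf\xe9 costs ?5'` without complaint. That second case is the kind of error that may go unnoticed.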

Marc-Andre Lemburg
Y2000:                                                    50 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/