[Python-Dev] Internationalization Toolkit
Thu, 11 Nov 1999 18:31:34 +0100
Andy Robinson wrote:
> > See my other post on the subject...
> > Note that if we make UTF-8 the standard encoding, nearly all
> > special Latin-1 characters will produce UTF-8 errors on input
> > and unreadable garbage on output. That will probably be unacceptable
> > in Europe. To remedy this, one would *always* have to use
> > u.encode('latin-1') to get readable output for Latin-1 strings
> > represented in Unicode.
> You beat me to it - a colleague and I were just
> discussing this verbally. Specifically we Brits will
> get annoyed as soon as we read in a text file with
> pound (sterling) signs.
> We concluded that the only reasonable default (if you
> have one at all) is pure ASCII. At least that way I
> will get a clear and intelligible warning when I load
> in such a file, and will remember to specify
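To make the failure modes quoted above concrete, here is a small sketch. It assumes modern Python 3 `str`/`bytes` semantics, which did not exist in this form in 1999, so it illustrates the problem rather than the proposal under discussion; the helper `try_decode` and the variable names are hypothetical. A pound sign stored as Latin-1 is an invalid start byte under UTF-8, an ASCII default rejects it with a clear error, and UTF-8 output viewed as Latin-1 turns into garbage.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the failure (hypothetical helper)."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{encoding} error: {exc.reason} at byte {exc.start}"

# A Latin-1 text file containing a pound (sterling) sign.
latin1_bytes = "Price: \u00a310".encode("latin-1")  # b'Price: \xa310'

utf8_result = try_decode(latin1_bytes, "utf-8")      # decode error at byte 7
ascii_result = try_decode(latin1_bytes, "ascii")     # clear, intelligible error
latin1_result = try_decode(latin1_bytes, "latin-1")  # correct text

# The reverse problem: UTF-8 output shown by a Latin-1 terminal or editor.
garbage = "\u00a3".encode("utf-8").decode("latin-1")  # 'Â£' instead of '£'

for line in (utf8_result, ascii_result, latin1_result, garbage):
    print(line)
```

The ASCII case fails just as hard as the UTF-8 case, but it fails consistently for every non-ASCII byte, which is the "clear and intelligible warning" argued for above.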
Well, Guido's post made me rethink the approach...
1. Setting <default encoding> to any non-UTF encoding
will result in data loss due to the encoding limits
imposed by the other formats. This is dangerous and
will result in errors (some of which may not even be
noticed, because the interpreter ignores them) whenever
your strings use non-encodable characters.
2. You basically only want to set <default encoding> to
anything other than UTF-8 for stream input and output.
This can be done using the unicodec stream wrapper without
too much inconvenience. (We'll have to extend the wrapper a little,
though, because it currently only accepts Unicode objects for
writing and always returns Unicode objects when reading.)
3. We should leave the issue open until some code is there
to be tested... I have a feeling that there will be quite
a few strange effects when APIs expecting strings are fed
with Unicode objects returning UTF-8.
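The three points can be illustrated with a short sketch. It assumes modern Python 3, and it uses the real standard-library functions `codecs.getwriter` and `codecs.getreader` as stand-ins for the "unicodec stream wrapper" mentioned in point 2, whose actual interface is not shown here. The sample strings are illustrative only.

```python
import codecs
import io

text = "Gr\u00fc\u00dfe \u20ac5"  # "Grüße €5": the euro sign has no Latin-1 code point

# Point 1: a non-UTF target encoding cannot represent every character.
try:
    text.encode("latin-1")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With errors="replace" the loss is silent: the euro sign becomes '?'.
lossy = text.encode("latin-1", errors="replace").decode("latin-1")

# Point 2: keep Unicode internally and apply the legacy encoding only at
# the stream boundary, using a codecs StreamWriter/StreamReader pair.
raw = io.BytesIO()
writer = codecs.getwriter("latin-1")(raw)
writer.write("\u00a310")  # the pound sign is representable in Latin-1
latin1_on_disk = raw.getvalue()

reader = codecs.getreader("latin-1")(io.BytesIO(latin1_on_disk))
round_trip = reader.read()

# Point 3: one kind of "strange effect": byte-oriented code that receives
# UTF-8 counts bytes, not characters.
char_len = len(text)
utf8_len = len(text.encode("utf-8"))

print(strict_ok, lossy, latin1_on_disk, round_trip, char_len, utf8_len)
```

Here `char_len` is 8 while `utf8_len` is 12, because the two umlauted letters take two bytes each and the euro sign takes three.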
Y2000: 50 days left
Python Pages: http://www.lemburg.com/python/