[I18n-sig] Re: [Python-Dev] Unicode debate
Guido van Rossum
guido@python.org
Tue, 02 May 2000 10:15:50 -0400
[me]
> >Why not Latin-1? Because it gives us Western-alphabet users a false
> >sense that our code works, where in fact it is broken as soon as you
> >change the encoding.
[Just]
> Yeah, and? At least it'll *show* it's broken instead of *silently* doing
> the wrong thing with utf-8.
>
> It's like using Python ints all over the place, and suddenly a user of the
> application enters data that causes an integer overflow. Boom. Program
> needs to be fixed. What's the big deal?
The big deal is that in some cultures, 8-bit strings with non-ASCII
bytes are unlikely to be Latin-1. Under the Latin-1 convention, they
would get garbage when mixing Unicode and regular strings. This is
more like ignoring overflow on integer arithmetic (so that 2000000000*2
yields -294967296). I am against silently allowing erroneous results
like this if I can help it.
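
A minimal sketch of the failure mode described above, written for a
modern Python 3 interpreter rather than the Python of this thread. The
KOI8-R sample text and the helper name wrap_int32 are assumptions made
for illustration only; they do not come from the discussion.

```python
# Bytes for the Russian word "мир" in KOI8-R, an 8-bit encoding that
# is not Latin-1 (sample data chosen for illustration).
raw = "мир".encode("koi8_r")

# Latin-1 assigns a character to every byte value, so decoding never
# fails: the wrong interpretation goes through silently.
as_latin1 = raw.decode("latin-1")

# ASCII rejects any byte >= 0x80, so the same mistake is reported.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True


def wrap_int32(n):
    """Hypothetical helper: reduce n to a signed 32-bit value,
    discarding the overflow the way unchecked C arithmetic would."""
    return (n + 2**31) % 2**32 - 2**31


# The integer analogy: the product no longer fits in 32 bits, and
# wrapping it yields a plausible-looking but wrong number.
wrapped = wrap_int32(2000000000 * 2)
```

Here as_latin1 has the same length as the original word but contains
unrelated accented Latin letters, while the ASCII attempt raises
UnicodeDecodeError instead of returning garbage.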
[Just, in a different message]
> Of course it's not, and of course you shouldn't be counting votes. However,
> the fact that more and more people chime in on the Latin-1 side (even
> non-western oriented people like Ping and Moshe!) should ring a bell.
Significantly, neither Ping nor Moshe cares for Latin-1 at all: they
don't have a use for a default encoding. This is because they have no
hope that their preferred encoding would be elected as the default
encoding.
Note that I think that the ASCII default encoding is essential --
ASCII is the character set used by the Python language for
identifiers, and any 8-bit source encoding should always be a superset
of ASCII. Essentially, Python has always made the (implicit)
guarantee that programs using only the ASCII character set are
portable w.r.t. character encodings -- I think this is important.
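
The portability guarantee can be illustrated with a short sketch,
again assuming a modern Python 3 interpreter. The sample source line
and the particular list of encodings are assumptions picked for the
example; every encoding listed is an ASCII superset.

```python
# A line of program text that uses only ASCII characters.
source = b"def spam(eggs): return eggs + 1"

# A few 8-bit encodings (plus UTF-8) that all extend ASCII.
encodings = ["ascii", "latin-1", "utf-8", "koi8_r", "cp1252", "iso8859_8"]

# Decode the same bytes under each encoding.
decoded = {enc: source.decode(enc) for enc in encodings}

# Pure-ASCII bytes mean the same thing under every one of them.
all_same = len(set(decoded.values())) == 1
```

Because the bytes stay below 0x80, the choice of encoding does not
affect the result, which is the sense in which ASCII-only programs are
portable.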
Having no default encoding would be like having no automatic coercion
between ints and long ints -- I tried this in very early Python
versions (around 0.9.1 I believe) but Tim Peters and/or Steve Majewski
quickly dissuaded me from this bad idea.
--Guido van Rossum (home page: http://www.python.org/~guido/)