[Python-Dev] Re: [I18n-sig] Re: Unicode debate

Just van Rossum just@letterror.com
Tue, 2 May 2000 14:55:31 +0100


At 8:31 AM -0400 02-05-2000, Guido van Rossum wrote:
>When *comparing* 8-bit and Unicode strings, the presence of non-ASCII
>bytes in either should make the comparison fail; when ordering is
>important, we can make an arbitrary choice e.g. "\377" < u"\200".

Blech. Just document that 8-bit strings *are* Latin-1 unless converted
explicitly, and you're done. It's really much simpler this way, for you as
well as for the users.
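
To illustrate why treating 8-bit strings as Latin-1 is the simple option:
Latin-1 maps every byte value to the Unicode code point with the same
number, so the conversion can never fail and round-trips exactly. A minimal
sketch, assuming present-day Python 3 syntax (bytes/str rather than the
8-bit and u"" strings discussed in this thread); the variable names are only
for illustration:

```python
# All 256 possible byte values, in order.
all_bytes = bytes(range(256))

# Decoding as Latin-1 never raises: each byte becomes one character.
as_text = all_bytes.decode("latin-1")

# Each character's code point equals the original byte value.
assert [ord(ch) for ch in as_text] == list(range(256))

# Encoding back gives exactly the bytes we started with.
assert as_text.encode("latin-1") == all_bytes
```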

>Why not Latin-1?  Because it gives us Western-alphabet users a false
>sense that our code works, where in fact it is broken as soon as you
>change the encoding.

Yeah, and? At least it'll *show* it's broken instead of *silently* doing
the wrong thing with utf-8.

It's like using Python ints all over the place, and suddenly a user of the
application enters data that causes an integer overflow. Boom. Program
needs to be fixed. What's the big deal?

Just