Re: [I18n-sig] Re: [Python-Dev] Unicode debate

2 May 2000

      ...
It's the naive user who will be surprised by these random UTF-8 decoding
errors.
That's why this is NOT a convenience issue (are you listening MAL???).
It's a short and long term simplicity issue. There are lots of languages
where it is de rigueur to discover and work around inconvenient and
confusing default behaviors. I just don't think that we should be ADDING
such behaviors.
So what do you think of my new proposal of using ASCII as the default
"encoding"?  It takes care of "a character is a character" but also
(almost) guarantees an error message when mixing encoded 8-bit strings
with Unicode strings without specifying an explicit conversion --
*any* 8-bit byte with the top bit set is rejected by the default
conversion to Unicode.
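
To make the top-bit rule concrete, here is a minimal sketch. It uses
present-day Python 3 spelling (bytes.decode) as a stand-in for the
default conversion under discussion, which is an assumption and not the
API of the time; the helper name decodes_as_ascii is hypothetical.

```python
def decodes_as_ascii(data: bytes) -> bool:
    """Return True if every byte in data is accepted by the ASCII codec."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Try each possible byte value on its own and collect the ones rejected.
rejected = [b for b in range(256) if not decodes_as_ascii(bytes([b]))]

# Exactly the bytes with the top bit set (0x80..0xFF) fail to decode.
top_bit_only = rejected == list(range(0x80, 0x100))
```

Plain 7-bit text such as b"hello" passes through unchanged, so "a
character is a character" still holds for the ASCII range.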

I think this is less confusing than Latin-1: when an unsuspecting user
is reading encoded text from a file into 8-bit strings and attempts to
use it in a Unicode context, an error is raised instead of producing
garbage Unicode characters.
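
A small sketch of the difference, again in present-day Python 3 terms
(an assumption, not the API of the time). The sample bytes are UTF-8
encoded text, standing in for data read from a file.

```python
# UTF-8 bytes for "cafe" with an acute accent on the e: b'caf\xc3\xa9'
raw = "caf\u00e9".encode("utf-8")

# A Latin-1 default never fails: each byte maps to one code point, so the
# two UTF-8 bytes silently become two unrelated characters.
as_latin1 = raw.decode("latin-1")

# An ASCII default refuses the first byte with the top bit set.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
```

Here as_latin1 has five characters instead of four, with nothing to
signal the mistake, while ascii_error points at byte offset 3.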

It encourages the use of Unicode strings for everything beyond ASCII
-- there's no way around ASCII since that's the source encoding etc.,
but Latin-1 is an inconvenient default in most parts of the world.
ASCII is accepted everywhere as the base character set (e.g. for
email and for text-based protocols like FTP and HTTP), just like
English is the one natural language that we can all use to communicate
(to some extent).

--Guido van Rossum (home page: http://www.python.org/~guido/)