On Mon, 1 May 2000, Guido van Rossum wrote:
Paul, we're both just saying the same thing over and over without convincing each other. I'll wait till someone who wasn't in this debate before chimes in.
Well, I'm guessing you had someone specific in mind (Neil?), but I want to
say something too, as the only one here (I think) using ISO-8859-8
natively. I much prefer the Fredrik-Paul position, also known as the
"a character is a character" position, to UTF-8 as the default encoding.
Unicode is Western-centered -- the first 256 characters are Latin-1. UTF-8
is even more horribly Western-centered (or I should say USA-centered) --
only ASCII documents come out byte-for-byte the same. I'd much prefer
Python to reflect a fundamental truth about Unicode, which at least makes
sure binary goop can pass through Unicode and remain unharmed, than to
reflect a nasty problem with UTF-8 (not every byte sequence is legal).
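To put those two claims in modern Python 3 terms (a sketch only; the thread itself long predates Python 3): every possible byte decodes under Latin-1, because Unicode's first 256 code points mirror ISO-8859-1, while UTF-8 rejects some byte sequences outright and leaves pure ASCII unchanged.

```python
# All 256 byte values round-trip through Latin-1: code points 0-255
# of Unicode are exactly ISO-8859-1.
data = bytes(range(256))
text = data.decode("latin-1")
assert [ord(c) for c in text] == list(range(256))

# Not every byte sequence is legal UTF-8: a lone 0x80 is a
# continuation byte with no lead byte.
try:
    b"\x80".decode("utf-8")
except UnicodeDecodeError:
    pass
else:
    raise AssertionError("expected a decode error")

# Pure ASCII, however, is byte-for-byte identical in UTF-8.
assert "abc".encode("utf-8") == b"abc"
```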
If I'm using Hebrew characters in my source (which I won't for a long
while), I'll use them in Unicode strings only, and make sure I use
Unicode. If I'm reading Hebrew from an ISO-8859-8 file, I'll set up a
conversion to Unicode on the fly anyway, since most bidi libraries work on
Unicode. So having UTF-8 conversions magically happen won't help me at
all, and will only cause problems when I use "sort-for-uniqueness" on a
list with mixed binary goop and Unicode strings. In short, this sounds
like a recipe for disaster.
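The sort-for-uniqueness hazard can be sketched in modern Python 3 (an illustration, not what the thread's Python 2 would do: there, sorting forced an implicit decode of the byte string, which could blow up mid-sort on non-ASCII goop; Python 3 refuses the comparison outright):

```python
goop = b"\x9b\x1f"   # arbitrary binary bytes -- not valid UTF-8
text = "shalom"      # a genuine Unicode string

err = None
try:
    sorted([text, goop])   # ordering a str against bytes
except TypeError as e:     # Python 3 rejects the mixed comparison
    err = e
assert err is not None
```

Either way, a list mixing raw bytes with Unicode strings cannot be safely sorted, which is exactly the disaster scenario above.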
internationally y'rs, Z.
--
Moshe Zadka