On Mon, 1 May 2000, Guido van Rossum wrote:
Paul, we're both just saying the same thing over and over without convincing each other. I'll wait till someone who wasn't in this debate before chimes in.
Well, I'm guessing you had someone specific in mind (Neil?), but I want to say something too, as the only one here (I think) using ISO-8859-8 natively. I much prefer the Fredrik-Paul position, also known as the "a character is a character" position, to UTF-8 as the default encoding. Unicode is western-centered -- the first 256 characters are Latin-1. UTF-8 is even more horribly western-centered (or I should say USA-centered) -- ASCII documents are unchanged. I'd much prefer Python to reflect a fundamental truth about Unicode, which at least makes sure binary-goop can pass through Unicode and remain unharmed, than to reflect a nasty problem with UTF-8 (not everything is legal).
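[A sketch of the point in modern Python 3, which postdates this mail; the original debate concerned Python 2's proposed defaults. Latin-1 maps every byte value to one of the first 256 Unicode code points, so binary goop round-trips unharmed, while UTF-8 rejects many byte sequences outright:]

```python
# Latin-1 assigns a code point to every byte 0x00-0xFF, so arbitrary
# binary data survives a trip through Unicode unchanged.
blob = bytes(range(256))
assert blob.decode("latin-1").encode("latin-1") == blob

# UTF-8 is stricter: not every byte sequence is legal, so the same
# blob cannot even be decoded.
try:
    blob.decode("utf-8")
except UnicodeDecodeError:
    print("invalid UTF-8")  # 0x80 is not a valid start byte
```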
If I'm using Hebrew characters in my source (which I won't for a long while), I'll use them in Unicode strings only, and make sure I use Unicode. If I'm reading Hebrew from an ISO-8859-8 file, I'll set up a conversion to Unicode on the fly anyway, since most bidi libraries work on Unicode. So having UTF-8 conversions magically happen won't help me at all, and will only cause problems when I use "sort-for-uniqueness" on a list with mixed binary-goop and Unicode strings. In short, this sounds like a recipe for disaster.
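[An illustration of the sorting hazard, again in modern Python 3: sorting must compare binary strings against text strings, and Python 3 refuses the comparison outright with a TypeError. Under the scheme being debated, Python 2 would instead attempt an implicit decode of the binary goop and could blow up with a UnicodeDecodeError halfway through the sort:]

```python
# A list mixing binary goop and a text string -- the
# "sort-for-uniqueness" scenario from the mail.
mixed = [b"\xff\xfe", "shalom", b"abc"]
try:
    mixed.sort()
except TypeError as e:
    print("cannot sort mixed list:", e)
```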
internationally y'rs, Z.