[I18n-sig] Re: [Python-Dev] Unicode debate

Moshe Zadka <moshez@math.huji.ac.il>
Tue, 2 May 2000 13:12:14 +0300 (IDT)


On Mon, 1 May 2000, Guido van Rossum wrote:

> Paul, we're both just saying the same thing over and over without
> convincing each other.  I'll wait till someone who wasn't in this
> debate before chimes in.

Well, I'm guessing you had someone specific in mind (Neil?), but I want to
say something too, as the only one here (I think) using ISO-8859-8
natively. I much prefer the Fredrik-Paul position, also known as the
"a character is a character" position, to UTF-8 as the default encoding.
Unicode is Western-centered -- the first 256 characters are Latin-1. UTF-8
is even more horribly Western-centered (or I should say USA-centered) --
ASCII documents are unchanged in UTF-8. I'd much prefer Python to reflect a
fundamental truth about Unicode, which at least makes sure binary goop can
pass through Unicode and remain unharmed, than to reflect a nasty problem
with UTF-8 (not every byte sequence is legal UTF-8).
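To make the contrast concrete, here is a minimal sketch in modern Python 3 (an assumption: the post predates Python 3, where bytes and str are separate types). Treating each byte as the code point with the same value (Latin-1) round-trips any byte string, while decoding the same bytes as UTF-8 can fail. The function names are hypothetical, chosen for illustration.

```python
# Sketch (Python 3): "a character is a character" (Latin-1 view) vs UTF-8.

def latin1_roundtrip(data: bytes) -> bool:
    # Each byte 0..255 maps to exactly one code point U+0000..U+00FF,
    # so decoding and re-encoding returns the original bytes unchanged.
    return data.decode("latin-1").encode("latin-1") == data

def utf8_decodable(data: bytes) -> bool:
    # Not every byte sequence is legal UTF-8, so decoding may raise.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

goop = bytes(range(256))  # arbitrary "binary goop": every possible byte
print(latin1_roundtrip(goop))  # True
print(utf8_decodable(goop))    # False (e.g. 0x80 cannot start a sequence)

# ASCII text encodes to identical bytes under UTF-8.
print("hello".encode("utf-8") == "hello".encode("ascii"))  # True
```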

If I'm using Hebrew characters in my source (which I won't be for a long
while), I'll use them in Unicode strings only, and make sure I use
Unicode. If I'm reading Hebrew from an ISO-8859-8 file, I'll set up a
conversion to Unicode on the fly anyway, since most bidi libraries work on
Unicode. So having UTF-8 conversions happen magically won't help me at
all, and will only cause problems when I use "sort-for-uniqueness" on a
list with mixed binary goop and Unicode strings. In short, this sounds
like a recipe for disaster.
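A rough Python 3 sketch of both points, under stated assumptions: the ISO-8859-8 input is simulated with an in-memory byte stream, and the helpers `implicit_utf8_key` and `unique_sorted` are hypothetical, written to mimic what a silent UTF-8 default would do when byte strings meet Unicode strings. Explicit decoding at the I/O boundary works, while implicit UTF-8 coercion breaks a sort-then-dedupe pass over a mixed list.

```python
import io

# Hypothetical file contents: "shalom" in Hebrew, encoded as ISO-8859-8.
raw = b"\xf9\xec\xe5\xed"

# Explicit conversion on the fly: wrap the byte stream in a text decoder
# so the rest of the program sees only Unicode strings.
stream = io.TextIOWrapper(io.BytesIO(raw), encoding="iso-8859-8")
text = stream.read()
print(text == "\u05e9\u05dc\u05d5\u05dd")  # True

def implicit_utf8_key(item):
    # Hypothetical: mimics a UTF-8 default by silently decoding byte
    # strings as UTF-8 whenever they are compared with Unicode strings.
    if isinstance(item, bytes):
        return item.decode("utf-8")
    return item

def unique_sorted(items):
    # "Sort-for-uniqueness": sort, then drop adjacent duplicates.
    result = []
    for item in sorted(items, key=implicit_utf8_key):
        if not result or implicit_utf8_key(result[-1]) != implicit_utf8_key(item):
            result.append(item)
    return result

# Mixed list: Unicode strings plus the raw ISO-8859-8 bytes ("binary goop").
mixed = [text, raw, "abc"]
try:
    unique_sorted(mixed)
except UnicodeDecodeError:
    # 0xF9 is not a legal UTF-8 byte, so the implicit coercion blows up.
    print("UnicodeDecodeError")
```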

internationally y'rs, Z.

--
Moshe Zadka <moshez@math.huji.ac.il>
http://www.oreilly.com/news/prescod_0300.html
http://www.linux.org.il -- we put the penguin in .com