On Wed, 3 May 2000, Tim Peters wrote: [Moshe Zadka]
... I'd much prefer Python to reflect a fundamental truth about Unicode, which at least makes sure binary-goop can pass through Unicode and remain unharmed, then to reflect a nasty problem with UTF-8 (not everything is legal).
[Tim Peters]
Then you don't want Unicode at all, Moshe. All the official encoding schemes for Unicode 3.0 suffer illegal byte sequences
Of course I don't, and of course you're right. But what I do want is for my binary goop to pass unharmed through the evil Unicode forest. Which is why I don't want it to interpret my goop as a sequence of bytes it tries to decode, but I want the numeric values of my bytes to pass through to Unicode uharmed -- that means Latin-1 because of the second design decision of the horribly western-specific unicdoe - the first 256 characters are the same as Latin-1. If it were up to me, I'd use Latin-3, but it wasn't, so it's not.
(for example, 0xffff is illegal in UTF-16 (whether BE or LE)
Tim, one of us must have cracked a chip. 0xffff is the same in BE and LE -- isn't it. -- Moshe Zadka <moshez@math.huji.ac.il> http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com