[Python-Dev] Re: [I18n-sig] Re: Unicode debate
Just van Rossum
just@letterror.com
Tue, 2 May 2000 16:42:24 +0100
>[Just]
>> You're going to have a hard time explaining that "\377" != u"\377".
>
[GvR]
>I agree. You are an example of how hard it is to explain: you still
>don't understand that for a person using CJK encodings this is in fact
>the truth.
That depends on the definition of truth: if you document that 8-bit strings
are Latin-1, the above is the truth. Conceptually classifying all other 8-bit
encodings as binary goop makes the semantics crystal clear.
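To make the "8-bit strings are Latin-1" convention concrete, here is a small sketch. It is written in modern Python 3, where bytes and str are separate types, so it only illustrates the idea and does not implement what was proposed in this thread. Under Latin-1, byte value n is simply code point n, which is why "\377" and u"\377" would compare equal. A multibyte CJK encoding (GBK is used here purely as an example) breaks that one-byte-one-character rule. The helper name `as_latin1` is hypothetical.

```python
# Python 3 sketch: bytes and str are distinct types here.
# as_latin1 is a hypothetical helper name used only for this illustration.

def as_latin1(data: bytes) -> str:
    """Interpret an 8-bit string as Latin-1: each byte maps to the code point of the same value."""
    return data.decode("latin-1")

# Every byte 0..255 decodes to the character with the same ordinal.
assert all(ord(as_latin1(bytes([n]))) == n for n in range(256))

# So the 8-bit "\377" and the Unicode "\377" agree under this convention.
print(as_latin1(b"\377") == "\377")      # True

# A CJK encoding does not follow this rule: one character takes two bytes,
# so reading those bytes as Latin-1 yields two unrelated characters.
cjk_bytes = "\u4e2d".encode("gbk")       # U+4E2D encoded in GBK
print(len(cjk_bytes))                    # 2
print(as_latin1(cjk_bytes) == "\u4e2d")  # False
```

Under this convention such GBK data is just "binary goop" as far as the string type is concerned; the program has to decode it explicitly with the right codec.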
>> Again, if you define that "all strings are unicode" and that 8-bit strings
>> contain Unicode characters up to 255, you're all set. Clear semantics, few
>> surprises, simple implementation, etc. etc.
>
>But not all 8-bit strings occurring in programs are Unicode. Ask
>Moshe.
I know. They can be anything, even binary goop. But that's *only* an
artifact of the fact that 8-bit strings need to double as buffer objects.
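Treating 8-bit strings as Latin-1 does not get in the way of their use as buffers for binary data, because the mapping is lossless in both directions. A minimal sketch, again in modern Python 3 rather than the Python of 2000, using `struct` to produce some arbitrary binary goop (the particular format and values are arbitrary choices for the example):

```python
import struct

# Arbitrary binary data: a little-endian 4-byte int followed by an 8-byte double.
blob = struct.pack("<id", -1, 3.5)

# Viewing it as Latin-1 text loses nothing: decoding then encoding restores the bytes.
text = blob.decode("latin-1")
assert text.encode("latin-1") == blob

# The round trip holds for all 256 byte values, not just this sample.
every_byte = bytes(range(256))
assert every_byte.decode("latin-1").encode("latin-1") == every_byte

print(len(text) == len(blob))  # True: exactly one character per byte
```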
Just