[I18n-sig] Re: Unicode debate

Just van Rossum just@letterror.com
Sat, 29 Apr 2000 18:40:22 +0100


At 3:25 PM +0200 29-04-2000, M.-A. Lemburg wrote:
>Just van Rossum wrote:
>>
>> At 9:51 PM +0200 28-04-2000, M.-A. Lemburg wrote:
>> >Right. Binary data in such a string literal would have to
>> >use str('...data...','binary') to get the correct encoding
>> >attached to it.
>>
>> And that sucks.
>
>Not sure why... after all the point of adding encoding information
>to strings was to add missing information: the current usage
>as binary data container would then be justified provided the
>strings are marked as containing binary data.

For one, it's just too much hassle to write str('...data...','binary')...

My proposal was only a very lightweight way to ensure correct
translation to unicode when needed. What you seem to suggest is that the
encoding attribute could be used to make 8-bit strings almost as powerful
as unicode strings, by converting to unicode whenever there's an action
that involves two 8-bit strings with different encodings. While I'm sure
that would have its uses, I think that's too ambitious, and it seems to get
too much in the way of 8-bit strings doubling as byte arrays. As I've
admitted before, what I had in mind for the encoding attribute is probably
too weak a use to warrant the effort, and there are indeed too many things
that can still go wrong. So for now I'll let it go... (But it was fun
indeed ;-)
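
To make the lightweight idea concrete, here is a minimal sketch in
present-day Python. The class TaggedBytes and its method to_unicode are
hypothetical names invented for illustration; they are not part of any
proposal text or of Python itself. The only assumption is that an 8-bit
string carries an optional encoding label that is consulted when a
translation to unicode is requested, and nowhere else.

```python
class TaggedBytes:
    """Hypothetical 8-bit string with an optional encoding label."""

    def __init__(self, data: bytes, encoding=None):
        self.data = data
        # None means "plain byte array, no character interpretation".
        self.encoding = encoding

    def to_unicode(self) -> str:
        # The label is used only here, when unicode is actually needed.
        if self.encoding is None:
            raise TypeError("binary data has no character interpretation")
        return self.data.decode(self.encoding)


label = TaggedBytes(b"caf\xe9", "latin-1")
blob = TaggedBytes(b"\x00\xff\x10")

print(label.to_unicode())
print(len(blob.data))
```

Untagged data keeps behaving as a byte array; only tagged data can be
turned into unicode, and no implicit conversion happens between two
8-bit strings.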

(Oh, and I still stand by my and Fredrik's point that utf-8 is a poor
default choice when coercing 8-bit strings to unicode, for the sole reason
that a utf-8 string is a byte array, and not a character string.)
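
The byte-array point can be illustrated with a small sketch in
present-day Python, which postdates this thread and separates bytes from
str. The sample word is an arbitrary choice made for the example.

```python
text = "caf\u00e9"                 # 4 characters, the last one is e-acute
utf8 = text.encode("utf-8")        # 5 bytes: e-acute takes two bytes
latin1 = text.encode("latin-1")    # 4 bytes: one byte per character

print(len(text), len(utf8), len(latin1))

# Slicing the utf-8 form by position yields a byte, not a character.
print(utf8[3:4])

# An 8-bit Latin-1 string coerced as if it were utf-8 does not decode.
try:
    latin1.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```

In the utf-8 form, indexing and length count bytes rather than
characters, and ordinary 8-bit text containing non-ASCII characters is
generally not valid utf-8 at all.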

Just