[Python-Dev] Re: adding a bytes sequence type to Python

James Y Knight foom at fuhm.net
Wed Aug 18 00:04:45 CEST 2004


On Aug 17, 2004, at 5:18 PM, Bob Ippolito wrote:

> On Aug 17, 2004, at 5:11 PM, Martin v. Löwis wrote:
>
>> Bob Ippolito wrote:
>>> How would you embed raw bytes if the string was unicode?
>>
>> The most direct notation would be
>>
>>    bytes("delimited packet\x00")
>>
>> However, people might not understand what is happening, and
>> Guido doesn't like it if the bytes are >127.
>
> I guess that was a bad example, what if the delimiter was \xff?

Indeed, if all strings are unicode, the question becomes: what encoding 
does bytes() use to translate unicode characters to bytes. Two 
alternatives have been proposed so far:
1) ASCII (translate chars as their codepoint if  < 128, else error)
2) ISO-8859-1 (translate chars as their codepoint if < 256, else error)

I think I'd choose #2, myself.

> I know that map(ord, u'delimited packet\xff') would get correct 
> results.. but I don't think I like that either.

Why would you consider that wrong? ord(u'\xff') *should* return 255. 
Just as ord(u'\u1000') returns 4096. There's nothing mysterious there.

James


More information about the Python-Dev mailing list