[Python-Dev] Re: adding a bytes sequence type to Python

Tue Aug 17 23:59:19 CEST 2004

On Aug 17, 2004, at 5:33 PM, Guido van Rossum wrote:

>> So, how will it be different from:
>>
>>      from array import array
>>
>>      def bytes(*initializer):
>>          return array('B',*initializer)
>>
>> Even if it's desirable for 'bytes' to be an actual type (e.g. subclassing
>> ArrayType), it might help the definition process to describe the
>> difference between the new type and a byte array.
>
> Not a whole lot different, except for the ability to use a string as
> alternate argument to the constructor, and the fact that it's going to
> be an actual type, and that it should support the buffer API (which
> array mysteriously doesn't?).
>
> The string argument support may not even be necessary -- an
> alternative way to spell that would be to let s.decode() return a
> bytes object, which has the advantage of being explicit about the
> encoding; there's even a base64 encoding already!  But it would be a
> bigger incompatibility, more likely to break existing code using
> decode() and expecting to get a string.

IMHO the current uses of decode and encode are really confusing.  Many
decodes go from str -> unicode, and many encodes go from unicode -> str
(or implicitly str -> unicode -> str, which usually fails miserably on
non-ASCII data)... while others, like zlib and base64, are str <-> str.
Technically, base64-decoding a unicode object should certainly work, but
it doesn't, because unicode doesn't have a decode method.
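
To make the cases concrete, here is a rough sketch.  It assumes Python 3,
which postdates this thread; there text (str) and bytes are distinct types,
so the direction of each operation shows up in the types.  The codec names
base64_codec and zlib_codec are the standard library's bytes-to-bytes
codecs; the variable names are only illustrative.

```python
import codecs

text = "na\u00efve"  # text (what this thread calls unicode)

# Character translation: text -> bytes, then bytes -> text.
raw = text.encode("utf-8")
round_trip = raw.decode("utf-8")

# Data -> data transformations: bytes in, bytes out, in both directions.
b64 = codecs.encode(raw, "base64_codec")
unb64 = codecs.decode(b64, "base64_codec")

zipped = codecs.encode(raw, "zlib_codec")
unzipped = codecs.decode(zipped, "zlib_codec")

# Record (input type, output type) for each operation.
kinds = {
    "utf-8 encode": (type(text).__name__, type(raw).__name__),
    "utf-8 decode": (type(raw).__name__, type(round_trip).__name__),
    "base64 encode": (type(raw).__name__, type(b64).__name__),
    "base64 decode": (type(b64).__name__, type(unb64).__name__),
}
```

Only the character codec changes the type; base64 and zlib stay within one
type in both directions, which is why sharing the encode/decode names
across both families is confusing.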

I don't have a proposed solution at the moment, but perhaps these
operations should either live outside the data types altogether (i.e.
use codecs only) or be split into separate methods for separate things
(character translations versus data->data transformations).
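
One possible shape for the second option, as a sketch only.  It assumes
Python 3's str/bytes split, and the function names (encode_text,
decode_text, transform, untransform) and the BYTES_TO_BYTES set are
hypothetical, made up for illustration rather than taken from any
existing API.

```python
import codecs

# Hypothetical whitelist of data->data codecs (real stdlib codec names).
BYTES_TO_BYTES = {"base64_codec", "zlib_codec", "hex_codec", "bz2_codec"}


def encode_text(text, encoding="utf-8"):
    """Character translation: text -> bytes."""
    if not isinstance(text, str):
        raise TypeError("encode_text expects str, got %s" % type(text).__name__)
    return text.encode(encoding)


def decode_text(data, encoding="utf-8"):
    """Character translation: bytes -> text."""
    if not isinstance(data, bytes):
        raise TypeError("decode_text expects bytes, got %s" % type(data).__name__)
    return data.decode(encoding)


def _check_transform_args(data, codec):
    # Shared validation for the two data->data helpers below.
    if codec not in BYTES_TO_BYTES:
        raise LookupError("%r is not a data->data codec" % (codec,))
    if not isinstance(data, bytes):
        raise TypeError("expected bytes, got %s" % type(data).__name__)


def transform(data, codec):
    """Data -> data: bytes in, bytes out (e.g. base64, zlib)."""
    _check_transform_args(data, codec)
    return codecs.encode(data, codec)


def untransform(data, codec):
    """Inverse of transform: bytes in, bytes out."""
    _check_transform_args(data, codec)
    return codecs.decode(data, codec)
```

With this split, base64-encoding text has to be spelled as
transform(encode_text(s), "base64_codec"), so the character encoding
step is always explicit.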

-bob