[Python-Dev] bytes type discussion
bob at redivi.com
Wed Feb 15 01:56:00 CET 2006
On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:
> On 2/14/06, Bob Ippolito <bob at redivi.com> wrote:
>> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
>>> - we need a new PEP; PEP 332 won't cut it
>>> - no b"..." literal
>>> - bytes objects are mutable
>>> - bytes objects are composed of ints in range(256)
>>> - you can pass any iterable of ints to the bytes constructor, as
>>> long as they are in range(256)
>> Sounds like array.array('B').
>> Will the bytes object support the buffer interface?
> Do you want them to?
> I suppose they should *not* support the *text* part of that API.
I would imagine that it'd be convenient for integrating with existing
extensions... e.g. initializing an array or Numeric array with one.
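For reference, `array.array('B')` already behaves much like the proposed bytes object: a mutable sequence of ints in range(256) that supports the buffer interface. A quick illustrative sketch (written against today's Python, so it shows the analogy rather than anything that existed at the time):

```python
import array

# A mutable sequence of small ints, much like the proposed bytes type.
buf = array.array('B', [72, 105])
buf[0] = 104          # mutable in place
buf.append(33)        # grows like a list

# Values outside range(256) are rejected, as proposed for bytes.
rejected = False
try:
    buf.append(256)
except OverflowError:
    rejected = True
```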
>> Will it accept
>> objects supporting the buffer interface in the constructor (or a
>> class method)? If so, will it be a copy or a view? Current
>> array.array behavior says copy.
> bytes() should always copy -- thanks for asking.
I only really ask because it's worth fully specifying these things.
Copy seems a lot more sensible given the rest of the interpreter and
stdlib (e.g. buffer(x) seems to always return a read-only buffer).
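This is in fact how it eventually landed in Python 3: constructing bytes always copies, so later mutation of the source is never visible through the result. A minimal sketch:

```python
# bytes() copies its argument rather than creating a view,
# so mutating the original afterwards has no effect on the copy.
src = bytearray(b"abc")
snapshot = bytes(src)
src[0] = ord("x")

# snapshot still holds the old contents
unchanged = snapshot == b"abc"
```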
>>> - longs or anything with an __index__ method should do, too
>>> - when you index a bytes object, you get a plain int
>> When slicing a bytes object, do you get another bytes object or a
>> list? If it's a bytes object, is it a copy or a view? Current
>> array.array behavior says copy.
> Another bytes object which is a copy.
> (Why would you even think about views here? They are evil.)
I mention views because that's what numpy/Numeric/numarray/etc.
do... It's certainly convenient at times to have that functionality,
for example, to work with only the alpha channel in an RGBA image.
Probably too magical for the bytes type.
>>> import numpy
>>> image = numpy.array(list('RGBARGBARGBA'))
>>> alpha = image[3::4]
>>> alpha
array([A, A, A], dtype=(string,1))
>>> alpha[:] = 'X'
>>> image
array([R, G, B, X, R, G, B, X, R, G, B, X], dtype=(string,1))
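For comparison, Python 3 as it eventually shipped took the copy route: slicing a bytes-like object copies rather than creating a view, and indexing yields a plain int. A sketch of that behavior, mirroring the numpy session above:

```python
image = bytearray(b"RGBARGBARGBA")
alpha = image[3::4]      # a copy, not a view
alpha[:] = b"XXX"        # mutating the slice...

# ...leaves the original untouched, unlike the numpy view above
still_intact = image == bytearray(b"RGBARGBARGBA")

# indexing gives a plain int in range(256)
first = image[0]         # the ordinal of 'R'
```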
>>> Very controversial:
>>> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
>>> - bytes(u"abc") == bytes("abc") # for ASCII at least
>>> - bytes(u"\x80\xff") raises UnicodeError
>>> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>>> Martin von Loewis's alternative for the "very controversial" set
>>> is to
>>> disallow an encoding argument and (I believe) also to disallow
>>> Unicode arguments. In 3.0 this would leave us with s.encode(<encoding>)
>>> as the
>>> only way to convert a string (which is always unicode) to bytes. The
>>> problem with this is that there's no code that works in both 2.x and
>>> 3.0.
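For the record, Python 3 ultimately landed close to this alternative: str.encode() (with an optional encoding, defaulting to UTF-8) is the idiomatic conversion, and bytes() applied to a str demands an explicit encoding. A sketch of the behavior as it shipped:

```python
# str.encode() is the standard way to get bytes from text
ascii_bytes = "abc".encode("ascii")        # b'abc'
latin1 = "\x80\xff".encode("latin-1")      # b'\x80\xff'

# encoding non-ASCII text as ASCII fails, much like the
# proposed bytes(u"\x80\xff") raising UnicodeError
ascii_failed = False
try:
    "\x80\xff".encode("ascii")
except UnicodeEncodeError:
    ascii_failed = True

# bytes() on a str without an encoding is a TypeError in Python 3
needs_encoding = False
try:
    bytes("abc")
except TypeError:
    needs_encoding = True
```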
>> Given a base64 or hex string, how do you get a bytes object out of
>> it? Currently str.decode('base64') and str.decode('hex') are good
>> solutions to this... but you get a str object back.
> I don't know -- you can propose an API you like here. base64 is as
> likely to encode text as binary data, so I don't think it's wrong for
> those things to return strings.
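As it turned out, Python 3 resolved this by making the base64 module operate on bytes at both ends: b64decode returns bytes, and getting text out of the result is a separate, explicit decode step. A sketch of that eventual API:

```python
import base64

raw = base64.b64decode(b"aGVsbG8=")   # bytes in, bytes out
encoded = base64.b64encode(raw)       # round trip back to the transfer encoding

# getting *text* requires an explicit character decode
text = raw.decode("ascii")
```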
That's kinda true I guess -- but you'd still need an encoding in py3k
to turn base64 -> text. A lot of the current codecs infrastructure
doesn't make sense in py3k -- for example, the 'zlib' encoding, which
is really a bytes transform, or 'unicode_escape', which is a text
transform.
I suppose there aren't too many different ways you'd want to encode
or decode data to binary (beyond the text codecs), so they should
probably just live in a module -- something like the binascii we have
now. I do find the codecs infrastructure to be convenient at times
(maybe too convenient), but since you're not interested in adding
functions to existing types, a module seems like the best approach.
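That is roughly where Python 3 ended up: the 'hex' and 'zlib' codecs left str, and the module-level functions in binascii (and base64) do the job instead of str.decode('hex'). A sketch, assuming the binascii API as it exists today:

```python
import binascii

# hex <-> bytes now goes through binascii rather than str.decode('hex')
data = binascii.unhexlify(b"deadbeef")   # returns bytes
back = binascii.hexlify(data)            # bytes again, not str
```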