[Python-Dev] bytes type discussion
Guido van Rossum
guido at python.org
Wed Feb 15 01:17:11 CET 2006
On 2/14/06, Bob Ippolito <bob at redivi.com> wrote:
> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
> > - we need a new PEP; PEP 332 won't cut it
> >
> > - no b"..." literal
> >
> > - bytes objects are mutable
> >
> > - bytes objects are composed of ints in range(256)
> >
> > - you can pass any iterable of ints to the bytes constructor, as long
> > as they are in range(256)
>
> Sounds like array.array('B').
Sure.
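
For readers who have not used it, the comparison with array.array('B') can be checked directly. The snippet below is only a sketch of that existing type, not of the proposed bytes type; the names buf, first, and rejected are made up for illustration.

```python
from array import array

# array.array('B') stores unsigned bytes: it is mutable and holds ints in range(256).
buf = array('B', [10, 20, 30])
buf[0] = 255          # in-place mutation is allowed
first = buf[1]        # indexing yields a plain int

# Values outside range(256) are rejected by the 'B' typecode.
try:
    array('B', [256])
    rejected = False
except OverflowError:
    rejected = True
```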
> Will the bytes object support the buffer interface?
Do you want them to?
I suppose they should *not* support the *text* part of that API.
> Will it accept
> objects supporting the buffer interface in the constructor (or a
> class method)? If so, will it be a copy or a view? Current
> array.array behavior says copy.
bytes() should always copy -- thanks for asking.
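
A minimal sketch of the "always copy" rule, assuming that Python 3's bytearray (the mutable byte type that exists today) is an acceptable stand-in for the type discussed here. The names source and copied are illustrative only.

```python
from array import array

# array.array supports the buffer interface, so it can seed the constructor.
source = array('B', [1, 2, 3])
copied = bytearray(source)   # the constructor copies the buffer contents

# Mutating the source afterwards does not affect the copy.
source[0] = 99
```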
> > - longs or anything with an __index__ method should do, too
> >
> > - when you index a bytes object, you get a plain int
>
> When slicing a bytes object, do you get another bytes object or a
> list? If it's a bytes object, is it a copy or a view? Current
> array.array behavior says copy.
Another bytes object which is a copy.
(Why would you even think about views here? They are evil.)
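
The copy-versus-view distinction can be illustrated as follows. This is a sketch under the assumption that Python 3's bytearray stands in for the proposed type; memoryview is used only to show what a view would look like, and is not part of the proposal above.

```python
data = bytearray([10, 20, 30, 40])

piece = data[1:3]               # slicing returns a new bytearray (a copy)
view = memoryview(data)[1:3]    # a view shares storage with data

data[1] = 99                    # mutate the original

copy_value = piece[0]           # the copy still sees the old value
view_value = view[0]            # the view sees the new value
```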
> > - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
> >
> > Somewhat controversial:
> >
> > - it's probably too big to attempt to rush this into 2.5
> >
> > - bytes("abc") == bytes(map(ord, "abc"))
> >
> > - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,
> > 256])
>
> It would be VERY controversial if ord('\xff') == 256 ;)
Oops. :-)
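
The corrected values are easy to confirm. The sketch below runs on Python 3, where the string literal is text and bytes is the immutable type that exists today, so it only approximates the 2.x expressions quoted above.

```python
# ord() of the two characters gives 128 and 255, not 256.
values = [ord(ch) for ch in "\x80\xff"]
as_bytes = bytes(values)

# 256 is outside range(256), so a constructor restricted to that range rejects it.
try:
    bytes([128, 256])
    accepted = True
except ValueError:
    accepted = False
```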
> > Very controversial:
> >
> > - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
> > argument
> >
> > - bytes(u"abc") == bytes("abc") # for ASCII at least
> >
> > - bytes(u"\x80\xff") raises UnicodeError
> >
> > - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
> >
> > Martin von Loewis's alternative for the "very controversial" set is to
> > disallow an encoding argument and (I believe) also to disallow Unicode
> > arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
> > only way to convert a string (which is always unicode) to bytes. The
> > problem with this is that there's no code that works in both 2.x and
> > 3.0.
>
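
For illustration only, here is how the encoding cases in the "very controversial" list behave in a current Python 3 interpreter, where every string is Unicode. This shows today's str.encode and bytes(text, encoding), not a decision reached in this thread; the variable names are hypothetical.

```python
text = "\x80\xff"

# ASCII-only text encodes without trouble.
ascii_ok = "abc".encode("ascii")

# Non-ASCII text cannot be encoded as ASCII; UnicodeEncodeError is a UnicodeError.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeError:
    ascii_failed = True

# With an explicit latin-1 encoding, each code point maps to one byte.
latin1 = text.encode("latin-1")
via_constructor = bytes(text, "latin-1")
```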
> Given a base64 or hex string, how do you get a bytes object out of
> it? Currently str.decode('base64') and str.decode('hex') are good
> solutions to this... but you get a str object back.
I don't know -- you can propose an API you like here. base64 is as
likely to encode text as binary data, so I don't think it's wrong for
those things to return strings.
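
As one possible shape for such an API, the sketch below uses calls that exist in Python 3 today (bytes.fromhex, bytes.hex, and the base64 module). It is an example of what a proposal could look like, not something agreed in this message.

```python
import base64

payload = bytes([10, 20, 30])

# Text representations of the payload.
hex_text = payload.hex()
b64_text = base64.b64encode(payload).decode("ascii")

# Going from the text forms back to a bytes object.
from_hex = bytes.fromhex(hex_text)
from_b64 = base64.b64decode(b64_text)
```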
--
--Guido van Rossum (home page: http://www.python.org/~guido/)