[Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

Nick Coghlan ncoghlan at gmail.com
Tue Aug 19 14:25:48 CEST 2014

On 18 August 2014 10:45, Guido van Rossum <guido at python.org> wrote:
> On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw <barry at python.org> wrote:
>> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote:
>> >There's actually another aspect to your idea, independent of the naming:
>> >exposing a view rather than just an iterator. I'm going to have to look at
>> >the implications for memoryview, but it may be a good way to go (and would
>> >align with the iterator -> view changes in dict).
>> Yep!  Maybe that will inspire a better spelling. :)
> +1. It's just as much about b[i] as it is about "for c in b", so a view
> sounds right. (The view would have to be mutable for bytearrays and for
> writable memoryviews.)
> On the rest, it's sounding more and more as if we will just need to live
> with both bytes(1000) and bytearray(1000). A warning sounds worse than a
> deprecation to me.

I'm fine with keeping bytearray(1000), since that works the same way
in both Python 2 & 3, and doesn't seem likely to be invoked
inadvertently.

I'd still like to deprecate "bytes(1000)", since that does different
things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in
both:

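$ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
'10'
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
$ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'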

Hitting the deprecation warning in single-source code would seem to be
a strong hint that you have a bug in one version or the other, rather
than a sign of intended behaviour.
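
As a rough illustration of the single-source spelling, a zero-filled
bytes object can be built by repetition instead of by passing an integer
to the constructor. The helper name "zero_bytes" is made up for this
sketch and is not part of the PEP:

```python
def zero_bytes(n):
    """Return n zero bytes using a spelling that behaves the same on Python 2 and 3.

    Hypothetical helper for illustration only.
    """
    # Repetition of a one-byte literal avoids the bytes(int) constructor,
    # whose meaning differs between Python 2 (str(int)) and Python 3.
    return b"\x00" * n


padding = zero_bytes(10)
```

On Python 3 the result is equal to bytes(10); on Python 2 it is the
10-character zero-filled str rather than the string '10'.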

> bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes and
> bytearray pretty highly.

With "bytearray(1000)" sticking around indefinitely, I'm less
concerned about adding a "zeros" constructor.

> I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a size
> one list is (or at least feels) more expensive to allocate than a size one
> bytes object. So, okay.
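
For reference, here is a sketch of spellings that already exist in
Python 3 for turning one integer into a length 1 bytes object.
"bytes.byte" itself is only a proposal in this thread, so it is not
used; the helper name "byte_from_int" is an assumption made for this
example:

```python
import struct


def byte_from_int(value):
    """Build a length 1 bytes object from an integer in range(256).

    Hypothetical helper; it avoids the intermediate one-element list
    that bytes([value]) needs.
    """
    # struct.pack raises struct.error if value is outside 0..255.
    return struct.pack("B", value)


via_list = bytes([65])               # allocates a temporary list first
via_struct = byte_from_int(65)       # packs the integer directly
via_int = (65).to_bytes(1, "big")    # int method, Python 3 only
```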

So, here's an interesting thing I hadn't previously registered: we
actually already have a fairly capable "bytesview" option, and have
done since Stefan implemented "memoryview.cast" in 3.3. The trick lies
in the 'c' format character for the struct module, which is parsed as
a length 1 bytes object rather than as an integer:

>>> data = bytearray(b"Hello world")
>>> bytesview = memoryview(data).cast('c')
>>> list(bytesview)
[b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'Hello world'
>>> bytesview[0:5] = memoryview(b"olleH").cast('c')
>>> list(bytesview)
[b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
>>> b''.join(bytesview)
b'olleH world'
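
The difference between the default 'B' format and the 'c' format can
also be seen directly at the struct level. This is a small sketch using
only the struct module and memoryview as they exist in Python 3.3+; the
variable names are arbitrary:

```python
import struct

# 'B' unpacks an unsigned byte as an integer ...
as_int = struct.unpack("B", b"H")[0]
# ... while 'c' unpacks the same byte as a length 1 bytes object.
as_char = struct.unpack("c", b"H")[0]

payload = b"Hi"
# A plain memoryview of bytes uses the 'B' format, so indexing gives integers.
default_item = memoryview(payload)[0]
# After cast('c'), indexing gives length 1 bytes objects instead.
cast_item = memoryview(payload).cast("c")[0]
```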

For the read-only case, it covers everything (iteration, indexing,
slicing). For the writable view case, it doesn't cover changing the
shape of the target array, and it doesn't cover assigning arbitrary
buffer objects (you need to wrap them in a similar cast for memoryview
to allow the assignment).
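
The two writable-view limitations can be sketched as follows. This
reflects CPython behaviour as I understand it; the flag names are just
for this example:

```python
data = bytearray(b"Hello world")
view = memoryview(data).cast("c")

# A plain bytes object exports the 'B' format, which does not match the
# 'c' format of the view, so the slice assignment is rejected.
try:
    view[0:5] = b"olleH"
    plain_assignment_ok = True
except (TypeError, ValueError):
    plain_assignment_ok = False

# Wrapping the source in the same cast makes the formats match.
view[0:5] = memoryview(b"olleH").cast("c")

# While the view is alive, the underlying bytearray cannot be resized.
try:
    data.extend(b"!")
    resize_ok = True
except BufferError:
    resize_ok = False

# Releasing the view lifts the export, and resizing works again.
view.release()
data.extend(b"!")
```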

It's hardly the most *intuitive* spelling though - I was one of the
reviewers for Stefan's memoryview rewrite back in 3.3, and I only made
the connection today when looking to see how a view object like the
one we were discussing elsewhere in the thread might be implemented as
a facade over arbitrary memory buffers, rather than being specific to
bytes and bytearray.

If we went down the "bytesview" path, then a single new facade would
cover not only the 3 builtins (bytes, bytearray, memoryview) but also
any *other* buffer exporting type. If we so chose (at some point in
the future, not as part of this PEP), such a type could allow
additional bytes operations (like "count", "startswith" or "index") to
be applied to arbitrary regions of memory without making a copy. We
can't add those other operations to memoryview, since they don't make
sense for an n-dimensional array.
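
To make the facade idea a little more concrete, here is a minimal
sketch of what such a type might look like in pure Python. The class
name "BytesView" and its methods are hypothetical; nothing like this is
specified by the PEP, and it only handles C-contiguous buffers and
slices with a step of 1:

```python
class BytesView:
    """Hypothetical bytes-style facade over any C-contiguous buffer exporter."""

    def __init__(self, obj):
        # Flat view of the raw bytes; no copy of the underlying memory is made.
        self._raw = memoryview(obj).cast("B")
        # Same memory, but items come back as length 1 bytes objects.
        self._chars = self._raw.cast("c")

    def __len__(self):
        return len(self._raw)

    def __iter__(self):
        return iter(self._chars)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing a memoryview yields another view, not a copy.
            return BytesView(self._raw[index])
        return self._chars[index]

    def startswith(self, prefix):
        # Assumes prefix is a bytes object; memoryview equality compares
        # contents, and the slice is a view rather than a copy.
        return self._raw[: len(prefix)] == prefix


buffer = bytearray(b"Hello world")
bv = BytesView(buffer)
first_two = list(bv)[:2]
tail = b"".join(bv[6:])
```

Operations like "count" or "index" would need more care to implement
without copying, which is part of why they are left for a possible
future proposal.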


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
