[Python-Dev] Fwd: PEP 467: Minor API improvements for bytes & bytearray

Guido van Rossum guido at python.org
Tue Aug 19 18:46:24 CEST 2014


On Tue, Aug 19, 2014 at 5:25 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 18 August 2014 10:45, Guido van Rossum <guido at python.org> wrote:
> > On Sun, Aug 17, 2014 at 5:22 PM, Barry Warsaw <barry at python.org> wrote:
> >>
> >> On Aug 18, 2014, at 10:08 AM, Nick Coghlan wrote:
> >>
> >> >There's actually another aspect to your idea, independent of the
> >> >naming: exposing a view rather than just an iterator. I'm going to
> >> >have to look at the implications for memoryview, but it may be a good
> >> >way to go (and would align with the iterator -> view changes in dict).
> >>
> >> Yep!  Maybe that will inspire a better spelling. :)
> >
> >
> > +1. It's just as much about b[i] as it is about "for c in b", so a view
> > sounds right. (The view would have to be mutable for bytearrays and for
> > writable memoryviews.)
> >
> > On the rest, it's sounding more and more as if we will just need to live
> > with both bytes(1000) and bytearray(1000). A warning sounds worse than a
> > deprecation to me.
>
> I'm fine with keeping bytearray(1000), since that works the same way
> in both Python 2 & 3, and doesn't seem likely to be invoked
> inadvertently.
>
> I'd still like to deprecate "bytes(1000)", since that does different
> things in Python 2 & 3, while "b'\x00' * 1000" does the same thing in
> both.
>

I think any argument based on what "bytes" does in Python 2 is pretty weak,
since Python 2's bytes is just an alias for str, so it has tons of behaviors
that differ -- why single this out?

In Python 3, I really like bytes and bytearray to be as similar as
possible, and that includes the constructor.
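
For illustration only, a minimal Python 3 sketch of the constructor symmetry
in question (the variable names are made up for this example):

```python
# Python 3 only: both constructors treat an int argument as a length and
# return that many zero bytes. (Python 2's bytes is str, so bytes(10) there
# would be the string '10' instead.)
n = 10
zeros_bytes = bytes(n)            # immutable, zero-filled
zeros_bytearray = bytearray(n)    # mutable, zero-filled
zeros_repeat = b"\x00" * n        # spelling that behaves the same in 2 and 3

assert zeros_bytes == zeros_bytearray == zeros_repeat
assert len(zeros_bytes) == n
```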


> $ python -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
> '10'
> '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> $ python3 -c 'print("{!r}\n{!r}".format(bytes(10), b"\x00" * 10))'
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
> b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>
> Hitting the deprecation warning in single-source code would seem to be
> a strong hint that you have a bug in one version or the other rather
> than being intended behaviour.
>
> > bytes.zeros(n) sounds fine to me; I value similar interfaces for bytes
> > and bytearray pretty highly.
>
> With "bytearray(1000)" sticking around indefinitely, I'm less
> concerned about adding a "zeros" constructor.
>

That's fine.


> > I'm lukewarm on bytes.byte(c); but bytes([c]) does bother me because a
> > size one list is (or at least feels) more expensive to allocate than a
> > size one bytes object. So, okay.
>
> So, here's an interesting thing I hadn't previously registered: we
> actually already have a fairly capable "bytesview" option, and have
> done since Stefan implemented "memoryview.cast" in 3.3. The trick lies
> in the 'c' format character for the struct module, which is parsed as
> a length 1 bytes object rather than as an integer:
>
> >>> data = bytearray(b"Hello world")
> >>> bytesview = memoryview(data).cast('c')
> >>> list(bytesview)
> [b'H', b'e', b'l', b'l', b'o', b' ', b'w', b'o', b'r', b'l', b'd']
> >>> b''.join(bytesview)
> b'Hello world'
> >>> bytesview[0:5] = memoryview(b"olleH").cast('c')
> >>> list(bytesview)
> [b'o', b'l', b'l', b'e', b'H', b' ', b'w', b'o', b'r', b'l', b'd']
> >>> b''.join(bytesview)
> b'olleH world'
>
> For the read-only case, it covers everything (iteration, indexing,
> slicing). For the writable view case, it doesn't cover changing the
> shape of the target array, and it doesn't cover assigning arbitrary
> buffer objects (you need to wrap them in a similar cast for memoryview
> to allow the assignment).
>
> It's hardly the most *intuitive* spelling though - I was one of the
> reviewers for Stefan's memoryview rewrite back in 3.3, and I only made
> the connection today when looking to see how a view object like the
> one we were discussing elsewhere in the thread might be implemented as
> a facade over arbitrary memory buffers, rather than being specific to
> bytes and bytearray.
>

Maybe the 'future' package can offer an iterbytes or bytesview implemented
this way?
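
A rough sketch of what such helpers might look like, assuming Python 3.3+
(memoryview.cast does not exist in Python 2, so a real 'future' version would
need a different implementation there). The names bytesview and iterbytes are
hypothetical here, not an existing API:

```python
def bytesview(data):
    # Hypothetical helper, not an existing API in 'future' or the stdlib.
    # Returns a view whose items are length-1 bytes objects.
    # memoryview.cast() needs Python 3.3+ and a C-contiguous buffer.
    return memoryview(data).cast('c')


def iterbytes(data):
    # Hypothetical helper: iterate over length-1 bytes objects, no copy of
    # the underlying buffer is made.
    return iter(bytesview(data))


data = bytearray(b"Hello world")
view = bytesview(data)

first_five = list(iterbytes(b"Hello"))   # [b'H', b'e', b'l', b'l', b'o']
view[0] = b"J"                           # writes through to the bytearray
assert data == bytearray(b"Jello world")
```

The view is writable only when the underlying buffer is (a bytearray, say);
a view over a bytes object is read-only.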


> If we went down the "bytesview" path, then a single new facade would
> cover not only the 3 builtins (bytes, bytearray, memoryview) but also
> any *other* buffer exporting type. If we so chose (at some point in
> the future, not as part of this PEP), such a type could allow
> additional bytes operations (like "count", "startswith" or "index") to
> be applied to arbitrary regions of memory without making a copy.


Why call out "without making a copy" for operations that naturally don't
have to copy anything?


> We can't add those other operations to memoryview, since they don't
> make sense for an n-dimensional array.
>

I'm sorry for your efforts, but I'm getting more and more lukewarm about
the entire PEP.

-- 
--Guido van Rossum (python.org/~guido)