[Python-Dev] PEP 467: Minor API improvements for bytes & bytearray

Fri Aug 15 19:48:58 CEST 2014

This feels chatty. I'd like the PEP to call out the specific proposals and
put the more verbose motivation later. It took me a long time to realize
that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3). Also
your mention of bytes.byte() as the counterpart to ord() confused me -- I
think it's more similar to chr(). I don't like iterbytes as a builtin;
let's keep it as a method on the affected types.

On Thu, Aug 14, 2014 at 10:50 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> I just posted an updated version of PEP 467 after recently finishing
> the updates to the Python 3.4+ binary sequence docs to decouple them
> from the str docs.
>
> Key points in the proposal:
>
> * deprecate passing integers to bytes() and bytearray()
> * add bytes.zeros() and bytearray.zeros() as a replacement
> * add bytes.byte() and bytearray.byte() as counterparts to ord() for
>   binary data
> * add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()
>
> As far as I am aware, that last item poses the only open question,
> with the alternative being to add an "iterbytes" builtin with a
> definition along the lines of the following:
>
>     def iterbytes(data):
>         try:
>             getiter = type(data).__iterbytes__
>         except AttributeError:
>             # No __iterbytes__ hook: convert each integer individually
>             result = map(bytes.byte, data)
>         else:
>             result = getiter(data)
>         return result
>
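A hedged, runnable approximation of the builtin sketched above for current Python 3 follows. Since bytes.byte() is only proposed, it substitutes bytes((n,)) in the fallback path. __iterbytes__ is the hypothetical hook from the sketch (no built-in type defines it), and Chunked is a made-up third-party type added only to exercise that hook.

```python
def iterbytes(data):
    """Yield length 1 bytes objects from data (sketch of the proposed builtin)."""
    try:
        # Hypothetical hook: no built-in type defines __iterbytes__ today.
        getiter = type(data).__iterbytes__
    except AttributeError:
        # Stand-in for map(bytes.byte, data): bytes((n,)) builds a length 1
        # bytes object and raises ValueError for n outside range(0, 256).
        return map(lambda n: bytes((n,)), data)
    return getiter(data)


class Chunked:
    """Hypothetical third-party type that opts in via __iterbytes__."""

    def __init__(self, payload):
        self.payload = payload

    def __iterbytes__(self):
        # Slicing yields length 1 bytes objects directly, with no int round trip.
        return (self.payload[i:i + 1] for i in range(len(self.payload)))


print(list(iterbytes(b"hi")))           # [b'h', b'i']
print(list(iterbytes([0, 255])))        # [b'\x00', b'\xff']
print(list(iterbytes(Chunked(b"ok"))))  # [b'o', b'k']
```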
> Regards,
> Nick.
>
> PEP URL: http://www.python.org/dev/peps/pep-0467/
>
> Full PEP text:
> =============================
> PEP: 467
> Title: Minor API improvements for bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan <ncoghlan at gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-03-30
> Python-Version: 3.5
> Post-History: 2014-03-30 2014-08-15
>
>
> Abstract
> ========
>
> During the initial development of the Python 3 language specification, the
> core ``bytes`` type for arbitrary binary data started as the mutable type
> that is now referred to as ``bytearray``. Other aspects of operating in
> the binary domain in Python have also evolved over the course of the Python
> 3 series.
>
> This PEP proposes a number of small adjustments to the APIs of the
> ``bytes`` and ``bytearray`` types to make it easier to operate entirely
> in the binary domain.
>
>
> Background
> ==========
>
> To simplify the task of writing the Python 3 documentation, the ``bytes``
> and ``bytearray`` types were documented primarily in terms of the way they
> differed from the Unicode-based Python 3 ``str`` type. Even when I
> `heavily revised the sequence documentation
> <http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained
> that simplifying shortcut.
>
> However, it turns out that this approach to the documentation of these
> types had a problem: it doesn't adequately introduce users to their
> hybrid nature, where they can be manipulated *either* as a "sequence of
> integers" type *or* as ``str``-like types that assume ASCII-compatible
> data.
>
> That oversight has now been corrected, with the binary sequence types now
> being documented entirely independently of the ``str`` documentation in
> `Python 3.4+
> <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__.
>
> The confusion isn't just a documentation issue, however, as there are also
> some lingering design quirks from an earlier pre-release design where there
> was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
> was mutable (with no immutable counterpart).
>
> Finally, additional experience with using the existing Python 3 binary
> sequence types in real world applications has suggested it would be
> beneficial to make it easier to convert integers to length 1 bytes objects.
>
>
> Proposals
> =========
>
> As a "consistency improvement" proposal, this PEP is actually about a few
> smaller micro-proposals, each aimed at improving the usability of the
> binary data model in Python 3. Proposals are motivated by one of two main
> factors:
>
> * removing remnants of the original design of ``bytes`` as a mutable type
> * allowing users to easily convert integer values to a length 1 ``bytes``
>   object
>
>
> Alternate Constructors
> ----------------------
>
> The ``bytes`` and ``bytearray`` constructors currently accept an integer
> argument, but interpret it to mean a zero-filled object of the given
> length. This is a legacy of the original design of ``bytes`` as a mutable
> type, rather than a particularly intuitive behaviour for users. It has
> become especially confusing now that some other ``bytes`` interfaces
> treat integers and the corresponding length 1 bytes instances as
> equivalent input. Compare::
>
>     >>> b"\x03" in bytes([1, 2, 3])
>     True
>     >>> 3 in bytes([1, 2, 3])
>     True
>
>     >>> bytes(b"\x03")
>     b'\x03'
>     >>> bytes(3)
>     b'\x00\x00\x00'
>
> This PEP proposes that the current handling of a single integer argument
> in the ``bytes`` and ``bytearray`` constructors (as in ``bytes(3)``;
> iterables of integers such as ``bytes([1, 2, 3])`` are unaffected) be
> deprecated in Python 3.5 and targeted for removal in Python 3.7, being
> replaced by two more explicit alternate constructors provided as class
> methods. The initial python-ideas thread [ideas-thread1]_ that spawned
> this PEP was specifically aimed at deprecating this constructor
> behaviour.
>
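The following hedged sketch shows how a constructor might behave during the proposed deprecation period. make_bytes is a hypothetical wrapper (not part of the PEP) and the warning text is illustrative; it uses only the standard warnings module.

```python
import warnings


def make_bytes(source=b""):
    """Hypothetical stand-in for bytes() during the proposed deprecation period."""
    if isinstance(source, int):
        # Only the single-integer form is deprecated; it still returns the
        # same zero-filled result as bytes(source) while warning the caller.
        warnings.warn(
            "bytes(int) is deprecated; use bytes.zeros() instead",
            DeprecationWarning,
            stacklevel=2,
        )
    return bytes(source)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    filled = make_bytes(3)           # single integer: warns
    values = make_bytes([1, 2, 3])   # iterable of integers: no warning

print(filled, values, len(caught))   # b'\x00\x00\x00' b'\x01\x02\x03' 1
```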
> Firstly, a ``byte`` constructor is proposed that converts integers
> in the range 0 to 255 (inclusive) to a ``bytes`` object::
>
>     >>> bytes.byte(3)
>     b'\x03'
>     >>> bytearray.byte(3)
>     bytearray(b'\x03')
>     >>> bytes.byte(512)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: bytes must be in range(0, 256)
>
> One specific use case for this alternate constructor is to easily convert
> the result of indexing operations on ``bytes`` and other binary sequences
> from an integer to a ``bytes`` object. The documentation for this API
> should note that its counterpart for the reverse conversion is ``ord()``.
> The ``ord()`` documentation will also be updated to note that while
> ``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
> ``bytearray.byte`` are the counterparts for binary input.
>
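Until a bytes.byte() class method exists, the conversion described above can be approximated in current Python 3. In this hedged sketch, byte() is a hypothetical helper (not the proposed API itself) built on bytes((value,)); it shows the indexing use case and the round trip through ord(), which already accepts a length 1 bytes object.

```python
def byte(value):
    """Hypothetical helper approximating the proposed bytes.byte(value)."""
    # A one-element tuple avoids the zero-filled bytes(value) behaviour;
    # values outside range(0, 256) raise ValueError.
    return bytes((value,))


data = b"abc"
first = data[0]                   # indexing a bytes object yields an int
print(first)                      # 97
print(byte(first))                # b'a'
print(byte(first) == data[0:1])   # True: same as slicing out a length 1 object
print(ord(byte(first)))           # 97: ord() reverses the conversion

try:
    byte(512)
except ValueError as exc:
    print(exc)                    # bytes must be in range(0, 256)
```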
> Secondly, a ``zeros`` constructor is proposed that serves as a direct
> replacement for the current constructor behaviour, rather than having to
> use sequence repetition to achieve the same effect in a less intuitive
> way::
>
>     >>> bytes.zeros(3)
>     b'\x00\x00\x00'
>     >>> bytearray.zeros(3)
>     bytearray(b'\x00\x00\x00')
>
> The chosen name here is taken from the corresponding initialisation
> function in NumPy (although, as these are sequence types rather than
> N-dimensional arrays, the constructors take a length as input rather
> than a shape tuple).
>
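For comparison, here is a hedged sketch of what zeros() would replace. zeros is a hypothetical helper, not the proposed class method; it builds the zero-filled object through the sequence repetition mentioned above and then converts it to the requested type, so the integer form of the constructor is never used.

```python
def zeros(length, cls=bytes):
    """Hypothetical helper approximating bytes.zeros() / bytearray.zeros()."""
    if length < 0:
        # Mirror bytes(-1), which raises ValueError rather than returning b"".
        raise ValueError("negative count")
    # Sequence repetition: b"\x00" * 3 == b"\x00\x00\x00"
    return cls(b"\x00" * length)


print(zeros(3))                              # b'\x00\x00\x00'
print(zeros(3, bytearray))                   # bytearray(b'\x00\x00\x00')
print(zeros(3) == bytes(3))                  # True: matches today's bytes(3)
print(zeros(3, bytearray) == bytearray(3))   # True
```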
> While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
> useful duo amongst the new constructors, ``bytes.zeros`` and
> ``bytearray.byte`` are provided in order to maintain API consistency
> between the two types.
>
>
> Iteration
> ---------
>
> While iteration over ``bytes`` objects and other binary sequences produces
> integers, it is sometimes desirable to iterate over length 1 bytes objects
> instead.
>
> To handle this situation more obviously (and more efficiently) than would
> be the case with the ``map(bytes.byte, data)`` construct enabled by the
> above constructor changes, this PEP proposes the addition of a new
> ``iterbytes`` method to ``bytes``, ``bytearray`` and ``memoryview``::
>
>     for x in data.iterbytes():
>         # x is a length 1 bytes object, rather than an integer
>         ...
>
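As a rough approximation of the proposed iterbytes() method in current Python, slicing already yields length 1 objects without going through an integer. This sketch assumes a flat, byte-format input (bytes, bytearray, or a one-dimensional memoryview over bytes); iter_length1 is a hypothetical name, and it converts every slice with bytes() because slicing a bytearray or memoryview returns that type rather than bytes.

```python
def iter_length1(data):
    """Hypothetical approximation of the proposed data.iterbytes() method."""
    for i in range(len(data)):
        # data[i:i + 1] keeps the container type (bytearray, memoryview),
        # so convert to bytes to match the PEP's "length 1 bytes object".
        yield bytes(data[i:i + 1])


for source in (b"ab", bytearray(b"ab"), memoryview(b"ab")):
    print(list(iter_length1(source)))  # [b'a', b'b'] each time
```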
> Third party types and arbitrary containers of integers that lack the new
> method can still be handled by combining ``map`` with the new
> ``bytes.byte()`` alternate constructor proposed above::
>
>     for x in map(bytes.byte, data):
>         # x is a length 1 bytes object, rather than an integer
>         # This works with *any* container of integers in the range
>         # 0 to 255 inclusive
>         ...
>
>
> Open questions
> ^^^^^^^^^^^^^^
>
> * The fallback case above suggests that this could perhaps be better
>   handled as an ``iterbytes(data)`` *builtin* that uses
>   ``data.__iterbytes__()`` if defined and otherwise falls back to
>   ``map(bytes.byte, data)``::
>
>     for x in iterbytes(data):
>         # x is a length 1 bytes object, rather than an integer
>         # This works with *any* container of integers in the range
>         # 0 to 255 inclusive
>         ...
>
>
> References
> ==========
>
> .. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
> .. [empty-buffer-issue] http://bugs.python.org/issue20895
> .. [GvR-initial-feedback] https://mail.python.org/pipermail/python-ideas/2014-March/027376.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

-- 
--Guido van Rossum (python.org/~guido)