[Python-checkins] peps: PEP 467: bytes/bytearray API & docs improvements

Sun Mar 30 03:28:44 CEST 2014

http://hg.python.org/peps/rev/d7bcb861dcff
changeset:   5447:d7bcb861dcff
user:        Nick Coghlan <ncoghlan at gmail.com>
date:        Sun Mar 30 11:28:34 2014 +1000
summary:
  PEP 467: bytes/bytearray API & docs improvements

files:
  pep-0467.txt |  280 +++++++++++++++++++++++++++++++++++++++
  1 files changed, 280 insertions(+), 0 deletions(-)

diff --git a/pep-0467.txt b/pep-0467.txt
new file mode 100644
--- /dev/null
+++ b/pep-0467.txt
@@ -0,0 +1,280 @@
+PEP: 467
+Title: Improved API consistency for bytes and bytearray
+Version: $Revision$
+Last-Modified: $Date$
+Author: Nick Coghlan <ncoghlan at gmail.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/x-rst
+Created: 2014-03-30
+Python-Version: 3.5
+Post-History: 2014-03-30
+
+
+Abstract
+========
+
+During the initial development of the Python 3 language specification, the
+core ``bytes`` type for arbitrary binary data started as the mutable type
+that is now referred to as ``bytearray``. Other aspects of operating in
+the binary domain in Python have also evolved over the course of the Python
+3 series.
+
+This PEP proposes a number of small adjustments to the APIs of the ``bytes``
+and ``bytearray`` types to make their behaviour more internally consistent
+and to make it easier to operate entirely in the binary domain for use cases
+that actually involve manipulating binary data directly, rather than
+converting it to a more structured form with additional modelling
+semantics (such as ``str``) and then converting back to binary format after
+processing.
+
+
+Background
+==========
+
+Over the course of Python 3's evolution, a number of adjustments have been
+made to the core ``bytes`` and ``bytearray`` types as additional practical
+experience was gained with using them in code beyond the Python 3 standard
+library and test suite. However, to date, these changes have been made
+on a relatively ad hoc tactical basis as specific issues were identified,
+rather than as part of a systematic review of the APIs of these types. This
+approach has allowed inconsistencies to creep into the API design as to which
+input types are accepted by different methods. Additional inconsistencies
+linger from an earlier pre-release design where there was *no* separate
+``bytearray`` type, and instead the core ``bytes`` type was mutable (with
+no immutable counterpart), as well as from the origins of these types in
+the text-like behaviour of the Python 2 ``str`` type.
+
+This PEP aims to provide the missing systematic review, with the goal of
+ensuring that wherever feasible (given backwards compatibility constraints)
+these current inconsistencies are addressed for the Python 3.5 release.
+
+
+Proposals
+=========
+
+As a "consistency improvement" proposal, this PEP is actually about a number
+of smaller micro-proposals, each aimed at improving the self-consistency of
+the binary data model in Python 3. Proposals are motivated by one of three
+factors:
+
+* removing remnants of the original design of ``bytes`` as a mutable type
+* more consistently accepting length 1 ``bytes`` objects as input where an
+  integer between ``0`` and ``255`` inclusive is expected, and vice-versa
+* allowing users to easily convert integer output to a length 1 ``bytes``
+  object
+
+
+Alternate Constructors
+----------------------
+
+The ``bytes`` and ``bytearray`` constructors currently accept an integer
+argument, but interpret it to mean a zero-filled object of the given length.
+This is a legacy of the original design of ``bytes`` as a mutable type,
+rather than a particularly intuitive behaviour for users. It has become
+especially confusing now that other ``bytes`` interfaces treat integers
+and the corresponding length 1 bytes instances as equivalent input.
+Compare::
+
+    >>> b"\x03" in bytes([1, 2, 3])
+    True
+    >>> 3 in bytes([1, 2, 3])
+    True
+
+    >>> bytes(b"\x03")
+    b'\x03'
+    >>> bytes(3)
+    b'\x00\x00\x00'
+
+This PEP proposes that the current handling of integers in the bytes and
+bytearray constructors be deprecated in Python 3.5 and removed in Python
+3.6, being replaced by two more type-appropriate alternate constructors
+provided as class methods. The initial python-ideas thread [ideas-thread1]_
+that spawned this PEP was specifically aimed at deprecating this constructor
+behaviour.
+
+For ``bytes``, a ``byte`` constructor is proposed that converts integers
+(as indicated by ``operator.index``) in the appropriate range to a ``bytes``
+object, converts objects that support the buffer API to bytes, and also
+passes through length 1 byte strings unchanged::
+
+    >>> bytes.byte(3)
+    b'\x03'
+    >>> bytes.byte(bytearray(bytes([3])))
+    b'\x03'
+    >>> bytes.byte(memoryview(bytes([3])))
+    b'\x03'
+    >>> bytes.byte(bytes([3]))
+    b'\x03'
+    >>> bytes.byte(512)
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    ValueError: bytes must be in range(0, 256)
+    >>> bytes.byte(b"ab")
+    Traceback (most recent call last):
+      File "<stdin>", line 1, in <module>
+    TypeError: bytes.byte() expected a byte, but buffer of length 2 found
+
+One specific use case for this alternate constructor is to easily convert
+the result of indexing operations on ``bytes`` and other binary sequences
+from an integer to a ``bytes`` object. The documentation for this API
+should note that its counterpart for the reverse conversion is ``ord()``.
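
As a rough illustration only, the conversion rules described above can be
approximated in current Python with a plain function. The name ``byte`` and
the wording of the error message are assumptions made for this sketch; it is
not the proposed implementation of ``bytes.byte()``.

```python
import operator


def byte(value):
    """Approximate the proposed ``bytes.byte()`` (hypothetical helper)."""
    try:
        number = operator.index(value)
    except TypeError:
        # Not an integer: fall back to the buffer API.
        data = bytes(memoryview(value))
        if len(data) != 1:
            raise TypeError(
                "expected a byte, but buffer of length %d found" % len(data)
            ) from None
        return data
    # bytes([n]) raises ValueError when n is outside range(0, 256).
    return bytes([number])


# Indexing a bytes object gives an integer; the helper turns it back
# into a length 1 bytes object, and ord() reverses the conversion.
assert byte(b"abc"[0]) == b"a"
assert ord(byte(97)) == 97
```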
+
+For ``bytearray``, a ``from_len`` constructor is proposed that preallocates
+the buffer filled with a particular value (defaulting to ``0``) as a direct
+replacement for the current constructor behaviour, rather than having to use
+sequence repetition to achieve the same effect in a less intuitive way::
+
+    >>> bytearray.from_len(3)
+    bytearray(b'\x00\x00\x00')
+    >>> bytearray.from_len(3, 6)
+    bytearray(b'\x06\x06\x06')
+
+This part of the proposal was covered by an existing issue
+[empty-buffer-issue]_ and a variety of names have been proposed
+(``empty_buffer``, ``zeros``, ``zeroes``, ``allnull``, ``fill``). The
+specific name currently proposed was chosen by analogy with
+``dict.fromkeys()`` and ``itertools.chain.from_iterable()`` to be completely
+explicit that it is an alternate constructor rather than an in-place
+mutation, as well as how it differs from the standard constructor.
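
For comparison, the sequence repetition spelling mentioned above might be
wrapped as follows. The function name ``from_len`` simply mirrors the
proposed class method and is hypothetical; unlike ``bytearray(-1)``, this
sketch quietly returns an empty buffer for a negative length.

```python
def from_len(length, fill=0):
    """Approximate the proposed ``bytearray.from_len()`` (hypothetical helper)."""
    # A one-element bytearray repeated ``length`` times.
    return bytearray([fill]) * length


zeros = from_len(3)      # same contents as the current bytearray(3)
sixes = from_len(3, 6)   # pre-filled with the value 6
```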
+
+Open questions
+^^^^^^^^^^^^^^
+
+* Should ``bytearray.byte()`` also be added? Or is
+  ``bytearray(bytes.byte(x))`` sufficient for that case?
+* Should ``bytes.from_len()`` also be added? Or is sequence repetition
+  sufficient for that case?
+* Should ``bytearray.from_len()`` use a different name?
+* Should ``bytes.byte()`` raise ``TypeError`` or ``ValueError`` for binary
+  sequences with more than one element? The ``TypeError`` currently proposed
+  is copied from the behaviour of ``ord()`` with strings containing more
+  than one code point, while ``ValueError`` would be more consistent with
+  the existing handling of out-of-range integer values.
+* ``bytes.byte()`` is defined above as accepting length 1 binary sequences
+  as individual bytes, but this is currently inconsistent with the main
+  ``bytes`` constructor::
+
+      >>> bytes([b"a", b"b", b"c"])
+      Traceback (most recent call last):
+        File "<stdin>", line 1, in <module>
+      TypeError: 'bytes' object cannot be interpreted as an integer
+
+  Should the ``bytes`` constructor be changed to accept iterables of length 1
+  bytes objects in addition to iterables of integers? If so, should it
+  allow a mixture of the two in a single iterable?
+
+Iteration
+---------
+
+Iteration over ``bytes`` objects and other binary sequences produces
+integers. Rather than proposing a new method that would need to be added
+not only to ``bytes``, ``bytearray`` and ``memoryview``, but potentially
+to third party types as well, this PEP proposes that iteration to produce
+length 1 ``bytes`` objects instead be handled by combining ``map`` with
+the new ``bytes.byte()`` alternate constructor proposed above::
+
+    for x in map(bytes.byte, data):
+        # x is a length 1 ``bytes`` object, rather than an integer. This works
+        # with *any* container of integers in the range 0 to 255 inclusive.
+        ...  # loop body goes here
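
Since ``bytes.byte()`` is only a proposal, the same iteration pattern can be
tried today with a small generator. This is a sketch under that assumption,
and ``iter_bytes`` is a hypothetical name, not part of the proposal.

```python
def iter_bytes(data):
    """Yield each element of ``data`` as a length 1 ``bytes`` object."""
    for number in data:
        # bytes([n]) rejects values outside range(0, 256) with ValueError.
        yield bytes([number])


as_integers = list(b"abc")            # plain iteration produces integers
as_bytes = list(iter_bytes(b"abc"))   # the generator produces bytes objects
```

The generator accepts any iterable of suitable integers, so a ``bytearray``,
a ``memoryview`` over bytes, or a plain list such as ``[0, 255]`` works too.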
+
+
+Consistently accepting integer inputs to methods
+------------------------------------------------
+
+In Python 3.3, the binary search operations (``in``, ``count()``,
+``find()``, ``index()``, ``rfind()`` and ``rindex()``) were updated to
+accept integers in the range 0 to 255 (inclusive) as their first argument
+(in addition to the existing support for binary sequences).
+
+This PEP proposes extending that behaviour of accepting integers as being
+equivalent to the corresponding length 1 binary sequence to several other
+``bytes`` and ``bytearray`` methods that currently expect a ``bytes``
+object for certain parameters.
+
+* ``startswith()`` prefix(es)
+* ``endswith()`` suffix(es)
+
+* ``center()`` fill character
+* ``ljust()`` fill character
+* ``rjust()`` fill character
+
+* ``strip()`` characters to strip
+* ``lstrip()`` characters to strip
+* ``rstrip()`` characters to strip
+
+* ``partition()`` separator argument
+* ``rpartition()`` separator argument
+
+* ``split()`` separator argument
+* ``rsplit()`` separator argument
+
+* ``replace()`` old value and new value
+
+In addition to the consistency motive, this approach also makes it easier
+to work with the indexing behaviour, as the result of an indexing operation
+can more easily be fed back into other methods.
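
The gap this section describes can be seen in current Python: an integer
obtained by indexing works with the search operations, but has to be wrapped
before being passed to the methods listed above. The helper ``as_separator``
is a hypothetical name used only for this sketch.

```python
def as_separator(value):
    """Return ``value`` as bytes, wrapping a lone integer (hypothetical helper)."""
    if isinstance(value, int):
        return bytes([value])
    return bytes(value)


data = b"key=value"
sep = data[3]                    # indexing produces the integer 61, not b"="

found = data.find(sep)           # search operations accept the integer
parts = data.partition(as_separator(sep))   # partition() needs bytes today
```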
+
+
+Acknowledgement of surprising behaviour of some ``bytearray`` methods
+---------------------------------------------------------------------
+
+Several of the ``bytes`` and ``bytearray`` methods have their origins in the
+Python 2 ``str`` API. As ``str`` is an immutable type, all of these
+operations are defined as returning a *new* instance, rather than operating
+in place. This contrasts with methods on other mutable types like ``list``,
+where ``list.sort()`` and ``list.reverse()`` operate in-place and return
+``None``, rather than creating a new object.
+
+Backwards compatibility constraints make it impractical to change this
+behaviour at this point, but it may be appropriate to explicitly call out
+this quirk in the documentation for the ``bytearray`` type. It affects the
+following methods that could reasonably be expected to operate in-place on
+a mutable type:
+
+* ``center()``
+* ``ljust()``
+* ``rjust()``
+* ``strip()``
+* ``lstrip()``
+* ``rstrip()``
+* ``replace()``
+* ``lower()``
+* ``upper()``
+* ``swapcase()``
+* ``title()``
+* ``capitalize()``
+* ``translate()``
+* ``expandtabs()``
+* ``zfill()``
+
+Note that the following ``bytearray`` operations *do* operate in place, as
+they're part of the mutable sequence API in ``bytearray``, rather than being
+inspired by the immutable Python 2 ``str`` API:
+
+* ``reverse()``
+* ``remove()``
+* ``pop()``
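
A short sketch of the difference, using only existing ``bytearray``
behaviour. The slice assignment shown is one possible way to get an in-place
effect from a ``str``-derived method; it is offered as an illustration, not
as something this PEP proposes.

```python
buf = bytearray(b"  abc  ")

stripped = buf.strip()       # str-derived method: returns a new bytearray
unchanged = bytes(buf)       # snapshot showing buf still has its padding

buf[:] = buf.strip()         # slice assignment replaces the contents in place
result = buf.reverse()       # mutable sequence API: in place, returns None
```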
+
+References
+==========
+
+.. [ideas-thread1] https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
+.. [empty-buffer-issue] http://bugs.python.org/issue20895
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
+
+
+..
+   Local Variables:
+   mode: indented-text
+   indent-tabs-mode: nil
+   sentence-end-double-space: t
+   fill-column: 70
+   coding: utf-8
+   End:

-- 
Repository URL: http://hg.python.org/peps