Can we make bytes.from_len's second argument be keyword-only? Passing in two integer literals looks ambiguous if you don't know what the second argument is for.<br><p dir="ltr"></p>
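<p dir="ltr">To illustrate what I mean, here is a rough pure-Python stand-in for the proposed constructor with a keyword-only fill value. The function name and the parameter name "value" are my own assumptions; the draft does not name the second argument.</p>

```python
def from_len(length, *, value=0):
    """Hypothetical stand-in for the proposed bytearray.from_len().

    The keyword name ``value`` is an assumption, not part of the draft.
    """
    # bytearray([value]) checks that value is in range(0, 256)
    return bytearray([value]) * length


buf = from_len(3, value=6)   # the role of the 6 is clear at the call site
print(buf)

try:
    from_len(3, 6)           # a positional fill value is rejected
except TypeError as exc:
    print("TypeError:", exc)
```

<p dir="ltr">With that signature, a call with two bare integer literals fails immediately instead of leaving the reader to guess which number is the length.</p>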
<p dir="ltr">On Saturday, March 29, 2014 10:18:04 PM, Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>> wrote:</p>
<blockquote><p dir="ltr">On 30 March 2014 07:07, Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan</a><a href="mailto:ncoghlan@gmail.com">@</a><a href="mailto:ncoghlan@gmail.com">gmail.com</a>> wrote:<br>
> I already have a draft PEP written that covers the constructor issue,<br>
> iteration and adding acceptance of integer inputs to the remaining<br>
> methods that don't currently handle them. There was some background<br>
> explanation of the text/binary domain split in the Python 2->3<br>
> transition that I wanted Guido's feedback on before posting, but I<br>
> just realised I can cut that out for now, and then add it back after<br>
> Guido has had a chance to review it.<br>
><br>
> So I'll tidy that up and get the draft posted later today.</p>
<p dir="ltr">Guido pointed out most of the stuff I had asked him to look at wasn't<br>
actually relevant to the PEP, so I just cut most of it entirely.<br>
Suffice to say, after stepping back and reviewing them systematically<br>
for the first time in years, I believe the APIs for the core binary<br>
data types in Python 3 could do with a little sprucing up :)</p>
<p dir="ltr">Web version: <a href="http://www.python.org/dev/peps/pep-0467/">http://www.python.org/dev/peps/pep-0467/</a></p>
<p dir="ltr">======================================<br>
PEP: 467<br>
Title: Improved API consistency for bytes and bytearray<br>
Version: $Revision$<br>
Last-Modified: $Date$<br>
Author: Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan</a><a href="mailto:ncoghlan@gmail.com">@</a><a href="mailto:ncoghlan@gmail.com">gmail.com</a>><br>
Status: Draft<br>
Type: Standards Track<br>
Content-Type: text/x-rst<br>
Created: 2014-03-30<br>
Python-Version: 3.5<br>
Post-History: 2014-03-30<br></p>
<p dir="ltr">Abstract<br>
========</p>
<p dir="ltr">During the initial development of the Python 3 language specification, the<br>
core ``bytes`` type for arbitrary binary data started as the mutable type<br>
that is now referred to as ``bytearray``. Other aspects of operating in<br>
the binary domain in Python have also evolved over the course of the Python<br>
3 series.</p>
<p dir="ltr">This PEP proposes a number of small adjustments to the APIs of the ``bytes``<br>
and ``bytearray`` types to make their behaviour more internally consistent<br>
and to make it easier to operate entirely in the binary domain for use cases<br>
that actually involve manipulating binary data directly, rather than<br>
converting it to a more structured form with additional modelling<br>
semantics (such as ``str``) and then converting back to binary format after<br>
processing.<br></p>
<p dir="ltr">Background<br>
==========</p>
<p dir="ltr">Over the course of Python 3's evolution, a number of adjustments have been<br>
made to the core ``bytes`` and ``bytearray`` types as additional practical<br>
experience was gained with using them in code beyond the Python 3 standard<br>
library and test suite. However, to date, these changes have been made<br>
on a relatively ad hoc tactical basis as specific issues were identified,<br>
rather than as part of a systematic review of the APIs of these types. This<br>
approach has allowed inconsistencies to creep into the API design as to which<br>
input types are accepted by different methods. Additional inconsistencies<br>
linger from an earlier pre-release design where there was *no* separate<br>
``bytearray`` type, and instead the core ``bytes`` type was mutable (with<br>
no immutable counterpart), as well as from the origins of these types in<br>
the text-like behaviour of the Python 2 ``str`` type.</p>
<p dir="ltr">This PEP aims to provide the missing systematic review, with the goal of<br>
ensuring that wherever feasible (given backwards compatibility constraints)<br>
these current inconsistencies are addressed for the Python 3.5 release.<br></p>
<p dir="ltr">Proposals<br>
=========</p>
<p dir="ltr">As a "consistency improvement" proposal, this PEP is actually about a number<br>
of smaller micro-proposals, each aimed at improving the self-consistency of<br>
the binary data model in Python 3. Proposals are motivated by one of three<br>
factors:</p>
<p dir="ltr">* removing remnants of the original design of ``bytes`` as a mutable type<br>
* more consistently accepting length 1 ``bytes`` objects as input where an<br>
 integer between ``0`` and ``255`` inclusive is expected, and vice-versa<br>
* allowing users to easily convert integer output to a length 1 ``bytes``<br>
 object<br></p>
<p dir="ltr">Alternate Constructors<br>
----------------------</p>
<p dir="ltr">The ``bytes`` and ``bytearray`` constructors currently accept an integer<br>
argument, but interpret it to mean a zero-filled object of the given length.<br>
This is a legacy of the original design of ``bytes`` as a mutable type,<br>
rather than a particularly intuitive behaviour for users. It has become<br>
especially confusing now that other ``bytes`` interfaces treat integers<br>
and the corresponding length 1 bytes instances as equivalent input.<br>
Compare::</p>
<p dir="ltr">  >>> b"\x03" in bytes([1, 2, 3])<br>
  True<br>
  >>> 3 in bytes([1, 2, 3])<br>
  True</p>
<p dir="ltr">  >>> bytes(b"\x03")<br>
  b'\x03'<br>
  >>> bytes(3)<br>
  b'\x00\x00\x00'</p>
<p dir="ltr">This PEP proposes that the current handling of integers in the bytes and<br>
bytearray constructors be deprecated in Python 3.5 and removed in Python<br>
3.6, being replaced by two more type-appropriate alternate constructors<br>
provided as class methods. The initial python-ideas thread [ideas-thread1]_<br>
that spawned this PEP was specifically aimed at deprecating this constructor<br>
behaviour.</p>
<p dir="ltr">For ``bytes``, a ``byte`` constructor is proposed that converts integers<br>
(as indicated by ``operator.index``) in the appropriate range to a ``bytes``<br>
object, converts objects that support the buffer API to bytes, and also<br>
passes through length 1 byte strings unchanged::</p>
<p dir="ltr">  >>> bytes.byte(3)<br>
  b'\x03'<br>
  >>> bytes.byte(bytearray(bytes([3])))<br>
  b'\x03'<br>
  >>> bytes.byte(memoryview(bytes([3])))<br>
  b'\x03'<br>
  >>> bytes.byte(bytes([3]))<br>
  b'\x03'<br>
  >>> bytes.byte(512)<br>
  Traceback (most recent call last):<br>
   File "<stdin>", line 1, in <module><br>
  ValueError: bytes must be in range(0, 256)<br>
  >>> bytes.byte(b"ab")<br>
  Traceback (most recent call last):<br>
   File "<stdin>", line 1, in <module><br>
  TypeError: bytes.byte() expected a byte, but buffer of length 2 found</p>
<p dir="ltr">One specific use case for this alternate constructor is to easily convert<br>
the result of indexing operations on ``bytes`` and other binary sequences<br>
from an integer to a ``bytes`` object. The documentation for this API<br>
should note that its counterpart for the reverse conversion is ``ord()``.</p>
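<p dir="ltr">The intended semantics can be approximated in pure Python today. The following is only a sketch under stated assumptions: the free function ``byte()`` is a hypothetical stand-in for the proposed class method, and its error messages merely imitate the ones shown above::</p>

```python
import operator


def byte(value):
    """Hypothetical pure-Python approximation of the proposed bytes.byte()."""
    try:
        # Integers (anything supporting __index__) go through operator.index
        number = operator.index(value)
    except TypeError:
        # Otherwise require a buffer API object holding exactly one byte
        data = bytes(memoryview(value))
        if len(data) != 1:
            raise TypeError(
                "byte() expected a byte, but buffer of length %d found" % len(data)
            ) from None
        return data
    # bytes([n]) raises ValueError for values outside range(0, 256)
    return bytes([number])


data = b"spam"
first = byte(data[0])          # indexing gives an integer; convert it back
assert first == b"s"
assert ord(first) == data[0]   # ord() is the reverse conversion
```

<p dir="ltr">The round trip at the end shows the indexing use case: ``data[0]`` is an integer, ``byte()`` turns it into a length 1 ``bytes`` object, and ``ord()`` recovers the integer.</p>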
<p dir="ltr">For ``bytearray``, a ``from_len`` constructor is proposed that preallocates<br>
the buffer filled with a particular value (defaulting to ``0``) as a direct<br>
replacement for the current constructor behaviour, rather than having to use<br>
sequence repetition to achieve the same effect in a less intuitive way::</p>
<p dir="ltr">  >>> bytearray.from_len(3)<br>
  bytearray(b'\x00\x00\x00')<br>
  >>> bytearray.from_len(3, 6)<br>
  bytearray(b'\x06\x06\x06')</p>
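<p dir="ltr">For comparison, the two results above can already be produced with existing spellings. This sketch uses only current behaviour; the variable names are illustrative::</p>

```python
# Spellings available today for the two from_len() examples above
zero_filled = bytearray(3)            # integer constructor behaviour the PEP would deprecate
six_filled = bytearray(b"\x06") * 3   # sequence repetition

print(zero_filled)
print(six_filled)
```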
</blockquote>
<br><blockquote><p dir="ltr"><br>
This part of the proposal was covered by an existing issue<br>
[empty-buffer-issue]_ and a variety of names have been proposed<br>
(``empty_buffer``, ``zeros``, ``zeroes``, ``allnull``, ``fill``). The<br>
specific name currently proposed was chosen by analogy with<br>
``dict.fromkeys()`` and ``itertools.chain.from_iterable()`` to be completely<br>
explicit that it is an alternate constructor rather than an in-place<br>
mutation, as well as how it differs from the standard constructor.<br></p>
<p dir="ltr">Open questions<br>
^^^^^^^^^^^^^^</p>
<p dir="ltr">* Should ``bytearray.byte()`` also be added? Or is<br>
 ``bytearray(bytes.byte(x))`` sufficient for that case?<br>
* Should ``bytes.from_len()`` also be added? Or is sequence repetition<br>
 sufficient for that case?<br>
* Should ``bytearray.from_len()`` use a different name?<br>
* Should ``bytes.byte()`` raise ``TypeError`` or ``ValueError`` for binary<br>
 sequences with more than one element? The ``TypeError`` currently proposed<br>
 is copied (with slightly improved wording) from the behaviour of ``ord()``<br>
 with sequences containing more than one code point, while ``ValueError``<br>
 would be more consistent with the existing handling of out-of-range<br>
 integer values.<br>
* ``bytes.byte()`` is defined above as accepting length 1 binary sequences<br>
 as individual bytes, but this is currently inconsistent with the main<br>
 ``bytes`` constructor::</p>
<p dir="ltr">   >>> bytes([b"a", b"b", b"c"])<br>
   Traceback (most recent call last):<br>
    File "<stdin>", line 1, in <module><br>
   TypeError: 'bytes' object cannot be interpreted as an integer</p>
<p dir="ltr"> Should the ``bytes`` constructor be changed to accept iterables of length 1<br>
 bytes objects in addition to iterables of integers? If so, should it<br>
 allow a mixture of the two in a single iterable?<br></p>
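<p dir="ltr">To make the last question concrete, one possible reading of "a mixture of the two" is sketched below. The helper ``bytes_from_mixed()`` is hypothetical and exists only for illustration; it is not a proposed API::</p>

```python
def bytes_from_mixed(items):
    """Hypothetical helper: build bytes from ints and/or length 1 bytes objects."""
    out = bytearray()
    for item in items:
        if isinstance(item, int):
            out.append(item)          # ValueError if outside range(0, 256)
        else:
            chunk = bytes(item)
            if len(chunk) != 1:
                raise TypeError("expected a single byte, got length %d" % len(chunk))
            out += chunk
    return bytes(out)


assert bytes_from_mixed([b"a", b"b", b"c"]) == b"abc"
assert bytes_from_mixed([97, b"b", 99]) == b"abc"

# For iterables containing only bytes objects, join() already works today
assert b"".join([b"a", b"b", b"c"]) == b"abc"
```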
<p dir="ltr">Iteration<br>
---------</p>
<p dir="ltr">Iteration over ``bytes`` objects and other binary sequences produces<br>
integers. Rather than proposing a new method that would need to be added<br>
not only to ``bytes``, ``bytearray`` and ``memoryview``, but potentially<br>
to third party types as well, this PEP proposes that iteration to produce<br>
length 1 ``bytes`` objects instead be handled by combining ``map`` with<br>
the new ``bytes.byte()`` alternate constructor proposed above::</p>
<p dir="ltr">  for x in map(bytes.byte, data):<br>
    # x is a length 1 ``bytes`` object, rather than an integer<br>
    # This works with *any* container of integers in the range<br>
    # 0 to 255 inclusive<br></p>
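<p dir="ltr">Until such a constructor exists, the same effect can be had with current spellings. A minimal sketch using only existing behaviour::</p>

```python
data = b"abc"

# Iterating over a bytes object yields integers
assert list(data) == [97, 98, 99]

# Closest analogue of map(bytes.byte, data) that works today
via_map = list(map(lambda i: bytes([i]), data))

# Slicing (unlike indexing) also yields length 1 bytes objects
via_slicing = [data[i:i + 1] for i in range(len(data))]

assert via_map == via_slicing == [b"a", b"b", b"c"]
```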
<p dir="ltr">Consistent support for different input types<br>
--------------------------------------------</p>
<p dir="ltr">In Python 3.3, the binary search operations (``in``, ``count()``,<br>
``find()``, ``index()``, ``rfind()`` and ``rindex()``) were updated to<br>
accept integers in the range 0 to 255 (inclusive) as their first argument<br>
(in addition to the existing support for binary sequences).</p>
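<p dir="ltr">A short demonstration of that existing behaviour, with an integer and the corresponding length 1 ``bytes`` object giving the same answers::</p>

```python
data = bytes([1, 2, 3, 2])

assert 3 in data and b"\x03" in data
assert data.count(2) == data.count(b"\x02") == 2
assert data.find(2) == data.index(2) == 1
assert data.rfind(2) == data.rindex(2) == 3
```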
<p dir="ltr">This PEP proposes extending that behaviour of accepting integers as being<br>
equivalent to the corresponding length 1 binary sequence to several other<br>
``bytes`` and ``bytearray`` methods that currently expect a ``bytes``<br>
object for certain parameters. In essence, if a value is an acceptable<br>
input to the new ``bytes.byte`` constructor defined above, then it would<br>
be acceptable in the roles defined here (in addition to any other already<br>
supported inputs):</p>
<p dir="ltr">* ``startswith()`` prefix(es)<br>
* ``endswith()`` suffix(es)</p>
<p dir="ltr">* ``center()`` fill character<br>
* ``ljust()`` fill character<br>
* ``rjust()`` fill character</p>
<p dir="ltr">* ``strip()`` character to strip<br>
* ``lstrip()`` character to strip<br>
* ``rstrip()`` character to strip</p>
<p dir="ltr">* ``partition()`` separator argument<br>
* ``rpartition()`` separator argument</p>
<p dir="ltr">* ``split()`` separator argument<br>
* ``rsplit()`` separator argument</p>
<p dir="ltr">* ``replace()`` old value and new value</p>
<p dir="ltr">In addition to the consistency motive, this approach also makes it easier<br>
to work with the indexing behaviour, as the result of an indexing operation<br>
can more easily be fed back into other methods.</p>
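<p dir="ltr">The friction being addressed looks like this in current Python. The sketch below shows only existing behaviour and the usual workaround; the sample data is illustrative::</p>

```python
data = b"spam and eggs"
first = data[0]                      # indexing yields the integer 115

try:
    data.startswith(first)           # integers are rejected today
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)

# Current workaround: wrap the integer in a length 1 bytes object first
assert data.startswith(bytes([first]))
assert data.split(bytes([data[4]])) == [b"spam", b"and", b"eggs"]
```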
<p dir="ltr">For ``bytearray``, some additional changes are proposed to the current<br>
integer based operations to ensure they remain consistent with the proposed<br>
constructor changes:</p>
<p dir="ltr">* ``append()``: updated to be consistent with ``bytes.byte()``<br>
* ``remove()``: updated to be consistent with ``bytes.byte()``<br>
* ``+=``: updated to be consistent with ``bytes()`` changes (if any)<br>
* ``extend()``: updated to be consistent with ``bytes()`` changes (if any)<br></p>
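<p dir="ltr">For reference, this is how those operations treat their inputs today; the rejected ``append()`` call at the end is the kind of input the changes above would start accepting::</p>

```python
buf = bytearray(b"ab")
buf.append(99)              # integers are accepted
buf.extend([100, 101])      # iterables of integers are accepted
buf += b"f"                 # binary sequences are accepted
buf.remove(97)              # removes the first b"a"
assert buf == bytearray(b"bcdef")

try:
    buf.append(b"g")        # a length 1 bytes object is rejected today
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```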
<p dir="ltr">Acknowledgement of surprising behaviour of some ``bytearray`` methods<br>
---------------------------------------------------------------------</p>
<p dir="ltr">Several of the ``bytes`` and ``bytearray`` methods have their origins in the<br>
Python 2 ``str`` API. As ``str`` is an immutable type, all of these<br>
operations are defined as returning a *new* instance, rather than operating<br>
in place. This contrasts with methods on other mutable types like ``list``,<br>
where ``list.sort()`` and ``list.reverse()`` operate in-place and return<br>
``None``, rather than creating a new object.</p>
<p dir="ltr">Backwards compatibility constraints make it impractical to change this<br>
behaviour at this point, but it may be appropriate to explicitly call out<br>
this quirk in the documentation for the ``bytearray`` type. It affects the<br>
following methods that could reasonably be expected to operate in-place on<br>
a mutable type:</p>
<p dir="ltr">* ``center()``<br>
* ``ljust()``<br>
* ``rjust()``<br>
* ``strip()``<br>
* ``lstrip()``<br>
* ``rstrip()``<br>
* ``replace()``<br>
* ``lower()``<br>
* ``upper()``<br>
* ``swapcase()``<br>
* ``title()``<br>
* ``capitalize()``<br>
* ``translate()``<br>
* ``expandtabs()``<br>
* ``zfill()``</p>
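<p dir="ltr">A small example of the quirk, together with one way to get an in-place effect through slice assignment. This uses only existing behaviour::</p>

```python
buf = bytearray(b"  spam  ")
stripped = buf.strip()                # returns a new bytearray
assert stripped == bytearray(b"spam")
assert stripped is not buf
assert buf == bytearray(b"  spam  ")  # the original is unchanged

# Assigning the result back through a slice mutates the existing object
alias = buf
buf[:] = buf.upper().strip()
assert alias == bytearray(b"SPAM")    # same object, new contents
```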
<p dir="ltr">Note that the following ``bytearray`` operations *do* operate in place, as<br>
they're part of the mutable sequence API in ``bytearray``, rather than being<br>
inspired by the immutable Python 2 ``str`` API:</p>
<p dir="ltr">* ``+=``<br>
* ``append()``<br>
* ``extend()``<br>
* ``reverse()``<br>
* ``remove()``<br>
* ``pop()``<br></p>
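<p dir="ltr">By contrast, each of these operations changes the existing object, which a second reference to the same ``bytearray`` makes visible::</p>

```python
buf = bytearray(b"abc")
alias = buf                   # second name for the same object

buf += b"d"                   # abcd
buf.append(101)               # abcde
buf.extend(b"fg")             # abcdefg
result = buf.reverse()        # gfedcba
buf.remove(103)               # fedcba
last = buf.pop()              # fedcb, returns 97

assert result is None         # like list.reverse(), nothing is returned
assert last == 97
assert alias is buf and alias == bytearray(b"fedcb")
```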
<p dir="ltr">References<br>
==========</p>
<p dir="ltr">.. [ideas-thread1]<br>
<a href="https://mail.python.org/pipermail/python-ideas/2014-March/027295.html">https://mail.python.org/pipermail/python-ideas/2014-March/027295.html</a><br>
.. [empty-buffer-issue] <a href="http://bugs.python.org/issue20895">http://bugs.python.org/issue20895</a><br></p>
<p dir="ltr">Copyright<br>
=========</p>
<p dir="ltr">This document has been placed in the public domain.</p>
<p dir="ltr">--<br>
Nick Coghlan  |  <a href="mailto:ncoghlan@gmail.com">ncoghlan</a><a href="mailto:ncoghlan@gmail.com">@</a><a href="mailto:ncoghlan@gmail.com">gmail.com</a>  |  Brisbane, Australia<br>
</p>
</blockquote>