Can we make the second argument of bytes.from_len keyword-only? A call that passes two integer literals is ambiguous unless you already know what the second argument is for.<br><p dir="ltr"></p>

<p dir="ltr">On Saturday, March 29, 2014 10:18:04 PM, Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>> wrote:</p>

<blockquote><p dir="ltr">On 30 March 2014 07:07, Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>> wrote:<br>

> I already have a draft PEP written that covers the constructor issue,<br>

> iteration and adding acceptance of integer inputs to the remaining<br>

> methods that don't currently handle them. There was some background<br>

> explanation of the text/binary domain split in the Python 2->3<br>

> transition that I wanted Guido's feedback on before posting, but I<br>

> just realised I can cut that out for now, and then add it back after<br>

> Guido has had a chance to review it.<br>

><br>

> So I'll tidy that up and get the draft posted later today.</p>

<p dir="ltr">Guido pointed out most of the stuff I had asked him to look at wasn't<br>

actually relevant to the PEP, so I just cut most of it entirely.<br>

Suffice to say, after stepping back and reviewing them systematically<br>

for the first time in years, I believe the APIs for the core binary<br>

data types in Python 3 could do with a little sprucing up :)</p>

<p dir="ltr">Web version: <a href="http://www.python.org/dev/peps/pep-0467/">http://www.python.org/dev/peps/pep-0467/</a></p>


<p dir="ltr">======================================<br>

PEP: 467<br>

Title: Improved API consistency for bytes and bytearray<br>

Version: $Revision$<br>

Last-Modified: $Date$<br>

Author: Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>><br>

Status: Draft<br>

Type: Standards Track<br>

Content-Type: text/x-rst<br>

Created: 2014-03-30<br>

Python-Version: 3.5<br>

Post-History: 2014-03-30<br></p>

<p dir="ltr">Abstract<br>

========</p>

<p dir="ltr">During the initial development of the Python 3 language specification, the<br>

core ``bytes`` type for arbitrary binary data started as the mutable type<br>

that is now referred to as ``bytearray``. Other aspects of operating in<br>

the binary domain in Python have also evolved over the course of the Python<br>

3 series.</p>

<p dir="ltr">This PEP proposes a number of small adjustments to the APIs of the ``bytes``<br>

and ``bytearray`` types to make their behaviour more internally consistent<br>

and to make it easier to operate entirely in the binary domain for use cases<br>

that actually involve manipulating binary data directly, rather than<br>

converting it to a more structured form with additional modelling<br>

semantics (such as ``str``) and then converting back to binary format after<br>

processing.<br></p>

<p dir="ltr">Background<br>

==========</p>

<p dir="ltr">Over the course of Python 3's evolution, a number of adjustments have been<br>

made to the core ``bytes`` and ``bytearray`` types as additional practical<br>

experience was gained with using them in code beyond the Python 3 standard<br>

library and test suite. However, to date, these changes have been made<br>

on a relatively ad hoc tactical basis as specific issues were identified,<br>

rather than as part of a systematic review of the APIs of these types. This<br>

approach has allowed inconsistencies to creep into the API design as to which<br>

input types are accepted by different methods. Additional inconsistencies<br>

linger from an earlier pre-release design where there was *no* separate<br>

``bytearray`` type, and instead the core ``bytes`` type was mutable (with<br>

no immutable counterpart), as well as from the origins of these types in<br>

the text-like behaviour of the Python 2 ``str`` type.</p>

<p dir="ltr">This PEP aims to provide the missing systematic review, with the goal of<br>

ensuring that wherever feasible (given backwards compatibility constraints)<br>

these current inconsistencies are addressed for the Python 3.5 release.<br></p>

<p dir="ltr">Proposals<br>

=========</p>

<p dir="ltr">As a "consistency improvement" proposal, this PEP is actually about a number<br>

of smaller micro-proposals, each aimed at improving the self-consistency of<br>

the binary data model in Python 3. Proposals are motivated by one of three<br>

factors:</p>

<p dir="ltr">* removing remnants of the original design of ``bytes`` as a mutable type<br>

* more consistently accepting length 1 ``bytes`` objects as input where an<br>

  integer between ``0`` and ``255`` inclusive is expected, and vice-versa<br>

* allowing users to easily convert integer output to a length 1 ``bytes``<br>

  object<br></p>

<p dir="ltr">Alternate Constructors<br>

----------------------</p>

<p dir="ltr">The ``bytes`` and ``bytearray`` constructors currently accept an integer<br>

argument, but interpret it to mean a zero-filled object of the given length.<br>

This is a legacy of the original design of ``bytes`` as a mutable type,<br>

rather than a particularly intuitive behaviour for users. It has become<br>

especially confusing now that other ``bytes`` interfaces treat integers<br>

and the corresponding length 1 bytes instances as equivalent input.<br>

Compare::</p>

<p dir="ltr">    >>> b"\x03" in bytes([1, 2, 3])<br>

    True<br>

    >>> 3 in bytes([1, 2, 3])<br>

    True</p>

<p dir="ltr">    >>> bytes(b"\x03")<br>

    b'\x03'<br>

    >>> bytes(3)<br>

    b'\x00\x00\x00'</p>

<p dir="ltr">This PEP proposes that the current handling of integers in the bytes and<br>

bytearray constructors be deprecated in Python 3.5 and removed in Python<br>

3.6, being replaced by two more type appropriate alternate constructors<br>

provided as class methods. The initial python-ideas thread [ideas-thread1]_<br>

that spawned this PEP was specifically aimed at deprecating this constructor<br>

behaviour.</p>

<p dir="ltr">For ``bytes``, a ``byte`` constructor is proposed that converts integers<br>

(as indicated by ``operator.index``) in the appropriate range to a ``bytes``<br>

object, converts objects that support the buffer API to bytes, and also<br>

passes through length 1 byte strings unchanged::</p>

<p dir="ltr">    >>> bytes.byte(3)<br>

    b'\x03'<br>

    >>> bytes.byte(bytearray(bytes([3])))<br>

    b'\x03'<br>

    >>> bytes.byte(memoryview(bytes([3])))<br>

    b'\x03'<br>

    >>> bytes.byte(bytes([3]))<br>

    b'\x03'<br>

    >>> bytes.byte(512)<br>

    Traceback (most recent call last):<br>

      File "<stdin>", line 1, in <module><br>

    ValueError: bytes must be in range(0, 256)<br>

    >>> bytes.byte(b"ab")<br>

    Traceback (most recent call last):<br>

      File "<stdin>", line 1, in <module><br>

    TypeError: bytes.byte() expected a byte, but buffer of length 2 found</p>

<p dir="ltr">One specific use case for this alternate constructor is to easily convert<br>

the result of indexing operations on ``bytes`` and other binary sequences<br>

from an integer to a ``bytes`` object. The documentation for this API<br>

should note that its counterpart for the reverse conversion is ``ord()``.</p>
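<p dir="ltr">The behaviour described above can be approximated in current Python. The following is only a minimal sketch: the module-level function ``byte()`` is a hypothetical stand-in for the proposed ``bytes.byte()`` class method, and its error wording is an assumption modelled on the examples above.</p>

```python
import operator


def byte(value):
    """Hypothetical stand-in for the proposed ``bytes.byte()`` constructor."""
    try:
        # Anything that supports operator.index() is treated as an integer.
        number = operator.index(value)
    except TypeError:
        pass
    else:
        # bytes([n]) already raises ValueError outside range(0, 256).
        return bytes([number])
    # Everything else must support the buffer API and hold exactly one byte.
    data = bytes(memoryview(value))
    if len(data) != 1:
        raise TypeError(
            "byte() expected a byte, but buffer of length %d found" % len(data)
        )
    return data
```

<p dir="ltr">With this sketch, ``ord(byte(3))`` returns ``3``, which illustrates the reverse conversion mentioned above.</p>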

<p dir="ltr">For ``bytearray``, a ``from_len`` constructor is proposed that preallocates<br>

the buffer filled with a particular value (default to ``0``) as a direct<br>

replacement for the current constructor behaviour, rather than having to use<br>

sequence repetition to achieve the same effect in a less intuitive way::</p>

<p dir="ltr">    >>> bytearray.from_len(3)<br>

    bytearray(b'\x00\x00\x00')<br>

    >>> bytearray.from_len(3, 6)<br>

    bytearray(b'\x06\x06\x06')</p>

</blockquote>

<br><blockquote><p dir="ltr"><br>

This part of the proposal was covered by an existing issue<br>

[empty-buffer-issue]_ and a variety of names have been proposed<br>

(``empty_buffer``, ``zeros``, ``zeroes``, ``allnull``, ``fill``). The<br>

specific name currently proposed was chosen by analogy with<br>

``dict.fromkeys()`` and ``itertools.chain.from_iterable()`` to be completely<br>

explicit that it is an alternate constructor rather than an in-place<br>

mutation, as well as how it differs from the standard constructor.<br></p>
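<p dir="ltr">For comparison, the effect of the proposed constructor can be had today with sequence repetition. This is a rough sketch only: ``from_len()`` here is a hypothetical module-level helper, not an existing API, and making ``fill`` keyword-only is an assumption (the draft examples pass it positionally) intended to avoid calls with two bare integer literals.</p>

```python
def from_len(length, *, fill=0):
    """Hypothetical stand-in for the proposed ``bytearray.from_len()``."""
    # Sequence repetition is the existing spelling of the same operation.
    # bytearray([fill]) raises ValueError if fill is outside range(0, 256).
    return bytearray([fill]) * length


zeros = from_len(3)
sixes = from_len(3, fill=6)
```

<p dir="ltr">Under this signature ``from_len(3, 6)`` raises ``TypeError``, so the meaning of the second integer always has to be spelled out at the call site.</p>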

<p dir="ltr">Open questions<br>

^^^^^^^^^^^^^^</p>

<p dir="ltr">* Should ``bytearray.byte()`` also be added? Or is<br>

  ``bytearray(bytes.byte(x))`` sufficient for that case?<br>

* Should ``bytes.from_len()`` also be added? Or is sequence repetition<br>

  sufficient for that case?<br>

* Should ``bytearray.from_len()`` use a different name?<br>

* Should ``bytes.byte()`` raise ``TypeError`` or ``ValueError`` for binary<br>

  sequences with more than one element? The ``TypeError`` currently proposed<br>

  is copied (with slightly improved wording) from the behaviour of ``ord()``<br>

  with sequences containing more than one code point, while ``ValueError``<br>

  would be more consistent with the existing handling of out-of-range<br>

  integer values.<br>

* ``bytes.byte()`` is defined above as accepting length 1 binary sequences<br>

  as individual bytes, but this is currently inconsistent with the main<br>

  ``bytes`` constructor::</p>

<p dir="ltr">      >>> bytes([b"a", b"b", b"c"])<br>

      Traceback (most recent call last):<br>

        File "<stdin>", line 1, in <module><br>

      TypeError: 'bytes' object cannot be interpreted as an integer</p>

<p dir="ltr">  Should the ``bytes`` constructor be changed to accept iterables of length 1<br>

  bytes objects in addition to iterables of integers? If so, should it<br>

  allow a mixture of the two in a single iterable?<br></p>

<p dir="ltr">Iteration<br>

---------</p>

<p dir="ltr">Iteration over ``bytes`` objects and other binary sequences produces<br>

integers. Rather than proposing a new method that would need to be added<br>

not only to ``bytes``, ``bytearray`` and ``memoryview``, but potentially<br>

to third party types as well, this PEP proposes that iteration to produce<br>

length 1 ``bytes`` objects instead be handled by combining ``map`` with<br>

the new ``bytes.byte()`` alternate constructor proposed above::</p>

<p dir="ltr">    for x in map(bytes.byte, data):<br>

        # x is a length 1 ``bytes`` object, rather than an integer<br>

        # This works with *any* container of integers in the range<br>

        # 0 to 255 inclusive<br></p>
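<p dir="ltr">Because ``bytes.byte()`` is only proposed here, the same iteration pattern can be sketched in current Python by wrapping each integer in a single-element list. The helper name ``iter_bytes()`` is hypothetical and used purely for illustration.</p>

```python
def iter_bytes(data):
    """Yield each element of a binary sequence as a length 1 bytes object."""
    for value in data:
        # bytes([value]) builds a one-byte object from an integer in 0..255.
        yield bytes([value])


as_integers = list(b"abc")
as_bytes = list(iter_bytes(b"abc"))
```

<p dir="ltr">Plain iteration yields the integers 97, 98 and 99, while the helper yields ``b"a"``, ``b"b"`` and ``b"c"``.</p>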

<p dir="ltr">Consistent support for different input types<br>

--------------------------------------------</p>

<p dir="ltr">In Python 3.3, the binary search operations (``in``, ``count()``,<br>

``find()``, ``index()``, ``rfind()`` and ``rindex()``) were updated to<br>

accept integers in the range 0 to 255 (inclusive) as their first argument<br>

(in addition to the existing support for binary sequences).</p>
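<p dir="ltr">A short illustration of that existing behaviour, showing that an integer and the corresponding length 1 ``bytes`` object give the same results for a few of these search operations (the variable names are arbitrary):</p>

```python
data = bytes([1, 2, 3, 2])

# Searching with an integer in the range 0 to 255...
with_int = (2 in data, data.count(2), data.find(2), data.rindex(2))

# ...matches searching with the equivalent length 1 bytes object.
with_bytes = (
    b"\x02" in data,
    data.count(b"\x02"),
    data.find(b"\x02"),
    data.rindex(b"\x02"),
)
```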

<p dir="ltr">This PEP proposes extending that behaviour of accepting integers as being<br>

equivalent to the corresponding length 1 binary sequence to several other<br>

``bytes`` and ``bytearray`` methods that currently expect a ``bytes``<br>

object for certain parameters. In essence, if a value is an acceptable<br>

input to the new ``bytes.byte`` constructor defined above, then it would<br>

be acceptable in the roles defined here (in addition to any other already<br>

supported inputs):</p>

<p dir="ltr">* ``startswith()`` prefix(es)<br>

* ``endswith()`` suffix(es)</p>

<p dir="ltr">* ``center()`` fill character<br>

* ``ljust()`` fill character<br>

* ``rjust()`` fill character</p>

<p dir="ltr">* ``strip()`` character to strip<br>

* ``lstrip()`` character to strip<br>

* ``rstrip()`` character to strip</p>

<p dir="ltr">* ``partition()`` separator argument<br>

* ``rpartition()`` separator argument</p>

<p dir="ltr">* ``split()`` separator argument<br>

* ``rsplit()`` separator argument</p>

<p dir="ltr">* ``replace()`` old value and new value</p>

<p dir="ltr">In addition to the consistency motive, this approach also makes it easier<br>

to work with the indexing behaviour, as the result of an indexing operation<br>

can more easily be fed back in to other methods.</p>
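<p dir="ltr">As an illustration of the friction this would remove, the sketch below feeds the result of an indexing operation back into ``split()``. It reflects behaviour in Python versions without this proposal; the sample data is made up for the example.</p>

```python
data = b"key=value"
separator = data[3]  # indexing yields the integer 61, not b"="

try:
    data.split(separator)
    accepted = True
except TypeError:
    # Without the proposed change, split() rejects an integer separator.
    accepted = False

# The integer has to be wrapped in a length 1 bytes object first.
parts = data.split(bytes([separator]))
```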

<p dir="ltr">For ``bytearray``, some additional changes are proposed to the current<br>

integer based operations to ensure they remain consistent with the proposed<br>

constructor changes:</p>

<p dir="ltr">* ``append()``: updated to be consistent with ``bytes.byte()``<br>

* ``remove()``: updated to be consistent with ``bytes.byte()``<br>

* ``+=``: updated to be consistent with ``bytes()`` changes (if any)<br>

* ``extend()``: updated to be consistent with ``bytes()`` changes (if any)<br></p>

<p dir="ltr">Acknowledgement of surprising behaviour of some ``bytearray`` methods<br>

---------------------------------------------------------------------</p>

<p dir="ltr">Several of the ``bytes`` and ``bytearray`` methods have their origins in the<br>

Python 2 ``str`` API. As ``str`` is an immutable type, all of these<br>

operations are defined as returning a *new* instance, rather than operating<br>

in place. This contrasts with methods on other mutable types like ``list``,<br>

where ``list.sort()`` and ``list.reverse()`` operate in-place and return<br>

``None``, rather than creating a new object.</p>

<p dir="ltr">Backwards compatibility constraints make it impractical to change this<br>

behaviour at this point, but it may be appropriate to explicitly call out<br>

this quirk in the documentation for the ``bytearray`` type. It affects the<br>

following methods that could reasonably be expected to operate in-place on<br>

a mutable type:</p>

<p dir="ltr">* ``center()``<br>

* ``ljust()``<br>

* ``rjust()``<br>

* ``strip()``<br>

* ``lstrip()``<br>

* ``rstrip()``<br>

* ``replace()``<br>

* ``lower()``<br>

* ``upper()``<br>

* ``swapcase()``<br>

* ``title()``<br>

* ``capitalize()``<br>

* ``translate()``<br>

* ``expandtabs()``<br>

* ``zfill()``</p>

<p dir="ltr">Note that the following ``bytearray`` operations *do* operate in place, as<br>

they're part of the mutable sequence API in ``bytearray``, rather than being<br>

inspired by the immutable Python 2 ``str`` API:</p>

<p dir="ltr">* ``+=``<br>

* ``append()``<br>

* ``extend()``<br>

* ``reverse()``<br>

* ``remove()``<br>

* ``pop()``<br></p>
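<p dir="ltr">The difference between the two groups of methods can be seen in a short example (the variable names are arbitrary):</p>

```python
buffer = bytearray(b"  abc  ")

# str-inspired method: returns a new bytearray, the original is untouched.
stripped = buffer.strip()
after_strip = bytes(buffer)

# Mutable sequence method: changes the object in place and returns None.
result = buffer.reverse()
after_reverse = bytes(buffer)
```

<p dir="ltr">To get the effect of an in-place ``strip()``, slice assignment such as ``buffer[:] = buffer.strip()`` can be used.</p>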

<p dir="ltr">References<br>

==========</p>

<p dir="ltr">.. [ideas-thread1]<br>

<a href="https://mail.python.org/pipermail/python-ideas/2014-March/027295.html">https://mail.python.org/pipermail/python-ideas/2014-March/027295.html</a><br>


.. [empty-buffer-issue] <a href="http://bugs.python.org/issue20895">http://bugs.python.org/issue20895</a><br></p>

<p dir="ltr">Copyright<br>

=========</p>

<p dir="ltr">This document has been placed in the public domain.</p>

<p dir="ltr">--<br>

Nick Coghlan   |   <a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>   |   Brisbane, Australia<br>



</p>

</blockquote>
