[Python-ideas] Fixing the Python 3 bytes constructor

Sun Mar 30 09:03:32 CEST 2014

On Sat, Mar 29, 2014 at 11:31 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On 30 March 2014 16:10, Gregory P. Smith <greg at krypto.org> wrote:
> >> Open questions
> >> ^^^^^^^^^^^^^^
> >>
> >> * Should ``bytearray.byte()`` also be added? Or is
> >>   ``bytearray(bytes.byte(x))`` sufficient for that case?
> >> * Should ``bytes.from_len()`` also be added? Or is sequence repetition
> >>   sufficient for that case?
> >
> > I prefer keeping them consistent across the types myself.
> >
> >> * Should ``bytearray.from_len()`` use a different name?
> >
> > This name works for me.
> >
> >>
> >> * Should ``bytes.byte()`` raise ``TypeError`` or ``ValueError`` for binary
> >>   sequences with more than one element? The ``TypeError`` currently proposed
> >>   is copied (with slightly improved wording) from the behaviour of ``ord()``
> >>   with sequences containing more than one code point, while ``ValueError``
> >>   would be more consistent with the existing handling of out-of-range
> >>   integer values.
> >> * ``bytes.byte()`` is defined above as accepting length 1 binary sequences
> >>   as individual bytes, but this is currently inconsistent with the main
> >>   ``bytes`` constructor::
> >
> >
> > I don't like that bytes.byte() would accept anything other than an int. It
> > should not accept length 1 binary sequences at all.  I'd prefer to see
> > bytes.byte(b"X") raise a TypeError.
>
> Unfortunately, it's not that simple, because accepting both is the
> only way I see of rendering the current APIs coherent. The problem is
> that the str-derived APIs expect bytes objects, the bytearray mutating
> methods expect integers, and in Python 3.3, the substring search APIs
> were updated to accept both. This means we currently have:
>
> >>> data = bytes([1, 2, 3, 4])
> >>> 3 in data
> True
> >>> b"\x03" in data
> True
> >>> data.count(3)
> 1
> >>> data.count(b"\x03")
> 1
> >>> data.replace(3, 4)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: expected bytes, bytearray or buffer compatible object
> >>> data.replace(b"\x03", b"\x04")
> b'\x01\x02\x04\x04'
> >>> mutable = bytearray(data)
> >>> mutable
> bytearray(b'\x01\x02\x03\x04')
> >>> mutable.append(b"\x05")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: an integer is required
> >>> mutable.append(5)
> >>> mutable
> bytearray(b'\x01\x02\x03\x04\x05')
>
> Since some APIs work one way, some work the other, the only backwards
> compatible path I see to consistency is to always treat a length 1
> byte string as an acceptable input for the APIs that currently accept
> an integer and vice-versa.
>
> That said, I think this hybrid nature accurately reflects the fact
> that indexing and slicing bytes objects in Python 3 return different
> types - the individual elements are integers, but the subsequences are
> bytes objects, and several of these APIs are either
> "element-or-subsequence" APIs (in which case they should accept both),
> or else they *should* have been element APIs, but currently expect a
> subsequence due to their Python 2 str heritage.
>
> If we had the opportunity to redesign these APIs from scratch, we'd
> likely make a much clearer distinction between element based APIs
> (that would use integers) and subsequence APIs (that would accept
> buffer implementing objects). As it is, I think the situation is
> inherently ambiguous, and providing hybrid APIs to help deal with that
> ambiguity is our best available option.
>

Okay, I see where you're going with this.  So long as we limit this to APIs
specifically surrounding bytes and bytearray, I guess I'm "fine" with it
(given the status quo and the existing mess of never knowing which behaviors
will be allowed where).
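
To make the hybrid behaviour concrete, here is a minimal sketch of the kind
of normalisation such APIs would need. The helper name ``as_byte_value`` is
hypothetical (it is not part of the PEP or the standard library), and raising
``TypeError`` for longer sequences simply follows the current draft; that
choice is still an open question above.

```python
def as_byte_value(value):
    """Hypothetical helper: map an int or a length 1 binary sequence to an int.

    Illustrates the "accept both" rule for bytes/bytearray element APIs only.
    """
    if isinstance(value, int):
        # Same range check that bytes([value]) already applies today.
        if not 0 <= value <= 255:
            raise ValueError("byte must be in range(0, 256)")
        return value
    if isinstance(value, (bytes, bytearray, memoryview)):
        raw = bytes(value)
        if len(raw) != 1:
            # Assumption: TypeError, as in the current draft (open question).
            raise TypeError("expected a binary sequence of length 1, "
                            "got length %d" % len(raw))
        return raw[0]
    raise TypeError("expected an int or a length 1 binary sequence")


mutable = bytearray(b"\x01\x02\x03\x04")
# bytearray.append() only takes an int today, so normalise first.
mutable.append(as_byte_value(b"\x05"))
mutable.append(as_byte_value(6))
```

With something like this applied inside ``append()`` and ``remove()``, both
spellings would end up storing the same integer element.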

Other APIs that accept numbers outside of bytes() and bytearray() related
methods should *never* accept a bytes() of any length as valid numeric
input.
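
As a rough illustration of the boundary I mean, the sketch below checks
whether a value is usable where a general API expects an integer. The function
name ``accepts_as_integer`` is made up for this example; it relies only on
``operator.index()``, which bytes objects do not support.

```python
import operator


def accepts_as_integer(value):
    """Hypothetical check: True if value works where a true integer is needed."""
    try:
        operator.index(value)
    except TypeError:
        return False
    return True


# An int is a valid index/count; a length 1 bytes object is not, and
# nothing in this proposal should change that outside bytes/bytearray.
int_ok = accepts_as_integer(3)
bytes_ok = accepts_as_integer(b"\x03")
```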

Thanks for exploring all of the APIs; we do have quite a mess that can be
made better.

> >> For ``bytearray``, some additional changes are proposed to the current
> >> integer based operations to ensure they remain consistent with the proposed
> >> constructor changes::
> >>
> >> * ``append()``: updated to be consistent with ``bytes.byte()``
> >> * ``remove()``: updated to be consistent with ``bytes.byte()``
> >> * ``+=``: updated to be consistent with ``bytes()`` changes (if any)
> >
> >
> > Where was a change to += behavior mentioned? I don't see that above (or did
> > I miss something?).
>
> It was an open question against the constructors - if bytes.byte() is
> defined as the PEP suggests, then the case can be made that the
> iterables accepted by the bytes() constructor should also be made more
> permissive in terms of the contents of the iterables it accepts. If
> *that* happens, then extending an existing bytearray should also
> become more permissive.
>
> Note that I'm not sold on actually changing that - that's why it's an
> open question, rather than something the PEP is currently proposing.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>