Taking into consideration the comments from the last round:
- 'bytes.zeros' renamed to 'bytes.size', with optional byte filler
(defaults to b'\x00')
- 'bytes.byte' renamed to 'fromint', add 'bchr' function
- deprecation and removal softened to deprecation/discouragement
-----------
PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan
On Mon, Jul 18, 2016 at 4:17 PM, Ethan Furman wrote:
- 'bytes.zeros' renamed to 'bytes.size', with optional byte filler (defaults to b'\x00')
Seriously? You went from a numpy-friendly feature to something rather numpy-hostile. In numpy, ndarray.size is an attribute that returns the number of elements in the array. The constructor that creates an arbitrary repeated value also exists and is called numpy.full(). Even ignoring numpy, bytes.size(count, value=b'\x00') is completely unintuitive. If I see bytes.size(42) in someone's code, I will think: "something like int.bit_length(), but in bytes."
*de-lurks*
On Mon, Jul 18, 2016 at 4:45 PM, Alexander Belopolsky wrote:
On Mon, Jul 18, 2016 at 4:17 PM, Ethan Furman wrote:
- 'bytes.zeros' renamed to 'bytes.size', with optional byte filler (defaults to b'\x00')
Seriously? You went from a numpy-friendly feature to something rather numpy-hostile. In numpy, ndarray.size is an attribute that returns the number of elements in the array.
The constructor that creates an arbitrary repeated value also exists and is called numpy.full().
Even ignoring numpy, bytes.size(count, value=b'\x00') is completely unintuitive. If I see bytes.size(42) in someone's code, I will think: "something like int.bit_length(), but in bytes."
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries). Perhaps bytes.fillsize? That would seem the most intuitive to me: "fill an object of this size with this byte". I'm unfamiliar with numpy, but a quick Google search suggests that this would not conflict with anything there, if that is a concern.
This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.
You have a leftover bytes.byte here.
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
On Mon, 18 Jul 2016 at 14:35 Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
Or bytes.fromsize() to stay with the trend of naming constructor methods as from*() ?
On 07/18/2016 02:45 PM, Brett Cannon wrote:
On Mon, 18 Jul 2016 at 14:35 Alexander Belopolsky wrote:
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
Or bytes.fromsize() to stay with the trend of naming constructor methods as from*() ?
bytes.fromsize() sounds good to me, thanks for brainstorming that one for me. I wasn't really happy with 'size()' either.

--
~Ethan~
On 19 July 2016 at 08:00, Ethan Furman wrote:
On 07/18/2016 02:45 PM, Brett Cannon wrote:
On Mon, 18 Jul 2016 at 14:35 Alexander Belopolsky wrote:
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
Or bytes.fromsize() to stay with the trend of naming constructor methods as from*() ?
bytes.fromsize() sounds good to me, thanks for brainstorming that one for me. I wasn't really happy with 'size()' either.
Heh, I should have finished reading the thread before replying - this and one of my other comments were already picked up :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Mon, Jul 18, 2016, at 17:34, Alexander Belopolsky wrote:
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
What's wrong with b'\0'*42?
On Mon, 18 Jul 2016 at 15:49 Random832 wrote:
On Mon, Jul 18, 2016, at 17:34, Alexander Belopolsky wrote:
On Mon, Jul 18, 2016 at 5:01 PM, Jonathan Goble wrote:
full(), despite its use in numpy, is also unintuitive to me (my first thought is that it would indicate whether an object has room for more entries).
Perhaps bytes.fillsize?
I wouldn't want to see bytes.full() either. Maybe bytes.of_size()?
What's wrong with b'\0'*42?
It's mentioned in the PEP as to why.
On 07/18/2016 02:01 PM, Jonathan Goble wrote:
This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.
You have a leftover bytes.byte here.
Thanks, fixed (plus the other couple locations ;)

--
~Ethan~
(Thanks for moving this forward, Ethan!)
On 19 July 2016 at 06:17, Ethan Furman wrote:
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and ``memoryview.iterbytes`` alternative iterators
As a possible alternative to this aspect, what if we adjusted memoryview.cast() to also support the "s" format code from the struct module? At the moment, trying to use "s" gives a value error:
    >>> bview = memoryview(data).cast("s")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: memoryview: destination format must be a native single character format prefixed with an optional '@'
However, it could be supported by always interpreting it as equivalent to "1s", such that the view produced length-1 bytes objects on indexing and iteration, rather than integers (which is what it does given the default "B" format). Given "memoryview(data).cast('s')" as a basic building block, most of the other aspects of working with bytes objects as if they were Python 2 strings should become relatively straightforward, so the question would be whether we wanted to make it easy for people to avoid constructing the mediating memoryview object.
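For comparison, here is a small sketch of what memoryview offers today. As far as I can tell, the existing "c" format code already yields length-1 bytes objects on indexing and iteration, while "s" is rejected as shown in the traceback above. The variable names are only illustrative.

```python
data = b"abc"

# Default view of a bytes object: indexing and iteration yield integers.
view = memoryview(data)
ints = list(view)

# Casting to the "c" format code yields length-1 bytes objects instead.
char_view = view.cast("c")
chunks = list(char_view)
first = char_view[0]

# Slicing gives the same per-element result without a memoryview.
sliced = [data[i:i + 1] for i in range(len(data))]

# The "s" format code is not accepted by cast() and raises ValueError.
try:
    view.cast("s")
except ValueError as exc:
    cast_error = str(exc)
else:
    cast_error = None
```

Whether "s" would add anything over "c" for this use case is a question for the thread; the sketch only shows current behaviour.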
Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::
    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.
I'd suggest being more explicit that this would just be a documented deprecation, rather than a programmatic deprecation warning.
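To make the distinction concrete, here is a rough sketch of what a programmatic deprecation would look like from the caller's side. ``make_zeroed`` is a hypothetical stand-in for ``bytes(count)``, and the warning text is invented for illustration; nothing here is what the PEP actually specifies.

```python
import warnings


def make_zeroed(count):
    """Hypothetical stand-in for bytes(count) with a programmatic deprecation."""
    warnings.warn(
        "integer argument is deprecated; use an explicit constructor",
        DeprecationWarning,
        stacklevel=2,
    )
    return bytes(count)


# Record warnings so the effect is observable regardless of the default filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = make_zeroed(3)

# With a documentation-only deprecation there would be no warnings.warn()
# call at all, so `caught` would stay empty and callers would see no change.
```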
Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------
To replace the deprecated behaviour, this PEP proposes the addition of an explicit ``size`` alternative constructor as a class method on both ``bytes`` and ``bytearray`` whose first argument is the count, and whose second argument is the fill byte to use (defaults to ``\x00``)::
    >>> bytes.size(3)
    b'\x00\x00\x00'
    >>> bytearray.size(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.size(5, b'\x0a')
    b'\x0a\x0a\x0a\x0a\x0a'
    >>> bytearray.size(5, b'\x0a')
    bytearray(b'\x0a\x0a\x0a\x0a\x0a')
While I like the notion of having "size" in the name, the "noun-as-constructor" phrasing doesn't read right to me. Perhaps "fromsize" for consistency with "fromhex"?
It will behave just as the current constructors behave when passed a single integer.
This last paragraph feels incomplete now, given the expansion to allow the fill value to be specified.
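A pure-Python sketch of the semantics under discussion, written as a module-level function because the class method does not exist yet. The name ``fromsize`` follows the suggestion in this thread, and the error handling (rejecting negative counts and multi-byte fills) is my assumption, not something the PEP text states.

```python
def fromsize(cls, count, fill=b"\x00"):
    """Hypothetical stand-in for the proposed bytes/bytearray class method."""
    if count < 0:
        # Assumption: mirror bytes(count), which rejects negative counts.
        raise ValueError("negative count")
    if len(fill) != 1:
        # Assumption: the fill value must be exactly one byte.
        raise ValueError("fill must be a single byte")
    # Sequence repetition builds the data; cls() converts to the target type.
    return cls(fill * count)


zeros = fromsize(bytes, 3)
newlines = fromsize(bytearray, 5, b"\x0a")
```

With the default fill this matches what ``bytes(3)`` returns today, which is the part the "behaves just as the current constructors" sentence covers.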
Addition of "bchr" function and explicit "single byte" constructors
-------------------------------------------------------------------
As binary counterparts to the text ``chr`` function, this PEP proposes the addition of a ``bchr`` function and an explicit ``fromint`` alternative constructor as a class method on both ``bytes`` and ``bytearray``::
    >>> bchr(ord("A"))
    b'A'
    >>> bchr(ord(b"A"))
    b'A'
    >>> bytes.fromint(65)
    b'A'
    >>> bytearray.fromint(65)
    bytearray(b'A')
Since "fromsize" would also accept an int value, "fromint" feels ambiguous here. Perhaps "fromord" to emphasise the integer is being interpreted as an ordinal bytes value, rather than as a size?

The apparent "two ways to do it" here also deserves some additional explanation:
- the bchr builtin is to recreate the ord/chr/unichr trio from Python 2 under a different naming scheme
- the class method is mainly for the "bytearray.fromord" case, with bytes.fromord added for consistency

[snip sections on accessing elements as bytes object]
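For readers following along, both spellings can be approximated today with a one-element list. ``bchr`` and ``fromord`` below are hypothetical pure-Python stand-ins using the names floated in this thread; neither exists in the standard library.

```python
def bchr(i):
    """Hypothetical stand-in for the proposed builtin."""
    # bytes([i]) raises ValueError for values outside range(256).
    return bytes([i])


def fromord(cls, i):
    """Hypothetical stand-in for the proposed class method."""
    return cls([i])


from_text = bchr(ord("A"))
from_bytes = bchr(ord(b"A"))
mutable = fromord(bytearray, 65)
```

The class-method form matters mostly for ``bytearray``, since the builtin would only ever return ``bytes``.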
Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------
Zero-initialised sequences can be created via sequence repetition::
    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')
However, this was also the case when the ``bytearray`` type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable ``bytes`` type then inherited that feature when it was introduced in PEP 3137.
This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.
This note will need some tweaks to match the updated method names in the proposal.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
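The surprise described in the design discussion is easy to reproduce with current Python 3. A minimal illustration, using made-up variable names:

```python
data = b"abc"

# Indexing a bytes object yields an integer in Python 3.
first = data[0]

# Passing that integer back to the constructor creates that many zero bytes.
surprising = bytes(first)

# Wrapping it in a list gives the single byte that was probably intended.
intended = bytes([first])
```

Here ``surprising`` is 97 zero bytes rather than ``b'a'``, which is the ambiguity that separate, explicitly named constructors are meant to remove.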
participants (6)
- Alexander Belopolsky
- Brett Cannon
- Ethan Furman
- Jonathan Goble
- Nick Coghlan
- Random832