One more iteration.  PEPs repo not updated yet.  Changes are renaming of methods to be ``fromsize()`` and ``fromord()``, and moving ``memoryview`` to an Open Questions section.

PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.6
Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01


Abstract
========

During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series.

This PEP proposes five small adjustments to the APIs of the ``bytes`` and ``bytearray`` types to make it easier to operate entirely in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators


Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.

No other changes are proposed to the existing constructors.


Addition of explicit "count and byte initialised sequence" constructors
-----------------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of an explicit ``fromsize`` alternative constructor as a class method on both ``bytes`` and ``bytearray`` whose first argument is the count, and whose second argument is the fill byte to use (defaults to ``\x00``)::

    >>> bytes.fromsize(3)
    b'\x00\x00\x00'
    >>> bytearray.fromsize(3)
    bytearray(b'\x00\x00\x00')
    >>> bytes.fromsize(5, b'\x0a')
    b'\x0a\x0a\x0a\x0a\x0a'
    >>> bytearray.fromsize(5, b'\x0a')
    bytearray(b'\x0a\x0a\x0a\x0a\x0a')

``fromsize`` will behave just as the current constructors behave when passed a single integer, while allowing for non-zero fill values when needed.


Addition of "bchr" function and explicit "single byte" constructors
-------------------------------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes the addition of a ``bchr`` function and an explicit ``fromord`` alternative constructor as a class method on both ``bytes`` and ``bytearray``::

    >>> bchr(ord("A"))
    b'A'
    >>> bchr(ord(b"A"))
    b'A'
    >>> bytes.fromord(65)
    b'A'
    >>> bytearray.fromord(65)
    bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.fromord(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: integer must be in range(0, 256)

    >>> bytes.fromord(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

While this does create some duplication, there are valid reasons for it:

* the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python 2 under a different naming scheme
* the class method is mainly for the ``bytearray.fromord`` case, with ``bytes.fromord`` added for consistency

The documentation of the ``ord`` builtin will be updated to explicitly note that ``bchr`` is the primary inverse operation for binary data, while ``chr`` is the inverse operation for text data, and that ``bytes.fromord`` and ``bytearray.fromord`` also exist.

Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current ``bytes([x])`` (and similarly for ``bytearray``). The new spelling is expected to be easier to discover and easier to read (especially when used in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher order functions like ``map``.


Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte`` which will always return ``bytes``::

    >>> b'abc'.getbyte(0)
    b'a'

If an index is asked for that doesn't exist, ``IndexError`` is raised::

    >>> b'abc'.getbyte(9)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: index out of range


Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes`` and ``bytearray`` gain an optimised ``iterbytes`` method that produces length 1 ``bytes`` objects rather than integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

For example::

    >>> tuple(b"ABC".iterbytes())
    (b'A', b'B', b'C')


Design discussion
=================

Why not rely on sequence repetition to create zero-initialised sequences?
-------------------------------------------------------------------------

Zero-initialised sequences can be created via sequence repetition::

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally designed, and the decision was made to add explicit support for it in the type constructor. The immutable ``bytes`` type then inherited that feature when it was introduced in PEP 3137.

This PEP isn't revisiting that original design decision, just changing the spelling as users sometimes find the current behaviour of the binary sequence constructors surprising. In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.fromint(x)`` proposal in this PEP. Providing both behaviours as separate class methods avoids that ambiguity.


Open Questions
==============

Do we add ``iterbytes`` to ``memoryview``, or modify ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or do we ignore memory for now and add it later?


References
==========

.. [1] Initial March 2014 discussion thread on python-ideas
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [2] Guido's initial feedback in that thread
   (https://mail.python.org/pipermail/python-ideas/2014-March/027376.html)
.. [3] Issue proposing moving zero-initialised sequences to a dedicated API
   (http://bugs.python.org/issue20895)
.. [4] Issue proposing to use calloc() for zero-initialised binary sequences
   (http://bugs.python.org/issue21644)
.. [5] August 2014 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-ideas/2014-March/027295.html)
.. [6] June 2016 discussion thread on python-dev
   (https://mail.python.org/pipermail/python-dev/2016-June/144875.html)


Copyright
=========

This document has been placed in the public domain.
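For readers who want to experiment with the spellings proposed in the PEP text above, the following is a rough pure-Python sketch of the described behaviour. The free functions ``bchr``, ``fromsize``, ``getbyte`` and ``iterbytes`` are hypothetical stand-ins written for illustration only; none of them exist in the standard library, and the real proposal would add them as a builtin and as methods on ``bytes`` and ``bytearray``.

```python
# Hypothetical stand-ins for the names proposed in the PEP draft.
# They only mimic the behaviour described there using existing operations.

def bchr(i):
    """Return a length-1 bytes object for an integer in range(256)."""
    # bytes([i]) already raises ValueError outside 0..255 and
    # TypeError for non-integers such as 1.0.
    return bytes([i])


def fromsize(cls, count, fill=b"\x00"):
    """Emulate cls.fromsize(count, fill) through sequence repetition."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    return cls(fill) * count


def getbyte(data, index):
    """Emulate data.getbyte(index); raises IndexError when out of range."""
    return bytes([data[index]])


def iterbytes(data):
    """Emulate data.iterbytes(): yield length-1 bytes objects, not ints."""
    for i in range(len(data)):
        yield bytes(data[i:i + 1])


print(bchr(65))
print(fromsize(bytearray, 5, b"\x0a"))
print(getbyte(b"abc", 0))
print(tuple(iterbytes(b"ABC")))
```

The sketch leans on ``bytes([x])``, which the PEP names as the current equivalent of ``bytes.fromord(x)``, so the range and type checks come from the existing constructor rather than from new code.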
2016-09-01 21:36 GMT+02:00 Ethan Furman <ethan@stoneleaf.us>:
Abstract ========
This PEP proposes five small adjustments to the APIs of the ``bytes`` and ``bytearray`` types to make it easier to operate entirely in the binary domain:
You should add bchr() in the Abstract.
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
I understand that the main reason for this change is to catch bugs when bytes(obj) is used and obj is not supposed to be an integer. So I expect that bytes(int) will be quickly deprecated, but the PEP doesn't schedule a removal of the feature. So it looks like more than only adding an alias for bytes(int). I would prefer to either schedule a removal of bytes(int), or remove bytes.fromsize() from the PEP.
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
Hum, you already propose to add a builtin function. Why would we need two ways to create a single byte? I'm talking about bchr(int)==bytes.fromord(int). I'm not sure that there is a use case for bytearray.fromord(int).
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
I like these ones :-)
In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.fromint(x)`` proposal in this PEP.
"fromint"? Is it bytes.fromord()/bchr()?
Open Questions ==============
Do we add ``iterbytes`` to ``memoryview``, or modify ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or do we ignore memory for now and add it later?
It's nice to have bytes.iterbytes() to help porting Python 2 code, but I'm not sure that this function would be super popular in new Python 3 code. I don't think that a memoryview.iterbytes() (or cast("s")) would be useful. Victor
On 09/01/2016 02:06 PM, Victor Stinner wrote:
2016-09-01 21:36 GMT+02:00 Ethan Furman:
Abstract ========
This PEP proposes five small adjustments to the APIs of the ``bytes`` and ``bytearray`` types to make it easier to operate entirely in the binary domain:
You should add bchr() in the Abstract.
Done.
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
I understand that the main reason for this change is to catch bugs when bytes(obj) is used and obj is not supposed to be an integer.
So I expect that bytes(int) will be quickly deprecated, but the PEP doesn't schedule a removal of the feature. So it looks like more than only adding an alias for bytes(int).
I would prefer to either schedule a removal of bytes(int), or remove bytes.fromsize() from the PEP.
The PEP states that ``bytes(x)`` will not be removed while 2.7 is supported. Once 2.7 is no longer a concern we can visit the question of removing that behavior.
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
Hum, you already propose to add a builtin function. Why would we need two ways to create a single byte?
- `bchr` to mirror `chr`
- `fromord` to replace the mistaken purpose of the default constructor
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
I like these ones :-)
Cool.
In particular, there's a reasonable case to be made that ``bytes(x)`` (where ``x`` is an integer) should behave like the ``bytes.fromint(x)`` proposal in this PEP.
"fromint"? Is it bytes.fromord()/bchr()?
Oops, fixed. -- ~Ethan
2016-09-02 0:04 GMT+02:00 Ethan Furman <ethan@stoneleaf.us>:
- `fromord` to replace the mistaken purpose of the default constructor
To replace a bogus bytes(obj)? If someone writes bytes(obj) but expects to create a byte string from an integer, why not use bchr() to fix the code?

Victor
On 09/01/2016 04:07 PM, Victor Stinner wrote:
2016-09-02 0:04 GMT+02:00 Ethan Furman:
- `fromord` to replace the mistaken purpose of the default constructor
To replace a bogus bytes(obj)? If someone writes bytes(obj) but expects to create a byte string from an integer, why not use bchr() to fix the code?
The problem with only having `bchr` is that it doesn't help with `bytearray`; the problem with not having `bchr` is who wants to write `bytes.fromord`? So we need `bchr`, and we need `bytearray.fromord`; and since the major difference between `bytes` and `bytearray` is that one is mutable and one is not, `bytes` should also have `fromord`. -- ~Ethan~
Yes, this was my point: I don't think that we need a bytearray method to create a mutable string from a single byte.

Victor

On Saturday, 3 September 2016, Random832 <random832@fastmail.com> wrote:
On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`;
What is the use case for bytearray.fromord? Even in the rare case someone needs it, why not bytearray(bchr(...))?
On Saturday, 3 September 2016, Random832 <random832@fastmail.com> wrote:
On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`;
What is the use case for bytearray.fromord? Even in the rare case someone needs it, why not bytearray(bchr(...))?
On 3 September 2016 at 08:47, Victor Stinner <victor.stinner@gmail.com> wrote:
Yes, this was my point: I don't think that we need a bytearray method to create a mutable string from a single byte.
I agree with the above. Having an easy way to turn an int into a bytes object is good. But I think the built-in bchr() function on its own is enough. Just like we have bytes object literals, but the closest we have for a bytearray literal is bytearray(b". . .").
On 3 September 2016 at 21:35, Martin Panter <vadmium+py@gmail.com> wrote:
On Saturday, 3 September 2016, Random832 <random832@fastmail.com> wrote:
On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`;
What is the use case for bytearray.fromord? Even in the rare case someone needs it, why not bytearray(bchr(...))?
On 3 September 2016 at 08:47, Victor Stinner <victor.stinner@gmail.com> wrote:
Yes, this was my point: I don't think that we need a bytearray method to create a mutable string from a single byte.
I agree with the above. Having an easy way to turn an int into a bytes object is good. But I think the built-in bchr() function on its own is enough. Just like we have bytes object literals, but the closest we have for a bytearray literal is bytearray(b". . .").
This is a good point - earlier versions of the PEP didn't include bchr(), they just had the class methods, so "bytearray(bchr(...))" wasn't an available spelling (if I remember the original API design correctly, it would have been something like "bytearray(bytes.byte(...))"), which meant there was a strong consistency argument in having the alternate constructor on both types. Now that the PEP proposes the "bchr" builtin, the "fromord" constructors look less necessary.

Given that, and the uncertain deprecation time frame for accepting integers in the main bytes and bytearray constructors, perhaps both the "fromsize" and "fromord" parts of the proposal can be deferred indefinitely in favour of just adding the bchr() builtin?

We wouldn't gain the "initialise a region of memory to an arbitrary value" feature, but it can be argued that wanting that is a sign someone may be better off with a more specialised memory manipulation library, rather than relying solely on the builtins.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
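To make the "bytearray(bchr(...))" spelling discussed above concrete, here is a small sketch. ``bchr`` is defined locally as a hypothetical stand-in for the proposed builtin (it does not exist in Python today); the repetition example uses only existing behaviour.

```python
def bchr(i):
    # Hypothetical stand-in for the proposed builtin; not part of Python today.
    return bytes([i])


# A mutable single-byte buffer without a dedicated bytearray.fromord():
buf = bytearray(bchr(65))
buf[0] = 66  # bytearray is mutable, so item assignment works
print(buf)

# Filling with an arbitrary value already works through sequence repetition:
region = bytearray(b"\xff") * 4
print(region)
```

Under these assumptions the only thing lost by dropping ``fromsize`` is a named constructor; the fill behaviour itself stays reachable through repetition.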
On 09/03/2016 09:48 AM, Nick Coghlan wrote:
On 3 September 2016 at 21:35, Martin Panter wrote:
On 3 September 2016 at 08:47, Victor Stinner wrote:
On Saturday, 3 September 2016, Random832 wrote:
On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`;
What is the use case for bytearray.fromord? Even in the rare case someone needs it, why not bytearray(bchr(...))?
Yes, this was my point: I don't think that we need a bytearray method to create a mutable string from a single byte.
I agree with the above. Having an easy way to turn an int into a bytes object is good. But I think the built-in bchr() function on its own is enough. Just like we have bytes object literals, but the closest we have for a bytearray literal is bytearray(b". . .").
This is a good point - earlier versions of the PEP didn't include bchr(), they just had the class methods, so "bytearray(bchr(...))" wasn't an available spelling (if I remember the original API design correctly, it would have been something like "bytearray(bytes.byte(...))"), which meant there was a strong consistency argument in having the alternate constructor on both types. Now that the PEP proposes the "bchr" builtin, the "fromord" constructors look less necessary.
tl;dr -- Sounds good to me.  I'll update the PEP.

-------

When this started the idea behind the methods that eventually came to be called "fromord" and "fromsize" was that they would be the two possible interpretations of "bytes(x)":

the legacy Python2 behavior:

    >>> var = bytes('abc')
    >>> bytes(var[1])
    'b'

the current Python 3 behavior:

    >>> var = b'abc'
    >>> bytes(var[1])
    b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
    \x00\x00'

Digging deeper the problem turns out to be that indexing a bytes object changed:

Python 2:

    >>> b'abc'[1]
    'b'

Python 3:

    >>> b'abc'[1]
    98

If we pass an actual byte into the Python 3 bytes constructor it behaves as one would expect:

    >>> bytes(b'b')
    b'b'

Given all this it can be argued that the real problem is that indexing a bytes object behaves differently depending on whether you retrieve a single byte with an index versus a single byte with a slice:

    >>> b'abc'[2]
    99
    >>> b'abc'[2:]
    b'c'

Since we cannot fix that behavior, the question is how do we make it more livable?

- we can add a built-in to transform the int back into a byte:

    >>> bchr(b'abc'[2])
    b'c'

- we can add a method to return a byte from the bytes object, not an int:

    >>> b'abc'.getbyte(2)
    b'c'

- we can add a method to return a byte from an int:

    >>> bytes.fromint(b'abc'[2])
    b'c'

Which is all to say we have two problems to deal with:

- getting bytes from a bytes object
- getting bytes from an int

Since "bytes.fromint()" and "bchr()" are the same, and given that "bchr(ordinal)" mirrors "chr(ordinal)", I think "bchr" is the better choice for getting bytes from an int. For getting bytes from bytes, "getbyte()" and "iterbytes" are good choices.
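The index-versus-slice difference described above can be checked on any current Python 3 with nothing but existing operations. This is only an illustrative sketch; the variable names are made up for the example, and the two "today" spellings shown are the workarounds the proposed ``getbyte()`` and ``bchr()`` would replace.

```python
data = b"abc"

# Indexing yields an int, slicing yields bytes.
assert data[2] == 99
assert data[2:] == b"c"

# Passing that int to the constructor gives a zero-filled result of that
# length, which is the surprise discussed in the thread.
assert bytes(data[1]) == b"\x00" * 98

# Two spellings that work today for "the byte at index i":
i = 1
via_slice = data[i:i + 1]
via_list = bytes([data[i]])
assert via_slice == via_list == b"b"

# Caveat: an out-of-range slice silently returns an empty result, whereas
# plain indexing (and the proposed getbyte) raises IndexError.
assert data[9:10] == b""
print(via_slice, via_list)
```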
Given that, and the uncertain deprecation time frame for accepting integers in the main bytes and bytearray constructors, perhaps both the "fromsize" and "fromord" parts of the proposal can be deferred indefinitely in favour of just adding the bchr() builtin?
Agreed. -- ~Ethan~
Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`; the problem with not having `bchr` is who wants to write `bytes.fromord`?
If we called it 'bytes.fnord' (From Numeric Ordinal) people would want to write it just for the fun factor. -- Greg
On 09/02/2016 06:17 PM, Greg Ewing wrote:
Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with `bytearray`; the problem with not having `bchr` is who wants to write `bytes.fromord`?
If we called it 'bytes.fnord' (From Numeric Ordinal) people would want to write it just for the fun factor.
Very good point! :) -- ~Ethan~
Some quick comments below, a few more later: On Thu, Sep 1, 2016 at 10:36 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
One more iteration. PEPs repo not updated yet. Changes are renaming of methods to be ``fromsize()`` and ``fromord()``, and moving ``memoryview`` to an Open Questions section.
PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan@gmail.com>, Ethan Furman <ethan@stoneleaf.us>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.6
Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01
Abstract ========
During the initial development of the Python 3 language specification, the core ``bytes`` type for arbitrary binary data started as the mutable type that is now referred to as ``bytearray``. Other aspects of operating in the binary domain in Python have also evolved over the course of the Python 3 series.
This PEP proposes five small adjustments to the APIs of the ``bytes`` and ``bytearray`` types to make it easier to operate entirely in the binary domain:
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
I wonder if from_something with an underscore is more consistent (according to a quick search perhaps yes).

What about bytes.getchar and iterchars? A 'byte' in Python 3 seems to be an integer. (I would still like a .chars property that gives a sequence view with __getitem__ and __len__ so that the getchar and iterchars methods are not needed.)

chrb seems to be more in line with some bytes versions in, for instance, os than bchr.

Do we really need chrb? Better to introduce from_int or from_ord also in str and recommend that over chr?

-- Koos (mobile)
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 2 September 2016 at 17:54, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Thu, Sep 1, 2016 at 10:36 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
I wonder if from_something with an underscore is more consistent (according to a quick search perhaps yes).
That would not be too inconsistent with the sister constructor bytes.fromhex().
On 3 September 2016 at 03:54, Koos Zevenhoven <k7hoven@gmail.com> wrote:
chrb seems to be more in line with some bytes versions in for instance os than bchr.
The mnemonic for the current name in the PEP is that bchr is to chr as b"" is to "". The PEP should probably say that in addition to pointing out the 'unichr' Python 2 inspiration, though.

The other big difference between this and the os module case, is that the resulting builtin constructor pairs here are str/chr (arbitrary text, single code point) and bytes/bchr (arbitrary binary data, single binary octet). By contrast, os.getcwd() and os.getcwdb() (and similar APIs) are both referring to the same operating system level operation, they're just requesting a different return type for the data.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
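The contrast drawn above can be seen directly. In this sketch, ``os.getcwd()`` and ``os.getcwdb()`` are the real APIs; ``bytes([n])`` is used as the existing equivalent of what the proposed ``bchr(n)`` would return, which is an assumption based on the PEP text rather than an existing builtin.

```python
import os

# Same OS-level operation, two return types.
text_cwd = os.getcwd()    # str
bytes_cwd = os.getcwdb()  # bytes
assert isinstance(text_cwd, str)
assert isinstance(bytes_cwd, bytes)

# chr and a bchr-style constructor accept different input ranges:
# chr takes any code point, a single octet only takes 0..255.
assert chr(65) == "A"
assert bytes([65]) == b"A"  # what bchr(65) would return under the proposal
assert chr(0x20AC) == "\u20ac"

try:
    bytes([0x20AC])
    octet_accepted = True
except ValueError:
    octet_accepted = False

print(octet_accepted)
```

So the two pairs differ in kind: the os functions vary only the return type, while chr and a bchr-style function also differ in which integers are valid.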
On Sat, Sep 3, 2016 at 7:59 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 3 September 2016 at 03:54, Koos Zevenhoven <k7hoven@gmail.com> wrote:
chrb seems to be more in line with some bytes versions in for instance os than bchr.
The mnemonic for the current name in the PEP is that bchr is to chr as b"" is to "". The PEP should probably say that in addition to pointing out the 'unichr' Python 2 inspiration, though.
Thanks for explaining. Indeed I hope that unichr does not affect any naming decisions that will remain in the language for a long time.
The other big difference between this and the os module case, is that the resulting builtin constructor pairs here are str/chr (arbitrary text, single code point) and bytes/bchr (arbitrary binary data, single binary octet). By contrast, os.getcwd() and os.getcwdb() (and similar APIs) are both referring to the same operating system level operation, they're just requesting a different return type for the data.
But chr and "bchr" are also requesting a different return type. The difference is that the data is not coming from an os-level operation but from an int. I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be. I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char. -- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be.
I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char.
What about byte? Like, not bytes.byte, just builtins.byte.
On 4 September 2016 at 00:11, Random832 <random832@fastmail.com> wrote:
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be.
I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char.
What about byte? Like, not bytes.byte, just builtins.byte.
I like this option, it would be very "symmetric" to have, compare:
chr(42)
'*'
str()
''
with this:
byte(42)
b'*'
bytes()
b''
It is easy to explain and remember this. -- Ivan
On Sun, Sep 4, 2016 at 1:23 AM, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On 4 September 2016 at 00:11, Random832 <random832@fastmail.com> wrote:
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be.
I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char.
What about byte? Like, not bytes.byte, just builtins.byte.
I like this option, it would be very "symmetric" to have, compare:
chr(42)
'*'
str()
''
with this:
byte(42)
b'*'
bytes()
b''
It is easy to explain and remember this.
In one way, I like it, but on the other hand, indexing a bytes gives an integer, so maybe a 'byte' is just an integer in range(256). Also, having both byte and bytes would be a slight annoyance with autocomplete. -- Koos
On 4 September 2016 at 08:11, Random832 <random832@fastmail.com> wrote:
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be.
I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char.
The PEP started out with a classmethod, and that proved problematic due to length and the expectation of API symmetry with bytearray. A new builtin paralleling chr avoids both of those problems.
What about byte? Like, not bytes.byte, just builtins.byte.
The main problem with "byte" as a name is that "bytes" is *not* an iterable of these - it's an iterable of ints. That concern doesn't arise with chr/str as they're both abbreviated singular nouns rather than one being the plural form of the other (it also doesn't hurt that str actually is an iterable of chr results).

If we wanted a meaningful noun (other than byte) for the bchr concept, then the alternative term that most readily comes to mind for me is "octet", but I don't know how intuitive or memorable that would be for folks without an embedded systems or serial communications background (especially given that we already have 'oct', which does something completely different).

That said, the PEP does propose "getbyte()" and "iterbytes()" for bytes-oriented indexing and iteration, so there's a reasonable consistency argument in favour of also proposing "byte" as the builtin factory function:

* data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
* data.iterbytes() would be a more efficient alternative to map(byte, data)

With bchr, those mappings aren't as clear (plus there's a potentially unwanted "text" connotation arising from the use of the "chr" abbreviation).

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
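The int-versus-bytes distinction in this message can be checked against current Python. The sketch below is only an illustration: `byte` is a hypothetical stand-in for the builtin under discussion (it does not exist today), and the comments about `getbyte` and `iterbytes` describe what the PEP proposes, not an existing API.

```python
def byte(value):
    """Hypothetical stand-in for the proposed builtin:
    an int in range(256) -> a length-1 bytes object."""
    return bytes([value])

data = b"ABC"

# Indexing and iterating a bytes object yield ints, not length-1 bytes.
first_item = data[0]
items = list(data)

# Spellings the proposed methods are described as being equivalent to.
first_byte = byte(data[0])          # what data.getbyte(0) would return
all_bytes = list(map(byte, data))   # what data.iterbytes() would yield

# str, by contrast, really is an iterable of chr() results.
text_items = list("ABC")
```

Here `first_item` is an int while `first_byte` is a bytes object of length 1, which is the asymmetry with `chr`/`str` that the naming debate turns on.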
On Sun, Sep 4, 2016 at 12:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 4 September 2016 at 08:11, Random832 <random832@fastmail.com> wrote:
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
I guess one reason I don't like bchr (nor chrb, really) is that they look just like a random sequence of letters in builtins, but not recognizable the way asdf would be.
I guess I have one last pair of suggestions for the name of this function: bytes.chr or bytes.char.
The PEP started out with a classmethod, and that proved problematic due to length and the expectation of API symmetry with bytearray. A new builtin paralleling chr avoids both of those problems.
What about byte? Like, not bytes.byte, just builtins.byte.
The main problem with "byte" as a name is that "bytes" is *not* an iterable of these - it's an iterable of ints. That concern doesn't arise with chr/str as they're both abbreviated singular nouns rather than one being the plural form of the other (it also doesn't hurt that str actually is an iterable of chr results).
Since you agree with me about this... [...]
That said, the PEP does propose "getbyte()" and "iterbytes()" for bytes-oriented indexing and iteration, so there's a reasonable consistency argument in favour of also proposing "byte" as the builtin factory function:
* data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
* data.iterbytes() would be a more efficient alternative to map(byte, data)
.. I don't understand the argument for having 'byte' in these names. They should have 'char' or 'chr' in them for exactly the same reason that the proposed builtin should have 'chr' in it instead of 'byte'. If 'bytes' is an iterable of ints, then get_byte should probably return an int. I'm sorry, but this argument comes across as "we're proposing the wrong thing here, so for consistency, we might want to do the wrong thing in this other part too". And didn't someone recently propose deprecating iterability of str (not indexing, or slicing, just iterability)? Then str would also need a way to provide an iterable or sequence view of the characters. For consistency, the str functionality would probably need to mimic the approach in bytes. IOW, this PEP may in fact ultimately dictate how to get an iterable/sequence from a str object. -- Koos
With bchr, those mappings aren't as clear (plus there's a potentially unwanted "text" connotation arising from the use of the "chr" abbreviation).
Which mappings?
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 4 September 2016 at 20:43, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sun, Sep 4, 2016 at 12:51 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That said, the PEP does propose "getbyte()" and "iterbytes()" for bytes-oriented indexing and iteration, so there's a reasonable consistency argument in favour of also proposing "byte" as the builtin factory function:
* data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
* data.iterbytes() would be a more efficient alternative to map(byte, data)
.. I don't understand the argument for having 'byte' in these names. They should have 'char' or 'chr' in them for exactly the same reason that the proposed builtin should have 'chr' in it instead of 'byte'. If 'bytes' is an iterable of ints, then get_byte should probably return an int.
I'm sorry, but this argument comes across as "we're proposing the wrong thing here, so for consistency, we might want to do the wrong thing in this other part too".
There are two self-consistent sets of names:

bchr
bytes.getbchr, bytearray.getbchr
bytes.iterbchr, bytearray.iterbchr

byte
bytes.getbyte, bytearray.getbyte
bytes.iterbytes, bytearray.iterbytes

The former set emphasises the "stringiness" of this behaviour, by aligning with the chr() builtin.

The latter set emphasises that these APIs are still about working with arbitrary binary data rather than text, with a Python "byte" subsequently being a length 1 bytes object containing a single integer between 0 and 255, rather than "What you get when you index or iterate over a bytes instance".

Having noticed the discrepancy, my personal preference is to go with the latter option (since it better fits the "executable pseudocode" ideal and despite my reservations about "bytes objects contain int objects rather than byte objects", that shouldn't be any more confusing in the long run than explaining that str instances are containers of length-1 str instances). The fact "byte" is much easier to pronounce than bchr (bee-cher? bee-char?) also doesn't hurt.

However, I suspect we'll need to put both sets of names in front of Guido and ask him to just pick whichever he prefers to get it resolved one way or the other.
And didn't someone recently propose deprecating iterability of str (not indexing, or slicing, just iterability)? Then str would also need a way to provide an iterable or sequence view of the characters. For consistency, the str functionality would probably need to mimic the approach in bytes. IOW, this PEP may in fact ultimately dictate how to get an iterable/sequence from a str object.
Strings are not going to become atomic objects, no matter how many times people suggest it.
With bchr, those mappings aren't as clear (plus there's a potentially unwanted "text" connotation arising from the use of the "chr" abbreviation).
Which mappings?
The mapping between the builtin name and the method names. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are two self-consistent sets of names:
Let me add a few. I wonder if this is really used so much that bytes.chr is too long to type (and you can do bchr = bytes.chr if you want to):

bytes.chr (or bchr in builtins)
bytes.chr_at, bytearray.chr_at
bytes.iterchr, bytearray.iterchr

bytes.chr (or bchr in builtins)
bytes.chrview, bytearray.chrview (sequence views)

bytes.char (or bytes.chr or bchr in builtins)
bytes.chars, bytearray.chars (sequence views)
bchr bytes.getbchr, bytearray.getbchr bytes.iterbchr, bytearray.iterbchr
byte bytes.getbyte, bytearray.getbyte bytes.iterbytes, bytearray.iterbytes
The former set emphasises the "stringiness" of this behaviour, by aligning with the chr() builtin
The latter set emphasises that these APIs are still about working with arbitrary binary data rather than text, with a Python "byte" subsequently being a length 1 bytes object containing a single integer between 0 and 255, rather than "What you get when you index or iterate over a bytes instance".
Having noticed the discrepancy, my personal preference is to go with the latter option (since it better fits the "executable pseudocode" ideal and despite my reservations about "bytes objects contain int objects rather than byte objects", that shouldn't be any more confusing in the long run than explaining that str instances are containers of length-1 str instances). The fact "byte" is much easier to pronounce than bchr (bee-cher? bee-char?) also doesn't hurt.
However, I suspect we'll need to put both sets of names in front of Guido and ask him to just pick whichever he prefers to get it resolved one way or the other.
And didn't someone recently propose deprecating iterability of str (not indexing, or slicing, just iterability)? Then str would also need a way to provide an iterable or sequence view of the characters. For consistency, the str functionality would probably need to mimic the approach in bytes. IOW, this PEP may in fact ultimately dictate how to get an iterable/sequence from a str object.
Strings are not going to become atomic objects, no matter how many times people suggest it.
You consider all non-iterable objects atomic? If str.__iter__ raises an exception, it does not turn str somehow atomic. I wouldn't be surprised by breaking changes of this nature to python at some point. The breakage will be quite significant, but easy to fix. -- Koos
On Sun, Sep 4, 2016, at 16:42, Koos Zevenhoven wrote:
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are two self-consistent sets of names:
Let me add a few. I wonder if this is really used so much that bytes.chr is too long to type (and you can do bchr = bytes.chr if you want to):
bytes.chr (or bchr in builtins) bytes.chr_at, bytearray.chr_at
Ugh, that "at" is too reminiscent of java. And it just feels wrong to spell it "chr" rather than "char" when there's a vowel elsewhere in the name. Hmm... how offensive to the zen of python would it be to have "magic" to allow both bytes.chr(65) and b'ABCDE'.chr[0] (and possibly also iter(b'ABCDE'.chr))? That is, a descriptor which is callable on the class, but returns a view on instances?
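The "magic" descriptor idea can be prototyped in pure Python today. This is a minimal sketch under stated assumptions: `MyBytes`, `_ChrDescriptor` and `_CharView` are hypothetical names invented for the illustration, a `bytes` subclass is used because attributes cannot be added to the builtin type, and nothing here is part of the PEP.

```python
class _CharView:
    """Hypothetical read-only view yielding length-1 bytes objects."""

    def __init__(self, data):
        self._data = data

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing the underlying bytes already gives bytes.
            return self._data[index]
        # Integer indexing gives an int; wrap it back into bytes.
        return bytes([self._data[index]])

    def __iter__(self):
        return (bytes([value]) for value in self._data)


class _ChrDescriptor:
    """Callable constructor on the class, sequence view on instances."""

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: behave like a single-byte constructor.
            return lambda value: owner([value])
        # Accessed on an instance: return a view of length-1 bytes.
        return _CharView(instance)


class MyBytes(bytes):
    chr = _ChrDescriptor()


made = MyBytes.chr(65)
first = MyBytes(b"ABCDE").chr[0]
pieces = list(MyBytes(b"ABCDE").chr)
```

The same attribute name does two unrelated jobs depending on how it is reached, which is the trade-off the question about the Zen of Python is pointing at.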
On Mon, Sep 5, 2016 at 3:30 AM, Random832 <random832@fastmail.com> wrote:
On Sun, Sep 4, 2016, at 16:42, Koos Zevenhoven wrote:
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are two self-consistent sets of names:
Let me add a few. I wonder if this is really used so much that bytes.chr is too long to type (and you can do bchr = bytes.chr if you want to):
bytes.chr (or bchr in builtins) bytes.chr_at, bytearray.chr_at
Ugh, that "at" is too reminiscent of java. And it just feels wrong to spell it "chr" rather than "char" when there's a vowel elsewhere in the name.
Oh, I didn't realize that connection. It's funny that I get a Java connotation from get* methods ;).
Hmm... how offensive to the zen of python would it be to have "magic" to allow both bytes.chr(65) and b'ABCDE'.chr[0] (and possibly also iter(b'ABCDE'.chr))? That is, a descriptor which is callable on the class, but returns a view on instances?
Indeed quite magical, while I really like how easy it is to remember this *once you realize what is going on*. I think bytes.char (on class) and data.chars (on instance) would be quite similar. -- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 5 September 2016 at 06:42, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are two self-consistent sets of names:
Let me add a few. I wonder if this is really used so much that bytes.chr is too long to type (and you can do bchr = bytes.chr if you want to)
bytes.chr (or bchr in builtins)
The main problem with class method based spellings is that we need to either duplicate it on bytearray or else break the bytearray/bytes symmetry and propose "bytearray(bytes.chr(x))" as the replacement for current cryptic "bytearray([x])".

Consider:

bytearray([x])
bytearray(bchr(x))
bytearray(byte(x))
bytearray(bytes.chr(x))

Folks that care about maintainability are generally willing to trade a few extra characters at development time for ease of reading later, but there are limits to how large a trade-off they can be asked to make if we expect the alternative to actually be used (since overly verbose code can be a readability problem in its own right).
bytes.chr_at, bytearray.chr_at bytes.iterchr, bytearray.iterchr
These don't work for me because I'd expect iterchr to take encoding and errors arguments and produce length 1 strings. You also run into a searchability problem as "chr" will get hits for both the chr builtin and bytes.chr, similar to the afalg problem that recently came up in another thread. While namespaces are a honking great idea, the fact that search is non-hierarchical means they still don't give API designers complete freedom to reuse names at will.
bytes.chr (or bchr in builtins) bytes.chrview, bytearray.chrview (sequence views)
bytes.char (or bytes.chr or bchr in builtins) bytes.chars, bytearray.chars (sequence views)
The views are already available via memoryview.cast if folks really want them, but encouraging their use in general isn't a great idea, as it means more function developers now need to ask themselves "What if someone passes me a char view rather than a normal bytes object?".
Strings are not going to become atomic objects, no matter how many times people suggest it.
You consider all non-iterable objects atomic? If str.__iter__ raises an exception, it does not turn str somehow atomic.
"atomic" is an overloaded word in software design, but it's still the right one for pointing out that sometimes people want strings to be atomic, and sometimes they don't - it depends on what they're doing. In particular, you can look up the many, many, many discussions of providing a generic flatten operation for iterables, and how it always founders on the question of types like str and bytes, which can both be usefully viewed as an atomic unit of information, *and* as containers of smaller units of information (NumPy arrays are another excellent example of this problem).
I wouldn't be surprised by breaking changes of this nature to python at some point.
I would, and you should be too: http://www.curiousefficiency.org/posts/2014/08/python-4000.html
The breakage will be quite significant, but easy to fix.
Please keep in mind that we're already 10 years into a breaking change to Python's text handling model, with another decade or so still to go before the legacy Python 2 text model is spoken of largely in terms similar to the way COBOL is spoken of today. There is no such thing as a "significant, but easy to fix" change when it comes to adjusting how a programming language handles text data, as text handling is a fundamental part of defining how a language is used to communicate with people. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
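The flatten problem mentioned earlier in this message is easy to reproduce. The following is a minimal sketch, not a proposed API: `flatten` and its `atomic` parameter are hypothetical names, and the choice of which types count as atomic is exactly the judgment call the message says has no single right answer.

```python
from collections.abc import Iterable


def flatten(items, atomic=(str, bytes, bytearray)):
    """Recursively flatten nested iterables, treating the given
    types as indivisible units rather than as containers."""
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, atomic):
            yield from flatten(item, atomic)
        else:
            yield item


nested = ["ab", [b"cd", [1, 2]], (3,)]

# str and bytes treated as atomic units of information.
flat_atomic = list(flatten(nested))

# bytes treated as a container of ints instead.
flat_bytes_as_container = list(flatten(nested, atomic=(str,)))

# Passing atomic=() would recurse without end on "ab", because
# iterating a length-1 str yields that same length-1 str again.
```

Both results are defensible, which is why a generic flatten keeps running aground on str and bytes.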
On Mon, Sep 5, 2016 at 6:06 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 5 September 2016 at 06:42, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
There are two self-consistent sets of names:
Let me add a few. I wonder if this is really used so much that bytes.chr is too long to type (and you can do bchr = bytes.chr if you want to)
bytes.chr (or bchr in builtins)
The main problem with class method based spellings is that we need to either duplicate it on bytearray or else break the bytearray/bytes symmetry and propose "bytearray(bytes.chr(x))" as the replacement for current cryptic "bytearray([x])"
Warning: some API-design philosophy below:

1. It's less bad to break symmetry regarding what functionality is offered for related object types (here: str, bytes, bytearray) than it is to break symmetry in how the symmetric functionality is provided. IOW, a missing unnecessary functionality is less bad than exposing the equivalent functionality under a different name. (This might be kind of how Random832 was reasoning previously)

2. Symmetry is more important in object access functionality than it is in instance creation. IOW, symmetry regarding 'constructors' (here: bchr, bytes.chr, bytes.byte, ...) across different types is not as crucial as symmetry in slicing. The reason is that the caller of a constructor is likely to know which class it is instantiating. A consumer of bytes/bytearray/str-like objects often does not know which type is being dealt with.

I might be crying over spilled milk here, but that seems to be the point of the whole PEP. That chars view thing might collect some of the milk back into a bottle:

mystr[whatever] <-> mybytes.chars[whatever] <-> mybytearray.chars[whatever]
iter(mystr) <-> iter(mybytes.chars) <-> iter(mybytearray.chars)

Then introduce 'chars' on str and this becomes

mystr.chars[whatever] <-> mybytes.chars[whatever] <-> mybytearray.chars[whatever]
iter(mystr.chars) <-> iter(mybytes.chars) <-> iter(mybytearray.chars)

If iter(mystr.chars) is recommended and iter(mystr) discouraged, then after a decade or two, the world may look quite different regarding how important it is for a str to be iterable. This would solve multiple problems at once. Well I admit that "at once" is not really an accurate description of the process :). [...]
You also run into a searchability problem as "chr" will get hits for both the chr builtin and bytes.chr, similar to the afalg problem that recently came up in another thread. While namespaces are a honking great idea, the fact that search is non-hierarchical means they still don't give API designers complete freedom to reuse names at will.
Oh, I can kind of see a point here, especially if the search hits aren't related in any way. Why not just forget all symmetry if this is an issue? But is it really a bad thing if by searching you find that there's a chr for both str and bytes? If I think, "I want to turn my int into a bytes 'character' kind of in the way that chr turns my int into a str". What am I going to search or google for? I can't speak for others, but I would probably search for something that contains 'chr' and 'bytes'. Based on this, I'm unable to see the search disadvantage of bytes.chr. [...]
bytes.char (or bytes.chr or bchr in builtins) bytes.chars, bytearray.chars (sequence views)
The views are already available via memoryview.cast if folks really want them, but encouraging their use in general isn't a great idea, as it means more function developers now need to ask themselves "What if someone passes me a char view rather than a normal bytes object?".
Thanks, I think this is the first real argument I hear against the char view. In fact, I don't think people should ask themselves that question, and just not accept bytes views as input. Would it be enough to discourage storing and passing bytes views? Anyway, the only error that would pass silently would be that the passed-in object gets indexed (e.g. obj[0]) and a bytes-char comes out instead of an int. But it would be a strange thing to do by the caller to pass a char view into the bytes-consumer. I could imagine someone wanting to pass a bytes view into a str-consumer. But there are no significant silently-passing errors there. If str also gets .chars, then it becomes even easier to support this. -- Koos
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
On 1 September 2016 at 19:36, Ethan Furman <ethan@stoneleaf.us> wrote:
Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::
>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.
Can you clarify what “deprecate” means? Just add a note in the documentation, or make calls trigger a DeprecationWarning as well? Having bytearray(n) trigger a DeprecationWarning would be a minor annoyance for code being compatible with Python 2 and 3, since bytearray(n) is supported in Python 2.
Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------
This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte`` which will always return ``bytes``::
Should getbyte() handle negative indexes? E.g. getbyte(-1) returning the last byte.
Open Questions
==============
Do we add ``iterbytes`` to ``memoryview``, or modify ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or do we ignore memory for now and add it later?
Apparently memoryview.cast('s') comes from Nick Coghlan: <https://marc.info/?i=CADiSq7e=8ieyeW-tXf5diMS_5NuAOS5udv-3g_w3LTWN9WboJw@mai...>. However, since 3.5 (https://bugs.python.org/issue15944) you can call cast("c") on most memoryviews, which I think already does what you want:
tuple(memoryview(b"ABC").cast("c"))
(b'A', b'B', b'C')
On 09/03/2016 05:08 AM, Martin Panter wrote:
On 1 September 2016 at 19:36, Ethan Furman wrote:
Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::
>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.
Can you clarify what “deprecate” means? Just add a note in the documentation, [...]
This one.
Addition of "getbyte" method to retrieve a single byte
------------------------------------------------------
This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte`` which will always return ``bytes``::
Should getbyte() handle negative indexes? E.g. getbyte(-1) returning the last byte.
Yes.
Open Questions
==============
Do we add ``iterbytes`` to ``memoryview``, or modify ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or do we ignore memory for now and add it later?
Apparently memoryview.cast('s') comes from Nick Coghlan: <https://marc.info/?i=CADiSq7e=8ieyeW-tXf5diMS_5NuAOS5udv-3g_w3LTWN9WboJw@mai...>. However, since 3.5 (https://bugs.python.org/issue15944) you can call cast("c") on most memoryviews, which I think already does what you want:
tuple(memoryview(b"ABC").cast("c"))
(b'A', b'B', b'C')
Nice! -- ~Ethan~
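The answer on negative indexes can be illustrated with a free function. This is a sketch only: `getbyte` here is a hypothetical module-level helper standing in for the proposed method, and it assumes the method would simply defer to ordinary sequence indexing and, as the PEP text quoted above says, always return ``bytes``.

```python
def getbyte(data, index):
    """Hypothetical stand-in for the proposed data.getbyte(index).

    Ordinary indexing already handles negative values, so the helper
    only needs to wrap the resulting int back into a bytes object.
    """
    return bytes([data[index]])


last = getbyte(bytearray(b"ABC"), -1)
first = getbyte(b"ABC", 0)
```

An out-of-range index raises IndexError from the underlying indexing step, the same as `data[index]` would.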
On Sat, Sep 3, 2016 at 6:41 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Open Questions
==============
Do we add ``iterbytes`` to ``memoryview``, or modify ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation? Or do we ignore memory for now and add it later?
Apparently memoryview.cast('s') comes from Nick Coghlan:
<https://marc.info/?i=CADiSq7e=8ieyeW-tXf5diMS_5NuAOS5udv-3g_w3LTWN9WboJw@mai...>. However, since 3.5 (https://bugs.python.org/issue15944) you can call cast("c") on most memoryviews, which I think already does what you want:
tuple(memoryview(b"ABC").cast("c"))
(b'A', b'B', b'C')
Nice!
Indeed! Exposing this as bytes_instance.chars would make porting from Python 2 really simple. Of course even better would be if slicing the view would return bytes, so the porting rule would be the same for all bytes subscripting: py2str[SOMETHING] becomes py3bytes.chars[SOMETHING] With the "c" memoryview there will be a distinction between slicing and indexing. And Random832 seems to be making some good points. --- Koos
-- ~Ethan~
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
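The distinction between indexing and slicing a "c" memoryview, mentioned in the message above, can be seen directly in Python 3.5 and later. The snippet uses only existing `memoryview` behaviour; the variable names are illustrative.

```python
data = b"ABC"
view = memoryview(data).cast("c")

# Indexing the "c" view yields a length-1 bytes object...
item = view[0]

# ...but slicing yields another memoryview, not bytes.
part = view[0:2]
part_bytes = part.tobytes()

# Plain bytes indexing, for comparison, yields an int.
plain_item = data[0]
```

So a porting rule of the form "py2str[SOMETHING] becomes py3bytes.chars[SOMETHING]" would need the view to return bytes from slices as well, which the existing cast does not do.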
On Sat, Sep 3, 2016, at 08:08, Martin Panter wrote:
On 1 September 2016 at 19:36, Ethan Furman <ethan@stoneleaf.us> wrote:
Deprecation of current "zero-initialised sequence" behaviour without removal
----------------------------------------------------------------------------
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::
>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')
This PEP proposes to deprecate that behaviour in Python 3.6, but to leave it in place for at least as long as Python 2.7 is supported, possibly indefinitely.
Can you clarify what “deprecate” means? Just add a note in the documentation, or make calls trigger a DeprecationWarning as well? Having bytearray(n) trigger a DeprecationWarning would be a minor annoyance for code being compatible with Python 2 and 3, since bytearray(n) is supported in Python 2.
I don't think bytearray(n) should be deprecated. I don't think that deprecating bytes(n) should entail also deprecating bytearray(n). If I were designing these classes from scratch, I would not feel any impulse to make their constructors take the same arguments or have the same semantics, and I'm a bit unclear on what the reason for this decision was.

I also don't think bytes.fromcount(n) is necessary. What's wrong with b'\0'*n? I could swear this has been answered before, but I don't recall what the answer was. I don't think the rationale mentioned in the PEP is an adequate explanation, it references an earlier decision, about a conceptually different class (it's an operation that's much more common with mutable classes than immutable ones - when's the last time you did (None,)*n relative to [None]*n), without actually explaining the real reason for either underlying decision (having bytearray(n) and having both classes take the same constructor arguments).

I think that the functions we should add/keep are:

bytes(values: Union[bytes, bytearray, Iterable[int]])
bytearray(count : int)
bytearray(values: Union[bytes, bytearray, Iterable[int]])
bchr(integer)

If, incidentally, we're going to add a .fromsize method, it'd be nice to add a way to provide a fill value other than 0. Also, maybe we should also add it for list and tuple (with the default value None)?

For the (string, encoding) signatures, there's no good reason to keep them [TOOWTDI is str.encode] but no good reason to get rid of them either.
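The equivalence behind the "What's wrong with b'\0'*n?" question, and the fill-value idea, can be sketched in current Python. `fromsize` below is a hypothetical free function standing in for the proposed class method; its signature follows the PEP text (count first, then an optional single-byte fill), and the single-byte check is an assumption of this sketch.

```python
n = 3

# The integer constructor and sequence repetition give the same result today.
zeros_from_constructor = bytes(n)
zeros_from_repeat = b"\0" * n


def fromsize(cls, count, fill=b"\0"):
    """Hypothetical stand-in for the proposed cls.fromsize(count, fill)."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    return cls(fill * count)


filled = fromsize(bytes, 5, b"\x0a")
buffer = fromsize(bytearray, 3)
```

Because repetition already covers both the zero and non-zero cases, the sketch is a thin wrapper, which is the substance of the objection raised in this message.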
On 01.09.16 22:36, Ethan Furman wrote:
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators
Could you please add a mention of alternative: seqtools.chunks()? seqtools.chunks(bytes, 1) and seqtools.chunks(bytearray, 1) should be equivalent to bytes.iterbytes() and bytearray.iterbytes() (but this function is applicable to arbitrary sequences, including memoryview and array). Is there a need of a PEP for new seqtools module (currently two classes are planned), or just providing sample implementation on the bugtracker would be enough?
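A chunking helper of the kind described here might look like the sketch below. It is an assumption-laden illustration: no `seqtools` module exists in the standard library, and this `chunks` is a hypothetical generator written for the example, not the implementation planned in the message.

```python
def chunks(seq, size):
    """Hypothetical sketch: yield consecutive slices of seq of length size.

    Works for any sequence supporting len() and slicing; the last
    chunk may be shorter when len(seq) is not a multiple of size.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


from_bytes = list(chunks(b"ABC", 1))
from_bytearray = list(chunks(bytearray(b"ABC"), 1))
from_list = list(chunks([1, 2, 3, 4, 5], 2))
```

One difference worth noting: slicing a bytearray yields bytearray pieces, so this sketch would not return ``bytes`` for bytearray input unless the pieces were converted.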