PEP 467 (Minor API improvements for binary sequences) - any thoughts?

This is about some minor changes to the bytes, bytearray, and memoryview classes. Here is the PEP: https://www.python.org/dev/peps/pep-0467/ The page in the bug tracker can be seen at https://bugs.python.org/issue27923 and the pull request can be seen at https://github.com/python/cpython/pull/3237. I am waiting for this to be merged, or approved, or whatever is the next step. Someone on the bug tracker mentioned restarting the discussion on the mailing list, so that is what I'm trying to do here. Does anyone have any thoughts?

On 02/21/2018 11:55 AM, Elias Zamaria wrote:
At this point the PEP itself has not been approved, and is undergoing changes. I don't see anything happening with it right now while 3.7 is going through its final stages to release. Once 3.7.0 is published we can come back to this. -- ~Ethan~

On Wed, Feb 21, 2018 at 12:21 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
Well, it was originally targeted for 3.5, so it did need a kick to get going. Probably too late for 3.7.0, but no reason not to get it moving if there is support and there aren't objections. Anyone know if it paused due to controversy or just lack of momentum? From a quick search, it looks like discussion simply petered out in Sept 2016. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 22 February 2018 at 08:35, Guido van Rossum <guido@python.org> wrote:
It's too late for 3.7, period, but there's no reason it can't be considered for 3.8.
Something else the PEP needs is a new champion - my original interest was to help lower barriers to Python 3 migration, but it's now more about the general ergonomics of the bytes type, and I don't do enough low-level protocol work these days to have a strong opinion on that. That new champion could be Elias, or else perhaps Ethan Furman (who drove the last round of proposed updates to the PEP, which unfortunately don't appear to have been submitted to the PEPs repo: https://mail.python.org/pipermail/python-dev/2016-September/146043.html) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Feb 21, 2018 at 11:55 AM, Elias Zamaria <mikez302@gmail.com> wrote:
This is about some minor changes to the bytes, bytearray, and memoryview classes. Here is the PEP: https://www.python.org/dev/peps/pep-0467/
+1 all around. One other thought, regarding this item in the PEP: "Addition of optimised iterator methods that produce bytes objects". Maybe it would make sense to have a "byte" type that holds a single byte. It would be an integer that could only hold values from 0 to 255. Then the regular iterator could simply return a bunch of single byte objects. I can't say I've thought it through, but if a byte is an int with restricted range, then it could act like an int in (almost?) every context, so there would be no need for a separate iterator. I also haven't thought through whether there is any real advantage to having such a type -- but off the top of my head, making a distinction between a bytes object that happens to be length-one and a single byte could be handy. I sure do often wish for a character object. -CHB
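As a rough illustration of the "byte" idea above, here is a minimal sketch of an int subclass restricted to 0..255. The names `Byte` and `iter_as_bytes` are hypothetical, invented for this example; nothing like them exists in the standard library or in the PEP.

```python
class Byte(int):
    """Hypothetical single-byte type: an int restricted to 0..255."""

    def __new__(cls, value):
        value = int(value)
        if not 0 <= value <= 255:
            raise ValueError("byte must be in range(0, 256)")
        return super().__new__(cls, value)

    def __bytes__(self):
        # Convert to a length-one bytes object holding this value.
        return int(self).to_bytes(1, "big")

    def __repr__(self):
        return f"Byte({int(self)})"


def iter_as_bytes(data):
    # Hypothetical stand-in for a "regular iterator" yielding Byte objects.
    return (Byte(b) for b in data)


items = list(iter_as_bytes(b"AB"))
print(items)             # [Byte(65), Byte(66)]
print(items[0] + 1)      # 66 (arithmetic falls back to plain int)
print(bytes(items[0]))   # b'A'
```

Because `Byte` is an `int`, it compares equal to the plain integers that iteration over `bytes` yields today, while `bytes(Byte(65))` gives the length-one `bytes` form. Note that arithmetic results are ordinary ints in this sketch; preserving the restricted range across operations would need more overrides.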

I think the chances of a "byte" object are about as good as the chances of a character object (though one can always implement such in C extensions, that wouldn't build them into the syntax). The fact that characters are length-one strings is responsible for certain anomalies with (e.g.) the __contains__ operator (list elements aren't lists, but string elements are strings), but overall the choices made lead to sensible, comprehensible code. regards, Steve Holden On Wed, Feb 21, 2018 at 8:26 PM, Chris Barker <chris.barker@noaa.gov> wrote:
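The `__contains__` anomaly mentioned above can be seen in a few lines. This is only an illustrative sketch using built-in types; the variable names are arbitrary.

```python
text = "spam"
items = [1, 2, 3]

# Each element of a str is itself a str, so indexing can repeat forever.
first = text[0]
same_again = first[0][0]

# `in` on a str tests for substrings, not just single elements...
substring_found = "pa" in text

# ...while `in` on a list tests only for elements, never sub-lists.
sublist_found = [1, 2] in items
element_found = 2 in items

print(first, same_again)                  # s s
print(substring_found, sublist_found)     # True False
```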

On Wed, Feb 21, 2018 at 12:39 PM, Steve Holden <steve@holdenweb.com> wrote:
I think the chances of a "byte" object are about as good as the chances of a character object
probably right.
(though one can always implement such in C extensions, that wouldn't build them into the syntax).
I think you could simply subclass, too (overriding __new__ and a couple of methods). But that would do exactly no good, unless you used your own custom string and bytes objects, too. The whole point is that iterating over a string always returns an also-iterable object, ad infinitum (in Python 3, iterating over bytes yields ints instead). This is the cause of the major remaining common "type error" in Python. (the old integer division used to be the big one)
I'm pretty convinced that the choice not to have a character type has had basically zero benefits to sensible, comprehensible code, though it's not a very big deal, either. Not a big enough deal for the churn it would cause to introduce it now, that's for sure. So +1 for this PEP as is. -CHB
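The "type error" described in this message typically shows up when recursing over nested containers. Below is a minimal sketch, with hypothetical helper names `naive_flatten` and `flatten`, of how a length-one string keeps yielding itself and why such code usually needs an explicit guard for `str`.

```python
from collections.abc import Iterable


def naive_flatten(obj):
    # Recurses into anything iterable, including strings.
    if not isinstance(obj, Iterable):
        return [obj]
    result = []
    for item in obj:
        result.extend(naive_flatten(item))
    return result


def flatten(obj):
    # Same idea, but treats str and bytes as atoms rather than containers.
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        return [obj]
    result = []
    for item in obj:
        result.extend(flatten(item))
    return result


print(flatten([1, ["ab", [2]]]))   # [1, 'ab', 2]

try:
    naive_flatten(["ab"])
except RecursionError:
    # "a" iterates to "a", which iterates to "a", and so on.
    print("naive version never bottoms out on strings")
```

The guard in `flatten` is the kind of special-casing a distinct character type would arguably make unnecessary.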

Nick, I'm trying to reply to your message, but I can't figure out how. You mentioned that the PEP needs a "champion". What would that involve? How much time and effort would it take? What kinds of decisions would I make? The iterbytes thing in the PEP is something I was wishing for, while working on a personal project. I stumbled upon this PEP and decided to try to implement it myself, to learn about C and the Python internals, among other reasons. I don't know how I would feel working on something so general, of use to so many people for lots of different purposes. Do I know enough about all of the use cases and what everyone wants? I am not completely against it but I'd need to think about it. On Wed, Feb 21, 2018 at 2:36 PM, Chris Barker <chris.barker@noaa.gov> wrote:

On 02/26/2018 11:34 PM, Elias Zamaria wrote:
Being a PEP "champion" involves collecting lots of data, sorting it, making decisions about API design, posting about those decisions along with the pros and cons, listening to more feedback, and continuing until there is general agreement about the PEP. After that a request is made to (usually) Guido to accept or reject the PEP. If accepted, then the code writing stage happens. Writing code first is not bad as a working proof-of-concept is always handy.
The iterbytes thing in the PEP is something I was wishing for, while working on a personal project. I stumbled upon this PEP and decided to try to implement it myself, to learn about C and the Python internals, among other reasons.
It's a good way to go about it!
Part of the PEP writing process is asking for and collecting use-cases; if possible, looking at other code projects for use-cases is also useful. Time needed can vary widely depending on the subject; if I recall correctly, PEP 409 only took a few days, while PEP 435 took several weeks. PEP 467 has already gone through a few iterations, so hopefully not too much more time is required. If you would like to try you'll get plenty of help from the community -- at least from those willing to go through PEP 467 again. ;) At the moment, though, we're concentrating on getting v3.7.0 as bug-free as possible, so feel free to research where the PEP is now and go through the last discussion, but you should wait for the 3.7.0 release before actively bringing the PEP discussion back to python-dev. -- ~Ethan~

On 28 February 2018 at 03:15, Ethan Furman <ethan@stoneleaf.us> wrote:
One of the main developments not yet accounted for in the PEP is the fact that `memoryview` now supports efficient bytes-based iteration over arbitrary buffer-exporting objects:

    def iterbytes(obj):
        with memoryview(obj) as m:
            return iter(m.cast('c'))

This means that aspect of PEP 467 will need to lean more heavily on discoverability arguments, because the above approach isn't obvious at all unless you're already very familiar with the use of `memoryview`, and the runtime benefit of a custom iterator type (avoiding the upfront cost of allocating and initialising two memoryview objects) is likely to be fairly small. Cheers, Nick.
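To make the comparison concrete, here is a small sketch that contrasts default iteration with the memoryview-based helper from this message. The sample data and variable names are only for illustration.

```python
from array import array


def iterbytes(obj):
    # View the buffer with format 'c' so that each item is a
    # length-one bytes object rather than an int.
    with memoryview(obj) as m:
        return iter(m.cast('c'))


data = b"PEP"

# Default iteration over bytes yields ints in Python 3.
ints = list(data)

# The memoryview-based helper yields length-one bytes objects.
chunks = list(iterbytes(data))

# It also works for other contiguous buffer exporters.
from_bytearray = list(iterbytes(bytearray(b"ok")))
from_array = list(iterbytes(array("B", [0, 255])))

print(ints)     # [80, 69, 80]
print(chunks)   # [b'P', b'E', b'P']
```

The helper creates two memoryview objects per call (the original view and the cast view), which is the upfront cost the message refers to.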

Participants (6): Chris Barker, Elias Zamaria, Ethan Furman, Guido van Rossum, Nick Coghlan, Steve Holden