PEP 467 feedback from the Steering Council
Hello Nick, Ethan,
The Python Steering Council reviewed PEP 467 -- Minor API improvements for binary sequences at our 2021-07-26 meeting.
Thank you for your work on this PEP. We’re generally very favorable for adding to Python 3.11 the features and APIs described in the PEP. We have some requests for changes that we’d like you to consider.
* The Python-Version in the PEP needs to target Python 3.11 of course.
* We think it would be better if bytes.fromsize()’s second argument was a keyword-enabled or keyword-only argument. We understand the rationale given in the PEP for not doing so, but ultimately we think the readability of (at least allowing) a keyword argument to be more compelling. Some possible options include `fill`, `value`, or `byte`.
* We all really dislike the word “ord” as in `bytes.fromord()`. We understand the symmetry of this choice, but we also feel like we have an opportunity to make it more understandable, so we recommend `bytes.fromint()` and `bytearray.fromint()`.
* We think the `bchr()` built-in is not necessary. Given the `.fromint()` methods, it’s better not to duplicate the functionality, and everything has a cost. A built-in that exists only for the symmetry described in the PEP is just a little extra complexity for little value.
Let us know what you think about making these changes. We aren’t making acceptance contingent on these changes, but we do think they make the PEP and the new APIs better.
-Barry (on behalf of the Python Steering Council)
On Fri, 30 Jul 2021, 8:47 am Barry Warsaw, <barry@python.org> wrote:
Hello Nick, Ethan,
The Python Steering Council reviewed PEP 467 -- Minor API improvements for binary sequences at our 2021-07-26 meeting.
Thank you for your work on this PEP. We’re generally very favorable for adding to Python 3.11 the features and APIs described in the PEP.
Thank you!
Let us know what you think about making these changes. We aren’t making acceptance contingent on these changes, but we do think they make the PEP and the new APIs better.
Those changes all sound reasonable to me, so if Ethan is also amenable, I think we should incorporate them. Cheers, Nick.
-Barry (on behalf of the Python Steering Council)
Thanks Nick. Ethan, what do you think? -Barry
On Jul 29, 2021, at 16:28, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Fri, 30 Jul 2021, 8:47 am Barry Warsaw, <barry@python.org> wrote:
Hello Nick, Ethan,
The Python Steering Council reviewed PEP 467 -- Minor API improvements for binary sequences at our 2021-07-26 meeting.
Thank you for your work on this PEP. We’re generally very favorable for adding to Python 3.11 the features and APIs described in the PEP.
Thank you!
Let us know what you think about making these changes. We aren’t making acceptance contingent on these changes, but we do think they make the PEP and the new APIs better.
Those changes all sound reasonable to me, so if Ethan is also amenable, I think we should incorporate them.
Cheers, Nick.
-Barry (on behalf of the Python Steering Council)
On 7/29/21 3:46 PM, Barry Warsaw wrote:
We’re generally very favorable for adding to Python 3.11 the features and APIs described in the PEP. We have some requests for changes that we’d like you to consider.
* The Python-Version in the PEP needs to target Python 3.11 of course.
Done.
* We think it would be better if bytes.fromsize()’s second argument was a keyword-enabled or keyword-only argument. We understand the rationale given in the PEP for not doing so, but ultimately we think the readability of (at least allowing) a keyword argument to be more compelling. Some possible options include `fill`, `value`, or `byte`.
Done, went with "fill" as an optional keyword argument.
* We all really dislike the word “ord” as in `bytes.fromord()`. We understand the symmetry of this choice, but we also feel like we have an opportunity to make it more understandable, so we recommend `bytes.fromint()` and `bytearray.fromint()`.
Done.
* We think the `bchr()` built-in is not necessary. Given the `.fromint()` methods, it’s better not to duplicate the functionality, and everything has a cost. A built-in that exists only for the symmetry described in the PEP is just a little extra complexity for little value.
I would rather keep `bchr` and lose the `.fromint()` methods.
To get bytes:
some_var = bchr(65) vs some_var = bytes.fromint(65)
and for bytearrays
some_var = bytearray(bchr(65)) vs some_var = bytearray.from_int(65)
Let me know if I should drop `.fromint()`.
-- ~Ethan~
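For readers comparing the spellings above: neither `bchr()` nor the `.fromint()` / `.fromsize()` methods exist in released Python as of this thread, so the following is only a sketch that emulates the proposed behaviour with today's constructors. The names `fromint`, `fromsize` and `fill` come from the thread; writing them as module-level helper functions is an assumption made for illustration, and so is the choice that `fill` takes a length-1 bytes object (the thread does not say whether it is an integer or a bytes value).

```python
def fromint(value: int) -> bytes:
    """Hypothetical stand-in for the proposed bytes.fromint()."""
    # bytes([value]) raises ValueError when value is outside range(256).
    return bytes([value])


def fromsize(size: int, *, fill: bytes = b"\x00") -> bytes:
    """Hypothetical stand-in for the proposed bytes.fromsize()."""
    # Assumption: fill is a single byte given as a length-1 bytes object.
    if len(fill) != 1:
        raise ValueError("fill must be exactly one byte")
    return fill * size


single = fromint(65)              # one byte with the value 65
zeros = fromsize(3)               # three zero bytes
filled = fromsize(2, fill=b"x")   # two copies of the fill byte
```

With helpers like these, the bytearray case in the message above would be spelled `bytearray(fromint(65))`, which goes through a temporary bytes object just as `bytearray(bchr(65))` would.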
Thanks for responding Ethan.
On Aug 3, 2021, at 10:48, Ethan Furman <ethan@stoneleaf.us> wrote:
I would rather keep `bchr` and lose the `.fromint()` methods.
To get bytes:
some_var = bchr(65) vs some_var = bytes.fromint(65)
and for bytearrays
some_var = bytearray(bchr(65)) vs some_var = bytearray.from_int(65)
Can you provide some rationale for why you prefer bchr() over .fromint()? Cheers, -Barry
On 8/3/21 1:19 PM, Barry Warsaw wrote:
Can you provide some rationale for why you prefer bchr() over .fromint()?
- `bchr` directly corresponds with `chr`
- `str` has no `fromint`
- `bytearray(bchr(int))` is roughly the same as `bytearray.fromint(int)`, but `bchr(int)` for a bytes object is much nicer than `bytes.fromint(int)`
- possible confusion between `fromsize` and `fromint` (I know it would get me from time to time)
-- ~Ethan~
On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I would rather keep `bchr` and lose the `.fromint()` methods.
I would prefer to only have a bytes.byte(65) method, no bchr() built-in function. I would prefer to keep builtins namespace as small as possible. bytes.byte() name is similar to bytes.getbyte(). I cannot find "int" in the name of other bytes methods.
some_var = bytearray(bchr(65)) vs some_var = bytearray.from_int(65)
bytearray(bchr(65)) sounds less efficient. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Aug 4, 2021, at 07:31, Victor Stinner <vstinner@python.org> wrote:
On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I would rather keep `bchr` and lose the `.fromint()` methods.
I would prefer to only have a bytes.byte(65) method, no bchr() built-in function. I would prefer to keep builtins namespace as small as possible.
The Steering Council is also pretty adamantly against adding a new bchr() built-in.
bytes.byte() name is similar to bytes.getbyte(). I cannot find "int" in the name of other bytes methods.
.byte() seems fine to me too. I’m not a fan of smushedwords but .fromint() seemed better than .fromord(). -Barry
I see in the PEP:
"the bchr builtin is to recreate the ord/chr/unichr trio from Python 2 under a different naming scheme"
Why recreate that trio? Shouldn't we be moving away from the bytes-is-a-string concept here? A byte is not a character -- why would the function that creates a byte from an integer value be called bchr()? (short for "byte character", presumably)
There are fewer and fewer people having to translate their code (or their brains) from py2 to py3.
bytes.fromint() is just fine.
-CHB
BTW -- I really love the rest of the PEP -- it's been too awkward to work with bytes for too long.
On Wed, Aug 4, 2021 at 9:43 AM Barry Warsaw <barry@python.org> wrote:
On Aug 4, 2021, at 07:31, Victor Stinner <vstinner@python.org> wrote:
On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman <ethan@stoneleaf.us> wrote:
I would rather keep `bchr` and lose the `.fromint()` methods.
I would prefer to only have a bytes.byte(65) method, no bchr() built-in function. I would prefer to keep builtins namespace as small as possible.
The Steering Council is also pretty adamantly against adding a new bchr() built-in.
bytes.byte() name is similar to bytes.getbyte(). I cannot find "int" in the name of other bytes methods.
.byte() seems fine to me too. I’m not a fan of smushedwords but .fromint() seemed better than .fromord().
-Barry
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/CPZTRWIW... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Christopher Barker writes:
A byte is not a character
While I am -0.5 on bchr for many of the reasons already cited in the thread (and would be -1 if the methods names proposed for the feature were a bit more aesthetic), I don't think this argument is valid. Bytes that could otherwise be arbitrary (aka "magic numbers") are *often* chosen because they correspond to the ASCII repertoire. And strings is still a useful utility for C programmers, even if not so much for others.
It's true that bytes are still bytes, characters are still characters, and it's a very good thing from my point of view that Python 3 gave us a consistent separation -- the only thing I ever explicitly use bytes for is passwords for zipfiles, and the implicit handling of bytes otherwise just works for me :-). But it turns out it was a mistake to make it so hard for consenting adults to treat bytes as characters in certain contexts (for example, PEP 461 -- note: I opposed that PEP and I was wrong -- should have been part of Python 3.0).
Steve
On Fri, 6 Aug 2021 01:37:48 +0900 "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Christopher Barker writes:
A byte is not a character
While I am -0.5 on bchr for many of the reasons already cited in the thread (and would be -1 if the methods names proposed for the feature were a bit more aesthetic), I don't think this argument is valid. Bytes that could otherwise be arbitrary (aka "magic numbers") are *often* chosen because they correspond to the ASCII repertoire. And strings is still a useful utility for C programmers, even if not so much for others.
In what context is `bchr()` useful? Regards Antoine.
Antoine Pitrou writes:
In what context is `bchr()` useful?
As a builtin, not my problem, I'm not the proponent. As a facility with *some* spelling, it's convenient in contexts where chr() is, but much less so (eg, coding ROT13 ;-). I know I've used this translation in mail hacking, but I don't recall whether the code was Python or Lisp. Regards, Steve
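As an illustration of the ROT13 case mentioned above, here is a minimal sketch of byte-level translation written with what Python already offers. The function name `rot13_bytes` is hypothetical and the code is not taken from anyone's mail-handling program; it simply relies on the fact that iterating over a bytes object yields integers, so no single-byte constructor is needed inside the loop.

```python
def rot13_bytes(data: bytes) -> bytes:
    """Rotate ASCII letters by 13 places; leave every other byte unchanged."""
    out = bytearray()
    for b in data:                      # each b is an int in range(256)
        if 65 <= b <= 90:               # ASCII 'A'..'Z'
            b = (b - 65 + 13) % 26 + 65
        elif 97 <= b <= 122:            # ASCII 'a'..'z'
            b = (b - 97 + 13) % 26 + 97
        out.append(b)                   # bytearray.append takes an int directly
    return bytes(out)


encoded = rot13_bytes(b"Hello")
decoded = rot13_bytes(encoded)
```

Accumulating into a bytearray sidesteps the question of how to spell "one byte from an int", which may be part of why the single-byte constructor feels less pressing for bytes than `chr()` does for str.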
On Fri, Aug 6, 2021 at 12:23 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
As a builtin, not my problem, I'm not the proponent. As a facility with *some* spelling, it's convenient in contexts where chr() is, but much less so (eg, coding ROT13 ;-). I know I've used this translation in mail hacking, but I don't recall whether the code was Python or Lisp.
Stephen does not advocate bchr() as a built-in or library function, but he just gave a great reason why it should not be a built-in: it's hard to find compelling and common use cases.
A built-in should be one or more of:
- extremely useful in daily coding, like len() and list() etc.
- foundational, like next(), classmethod() and iter() etc.
- hard to create in Python code, like breakpoint(), compile() etc.
super() is an example that fits all of those groups. bchr() (or whatever name it might have) fits none.
Cheers, Luciano
Regards, Steve
-- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Technical Principal at ThoughtWorks | Twitter: @ramalhoorg
Barry Warsaw wrote:
On Aug 4, 2021, at 07:31, Victor Stinner vstinner@python.org wrote:
On Tue, Aug 3, 2021 at 7:54 PM Ethan Furman ethan@stoneleaf.us wrote: I would rather keep `bchr` and lose the `.fromint()` methods. I would prefer to only have a bytes.byte(65) method, no bchr() built-in function. I would prefer to keep builtins namespace as small as possible. The Steering Council is also pretty adamantly against adding a new bchr() built-in.
FYI the PEP still mentions `bchr`. -Brett
bytes.byte() name is similar to bytes.getbyte(). I cannot find "int" in the name of other bytes methods. .byte() seems fine to me too. I’m not a fan of smushedwords but .fromint() seemed better than .fromord(). -Barry
I would rather keep `bchr` and lose the `.fromint()` methods.
For me, "bchr" isn't a readable name. If I mentally expand it to "byte_character", it becomes an oxymoron that opposes what we try to teach about bytes and characters being different things.
Can you show examples in existing code of how this would be used? I'm unclear on how frequently users need to create a single byte from an integer. For me, it is very rare. Perhaps once in a large program will I search for a record separator in binary data. I would prefer to write it as:
    RS = bytes.fromint(30)
    ...
    i = data.index(RS, start)
    ...
    if RS in data:
Having this as bchr() wouldn't make the code better because it is less explicit about turning an integer into a byte. Also, it doesn't look nice when in-lined without giving it a variable name:
    i = data.index(bchr(30), start)  # Yuck
    ...
    if bchr(30) in data:  # Yuck
Also keep in mind that we already have a way to spell it, "bytes([30])", so any new way needs to significantly add more clarity. I think bytes.fromint() does that.
The number of use cases also matters. The bar for adding a new builtin function is very high.
Raymond
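The record-separator example above can be run today by using the existing `bytes([30])` spelling that the message mentions. This is only a sketch: the sample `data` value and the `start` offset are made up for illustration, and the proposed `bytes.fromint()` is deliberately not used because it does not exist in released Python.

```python
RS = bytes([30])  # ASCII record separator (0x1E), spelled with the existing API

# Hypothetical sample data: three records joined by the separator.
data = b"alpha" + RS + b"beta" + RS + b"gamma"

start = 0
i = data.index(RS, start)   # offset of the first separator at or after start
has_rs = RS in data         # membership test with a length-1 bytes object
records = data.split(RS)    # split the buffer into its records
```

Naming the constant once, as the message suggests, keeps the integer-to-byte conversion out of the search and membership expressions regardless of which spelling creates it.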
On 09/09/2021 18:54, raymond.hettinger@gmail.com wrote:
I would rather keep `bchr` and lose the `.fromint()` methods.
For me, "bchr" isn't a readable name. If I mentally expand it to "byte_character", it becomes an oxymoron that opposes what we try to teach about bytes and characters being different things.
Can you show examples in existing code of how this would be used? I'm unclear on how frequently users need to create a single byte from an integer. For me, it is very rare.
It's probably rare, but recently I was converting from Python 2 to Python 3 a program that implements a data compression algorithm (LZW). It (now) constructs the compressed data as a bytes object. Sometimes it needs to add a series of bytes, sometimes a single byte converted from an int. Not knowing if there was a "recommended" way to do the latter, I found something that worked, viz.
    bytes((i,))
But it felt a bit of a kludge. I would have used something like bchr if (a) it existed (b) I knew about it.
Rob Cliffe
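A sketch of the situation described above, where output is built from a mix of byte runs and single integer values. This is not the LZW program from the message; the function name `pack_pieces` and its input format are assumptions for illustration. It shows one existing alternative to `bytes((i,))`: accumulate into a bytearray, whose `append()` accepts an int directly.

```python
def pack_pieces(pieces):
    """Concatenate a mix of ints (single bytes) and bytes objects (runs)."""
    out = bytearray()
    for piece in pieces:
        if isinstance(piece, int):
            out.append(piece)    # one byte from an int, no temporary tuple
        else:
            out.extend(piece)    # a series of bytes
    return bytes(out)


packed = pack_pieces([65, b"BC", 68])
kludge = bytes((65,))            # the spelling from the message; also works
```

Whether this reads better than a dedicated single-byte constructor is exactly the question the thread is debating; the sketch only shows that the bytearray route avoids the one-element tuple.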
I recommend removing the "discouragement" from writing "bytes(10)". That is merely stylistic. As long as we support the API, it is valid Python. In the contexts where it is currently used, it tends to be clear about what it is doing: buffer = bytearray(bufsize). That doesn't need to be discouraged.
Also, I concur with the SC comment that the singular of bytearray() or bytes() is byte(), not bchr(). Practically what people want here is an efficient literal that is easier to write than: b'\x1F'. I don't think bchr() meets that need. Neither bchr(0x1f) nor bytearray.fromint(0x1f) is fast (neither is a literal), nor are they easier to read or type.
The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
This is disappointing because often the only reasonable way to manipulate binary data is with bytearrays. A user could switch to array.array() or a numpy.array, but that is unnecessarily inconvenient given that we already have a nice builtin type that meets the need (for images, crypto hashes, compression, bloom filters, or anything where a C programmer would use an array of unsigned chars).
Given that bytes/bytearray is already an uncomfortable hybrid of string and list APIs for binary data, I don't think the competing views and APIs will be disentangled by adding methods that duplicate functionality that already exists. Instead, I recommend that the PEP focus on one or two cases where methods could be added that simplify any common tasks that are currently awkward. For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
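The dual-purpose view described above can be seen on a single object. The following sketch uses only existing bytearray behaviour; the buffer size and the stored values are arbitrary choices made for the example.

```python
BUFSIZE = 4  # hypothetical buffer size, chosen for the example

# Array-of-numbers view: a zero-filled mutable buffer indexed by position.
buf = bytearray(BUFSIZE)
buf[0] = 0x1F                 # store an int in range(256)
buf[1:3] = b"OK"              # overwrite a slice with two bytes
total = sum(buf)              # iteration yields ints, so sum() works

# String-like view: the same object also offers str-style methods.
lowered = bytes(buf[1:3]).lower()
starts = buf.startswith(b"\x1f")
```

The `bytearray(bufsize)` call is the spelling the message argues should not be discouraged: it allocates the buffer in one step and reads clearly in this context.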
On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
I didn't read it that way, but if so, please no, I'd rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. Spelling the way to create a byte, byte() sure makes more sense in any other context.
... anything where a C programmer would use an array of unsigned chars).
or any programmer would use an array of unsigned 8bit integers :-) numpy spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
no, though bytes([31]) isn't horrible ;-) (despite coding for over four decades, I'm still not comfortable with hex notation) I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
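To make the "sequence of integers" reading above concrete, here is a short sketch using only the standard library. It shows that bytes is built from an iterable of ints, that indexing and iteration give ints back, and that an `array.array` of unsigned bytes converts to the same value. The sample values are arbitrary.

```python
from array import array

data = bytes([31, 65, 255])          # initialised from an iterable of ints

as_list = list(data)                 # iteration yields ints
first = data[0]                      # indexing yields an int, not a bytes object
first_slice = data[0:1]              # slicing yields a length-1 bytes object

# The same values held in an array of unsigned 8-bit integers.
same = array("B", [31, 65, 255]).tobytes()
```

The asymmetry between `data[0]` and `data[0:1]` is part of what makes a dedicated way to build a single byte from an int attractive to some participants in the thread.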
On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
I didn't read it that way, but if so, please no, I'd rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. Spelling the way to create a byte, byte() sure makes more sense in any other context.
... anything where a C programmer would use an array of unsigned chars).
or any programmer would use an array of unsigned 8bit integers :-) numpy spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
no, though bytes([31]) isn't horrible ;-) (despite coding for over four decades, I'm still not comfortable with hex notation)
I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays.
I consider bytes([31]) notation to be horrible API design because a simple easy to make typo of omitting the [] or using () and forgetting the tupleizing comma turns it into a different valid call with an entirely different meaning. bytes([31]) vs bytes((31)) vs bytes(31).
It's also ugly to anyone who thinks about what bytecode is generated and executed in order to do it. An entire new list object with a single element referring to a tiny int is created and destroyed just to create a b'\037' object? An optimizer pass to fix that up at the bytecode level isn't easy as it can only be done when it can prove that `bytes` has not been reassigned to something other than the builtin. Near impossible in a lot of code. bytes.fromint(31) isn't much better in the bytecode regard, but at least a temporary list is not being created.
As much as I think that bytes(size: int) was a bad idea to have as an API - bytearray(size: int) is fine and useful as it is mutable - that ship sailed and getting rid of it would break some odd code. It doesn't have much use, so adding fromsize(size: int) methods doesn't sound very compelling as it just adds yet another way to do the same thing. We should just live with that specific wart.
`bchr` as a builtin... I'm with the others on saying no to any new builtin that isn't expected to see frequent use. bchr won't see frequent use.
`bytes.fromint` seems fine. Others are proposing `bytes.byte` for that. I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read. It falls victim to API stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a reader in English as the difference is a subtle "s". "bytes dot from int" or "bytes from int" is quite clear. (Avoiding stuttering in API design was popularized by golang - it's a good thing to strive for in any language.) It's times like this that I wish Python had chosen consistent camelCase, CapWords, or snake_case in all API names as conjoinedwords aren't great. But they are sadly consistent with our past sins.
One thing never mentioned in the PEP: if you expect a primary use of the fromint (aka bchr builtin that isn't going to happen) to be called on constant values often, why are we adding name lookups and function calls to this? Why not address the elephant in the room and allow for decimal values to be written as an escape sequence within bytes literals?
b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255 with no leading zero should be accepted when parsing such an escape. (Do not bother adding the same feature for codepoints in unicode strs; leave that to later if someone shows actual demand.) This can't address the bytearray need, but that's been true of bytearray for ages, a common way to create them is via a copy from transient bytes objects. bytearray(b'\d31') isn't much different than bytearray.fromint(31). One less name lookup.
Why not add a \d escape? Introducing a new escape is fraught with peril as existing \d's within b'' literals in code could change meaning. Backwards compatibility fail. But one that is easy to check for with a DeprecationWarning for a few releases... The new literal parsing could be enabled per-file with a __future__ import.
-gps
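The typo hazard described at the start of this message can be checked directly with the existing constructor. The sketch below shows that the three near-identical spellings are all valid calls but produce two very different results, because `(31)` without a trailing comma is just the integer 31.

```python
one_byte = bytes([31])       # list with one int: a single byte of value 31
typo_parens = bytes((31))    # (31) is the int 31, not a tuple
typo_bare = bytes(31)        # an int argument means "this many zero bytes"
with_comma = bytes((31,))    # the trailing comma makes a one-element tuple

lengths = (len(one_byte), len(typo_parens), len(typo_bare), len(with_comma))
```

None of the mistaken forms raises an error, so the problem only surfaces later as a buffer of the wrong length and content.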
-CHB
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Hm, I don’t think the major use for bchr() will be with a constant. On Sun, Aug 22, 2021 at 14:48 Gregory P. Smith <greg@krypto.org> wrote:
-- --Guido (mobile)
On Sun, 22 Aug 2021 16:08:56 -0700 Guido van Rossum <guido@python.org> wrote:
Hm, I don’t think the major use for bchr() will be with a constant.
What would be the major use for bchr()? I don't think I've ever regretted its absence. Regards Antoine.
On Sun, Aug 22, 2021 at 14:48 Gregory P. Smith <greg@krypto.org> wrote:
On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
The history of bytes/bytearray is a dual-purpose view. It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion). It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints). ISTM that the authors of this PEP reject or want to discourage the latter use cases.
I didn't read it that way, but if so, please no, I"d rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. spelling the way to create a byte, byte() sure makes more sense in any other context.
... anything where a C programmer would an array of unsigned chars).
or any programmer would use an array of unsigned 8bit integers :-) numpy spells it: `np.uint8`, and the the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.
no, though bytes([31]) isn't horrible ;-) (despite coding for over four decades, I'm still not comfortable with hex notation)
I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays.
I consider bytes([31]) notation to be horrible API design because a simple easy to make typo of omitting the [] or using () and forgetting the tupleizing comma turns it into a different valid call with an entirely different meaning. bytes([31]) vs bytes((31)) vs bytes(31).
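All three spellings mentioned above are valid calls today, which is what makes the typo easy to miss. A minimal sketch of how they differ in current Python (the variable names are only for illustration):

```python
# Three visually similar calls with different meanings.
one_byte = bytes([31])       # iterable of ints: a single byte with value 31
parenthesized = bytes((31))  # (31) is just the int 31, not a tuple
zero_filled = bytes(31)      # an int argument: 31 zero bytes

print(one_byte)                      # b'\x1f'
print(len(parenthesized))            # 31
print(parenthesized == zero_filled)  # True

# Only the trailing comma makes a one-element tuple.
print(bytes((31,)) == one_byte)      # True
```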
It's also ugly to anyone who thinks about what bytecode is generated and executed in order to do it. an entire new list object with a single element referring to a tiny int is created and destroyed just to create a b'\037' object? An optimizer pass to fix that up at the bytecode level isn't easy as it can only be done when it can prove that `bytes` has not been reassigned to something other than the builtin. Near impossible in a lot of code. bytes.fromint(31) isn't much better in the bytecode regard, but at least a temporary list is not being created.
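The temporary list can be observed by disassembling the expression. This sketch assumes CPython, where opcode names are an implementation detail and may differ between versions, so it only checks that a list-building instruction is present rather than showing the full listing.

```python
import dis

# Compile the expression without running it and collect the opcode names.
code = compile("bytes([31])", "<example>", "eval")
opnames = [instr.opname for instr in dis.get_instructions(code)]

# On CPython a BUILD_LIST instruction is expected here, because the
# one-element list is constructed before bytes() is called.
has_build_list = "BUILD_LIST" in opnames
print(has_build_list)
```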
As much as I think that bytes(size: int) was a bad idea to have as an API - bytearray(size: int) is fine and useful as it is mutable - that ship sailed and getting rid of it would break some odd code. It doesn't have much use, so adding fromsize(size: int) methods doesn't sound very compelling, as it just adds yet another way to do the same thing. We should just live with that specific wart.
`bchr` as a builtin... I'm with the others on saying no to any new builtin that isn't expected to see frequent use. bchr won't see frequent use.
`bytes.fromint` seems fine. Others are proposing `bytes.byte` for that. I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read. It falls victim to API stuttering. "bytes dot byte" or "bytes byte" doesn't convey much to a reader in English as the difference is a subtle "s". "bytes dot from int" or "bytes from int" is quite clear. (Avoiding stuttering in API design was popularized by golang - it's a good thing to strive for in any language.) It's times like this that I wish Python had chosen consistent camelCase, CapWords, or snake_case in all API names, as conjoinedwords aren't great. But they are sadly consistent with our past sins.
One thing never mentioned in the PEP: if you expect a primary use of fromint (aka the bchr builtin that isn't going to happen) to be calls on constant values, why are we adding name lookups and function calls to this? Why not address the elephant in the room and allow for decimal values to be written as an escape sequence within bytes literals?
b'\d31' for example to say "decimal byte 31". Proposal: Only values 0-255 with no leading zero should be accepted when parsing such an escape. (Do not bother adding the same feature for codepoints in unicode strs; leave that to later if someone shows actual demand). This can't address the bytearray need, but that's been true of bytearray for ages, a common way to create them is via a copy from transient bytes objects. bytearray(b'\d31') isn't much different than bytearray.fromint(31). one less name lookup.
Why not add a \d escape? Introducing a new escape is fraught with peril as existing \d's within b'' literals in code could change meaning. backwards compatibility fail. But one that is easy to check for with a DeprecationWarning for a few releases... The new literal parsing could be enabled per-file with a __future__ import.
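For context on the compatibility concern: today `\d` is not a recognized escape in a bytes literal, so the backslash is kept and the literal is four bytes long, with a warning at compile time. The sketch below shows the current behavior next to the existing hex and octal spellings of byte 31. It does not implement the proposed escape.

```python
import warnings

# Evaluate the literal from a string so the invalid-escape warning can be
# silenced locally (it is a DeprecationWarning or a SyntaxWarning depending
# on the Python version).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    current = eval(r"b'\d31'")

print(current)       # b'\\d31'
print(len(current))  # 4

# Existing ways to write the byte with decimal value 31 in a literal.
hex_form = b"\x1f"
octal_form = b"\037"
print(hex_form == octal_form == bytes([31]))  # True
```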
-gps
-CHB
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
I’m finally getting back around to this thread. I’d like to see some resolution to the bchr/fromint question, since it seems like that’s the last thing holding up approval of the PEP. And the PEP has other useful additions that I’d like to see in Python 3.11. On Aug 22, 2021, at 16:08, Guido van Rossum <guido@python.org> wrote:
Hm, I don’t think the major use for bchr() will be with a constant.
Perhaps. I think Greg’s idea has merit anyway, but it doesn’t *have* to be tied to PEP 467. I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider? Cheers, -Barry
On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:
I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider?
I haven't been completely keeping up with the entire thread, so apologies if this has already been covered. I assume that the idea is that bytes.fromint should return a single byte, equivalent to chr() returning a single character. To me, it sounds like it should be the opposite of int.from_bytes.

>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'

If that's not the API being suggested, that's going to be confusing. How about bytes.bchr()?

bytes.bchr(n) --> a single byte
bytes.from_int(n, byteorder) --> one or more bytes

Personally, I think I would use the one or more bytes version more than the single bchr version, so if we only had one, I vote for that.

-- Steve
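For comparison, the round trip described above can already be written with the existing `int.to_bytes` method. A minimal sketch, in which the variable names are illustrative only:

```python
data = b"Hello world"

# bytes -> int, reading the bytes as a little-endian number.
number = int.from_bytes(data, "little")

# int -> bytes: the length has to be supplied, since the integer alone
# does not record how many bytes it came from.
restored = number.to_bytes(len(data), "little")

print(restored == data)  # True
```

Note that the explicit length is the one piece of information a hypothetical `bytes.from_int(n, byteorder)` would have to infer on its own.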
On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
>>> bytes.from_int(121404708502361365413651784, 'little') # should return b'Hello world'
Really? I don't know anyone serializing strings as a "bigint" number. Did you already see such a code pattern in the wild? Usually, bytes are serialized as... bytes, no? Sometimes, bytes are serialized as base64 or hexadecimal to go through into an ASCII ("7-bit") bytestream. But I don't recall any file format serializing bytes as a single large decimal number. Victor -- Night gathers, and now my watch begins. It shall not end until my death.
On Wed, Sep 8, 2021 at 10:42 PM Victor Stinner <vstinner@python.org> wrote:
On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
>>> bytes.from_int(121404708502361365413651784, 'little') # should return b'Hello world'
Really? I don't know anyone serializing strings as a "bigint" number. Did you already see such a code pattern in the wild? Usually, bytes are serialized as... bytes, no? Sometimes, bytes are serialized as base64 or hexadecimal to go through into an ASCII ("7-bit") bytestream. But I don't recall any file format serializing bytes as a single large decimal number.
I've seen it, in various places. There are certain protocols in which the distinction between a number and a byte sequence is immaterial (for instance, the FOURCC identifier in an IFF family file such as a .wav - the signature 'WAVE' is identically considered to be the number 0x57415645). Being able to convert between the numeric and character forms of the same identifier is convenient. ChrisA
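The FOURCC case above can be checked with the existing int methods. A small sketch, assuming the tag is read big-endian as the hex value in the message implies:

```python
tag = b"WAVE"

# Read the four bytes as one big-endian integer.
as_number = int.from_bytes(tag, "big")
print(hex(as_number))  # 0x57415645

# Convert back; a FOURCC is always four bytes wide.
back = as_number.to_bytes(4, "big")
print(back)  # b'WAVE'
```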
On 2021-09-08 13:37, Victor Stinner wrote:
On Wed, Sep 8, 2021 at 7:46 AM Steven D'Aprano <steve@pearwood.info> wrote:
>>> bytes.from_int(121404708502361365413651784, 'little') # should return b'Hello world'
Really? I don't know anyone serializing strings as a "bigint" number. Did you already see such a code pattern in the wild? Usually, bytes are serialized as... bytes, no? Sometimes, bytes are serialized as base64 or hexadecimal to go through into an ASCII ("7-bit") bytestream. But I don't recall any file format serializing bytes as a single large decimal number.
Well, we already have int.from_bytes. What's that used for? Adding the opposite conversion does make sense to me. If the number is 0..255, and maybe the byteorder can be omitted in that case, then it seems like a reasonable solution to me.
On 9/7/21 10:39 PM, Steven D'Aprano wrote:
On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:
I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider?
I haven't been completely keeping up with the entire thread, so apologies if this has already been covered. I assume that the idea is that bytes.fromint should return a single byte, equivalent to chr() returning a single character.
To me, it sounds like it should be the opposite of int.from_bytes.
>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'
That certainly makes sense to me. At this point, the only reason that would not work is an arbitrary limit of 255 on the input, and the only reason that limit is there is to have `bchr` be the inverse of `ord`. Since `bchr` isn't going to happen, I see no reason to have the 255 limit. `byteorder` can default to None with a requirement of being set when the integer is over 255. -- ~Ethan~
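One way to read the suggestion above, written as a plain function. This is only a sketch: `from_int` is a hypothetical helper and not an existing or accepted `bytes` method, and the choice of exceptions and of the minimal byte length are assumptions made for illustration.

```python
def from_int(value, byteorder=None):
    """Hypothetical helper, not a real bytes method."""
    if value < 0:
        raise ValueError("negative integers are not handled in this sketch")
    if value <= 255:
        # A single byte reads the same in either byte order.
        return value.to_bytes(1, "big")
    if byteorder is None:
        raise ValueError("byteorder is required for integers over 255")
    # Assumption: use the smallest number of bytes that can hold the value.
    length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, byteorder)


print(from_int(65))                 # b'A'
print(from_int(0x57415645, "big"))  # b'WAVE'
```

Because the length is inferred from the value, zero bytes at the most significant end cannot be recovered, so this is not a full inverse of `int.from_bytes`.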
On Thu, 9 Sept 2021 at 01:46, Ethan Furman <ethan@stoneleaf.us> wrote:
On 9/7/21 10:39 PM, Steven D'Aprano wrote:
On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:
I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider?
I haven't been completely keeping up with the entire thread, so apologies if this has already been covered. I assume that the idea is that bytes.fromint should return a single byte, equivalent to chr() returning a single character.
To me, it sounds like it should be the opposite of int.from_bytes.
>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'
That certainly makes sense to me. At this point, the only reason that would not work is an arbitrary limit of 255 on the input, and the only reason that limit is there is to have `bchr` be the inverse of `ord`. Since `bchr` isn't going to happen, I see no reason to have the 255 limit. `byteorder` can default to None with a requirement of being set when the integer is over 255.
I've posted a PR removing bchr from the proposal: https://github.com/python/peps/pull/2068/files `bytes.fromint` is still the inverse of `ord` for bytes objects, even without the `bchr` builtin alias. The spelling of the trio is just `ord`/`bytes.fromint`/`chr` rather than `ord`/`bchr`/`chr`. The fact the method throws an exception for integers that won't fit in a single byte is an input data validation feature, not an undesirable limitation. As Brandt already noted, we don't need a new general purpose int to bytes converter as `int.to_bytes` already has that covered. Cheers, Nick. P.S. The fact that it *didn't* look like the inverse operation for `int.from_bytes` was one advantage of calling the method `bytes.fromord` instead of `bytes.fromint`, but I'm still happy the SC is right that `bytes.fromint` is a more comprehensible method name overall. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
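The validation behavior described above can be approximated today on top of `int.to_bytes`. In this sketch `fromint` is a hypothetical stand-in for the proposed method, and the exception type shown is simply whatever `int.to_bytes` raises, which is an assumption about the final API:

```python
def fromint(value):
    """Hypothetical stand-in for the proposed bytes.fromint()."""
    # int.to_bytes raises OverflowError when the value does not fit in one
    # unsigned byte, which gives the range check for free.
    return value.to_bytes(1, "big")


print(fromint(65))  # b'A'

# ord() accepts a length 1 bytes object, so the two round-trip over 0..255.
print(all(ord(fromint(i)) == i for i in range(256)))  # True

try:
    fromint(256)
except OverflowError as exc:
    print("rejected:", exc)
```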
On Thu, 9 Sep 2021 18:55:04 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. The fact that it *didn't* look like the inverse operation for `int.from_bytes` was one advantage of calling the method `bytes.fromord` instead of `bytes.fromint`, but I'm still happy the SC is right that `bytes.fromint` is a more comprehensible method name overall.
Perhaps we can call it `bytes.byte` to make it unambiguous? Regards Antoine.
I proposed bytes.byte earlier in this thread: https://mail.python.org/archives/list/python-dev@python.org/message/KBVVBJL2... Gregory dislikes the name: "I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read". https://mail.python.org/archives/list/python-dev@python.org/message/DGJWM3VM... That's why I proposed: bytes.fromchar(). I still like bytes.byte() :-) Victor On Thu, Sep 9, 2021 at 11:07 AM Antoine Pitrou <antoine@python.org> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
On Thu, 9 Sep 2021 12:06:49 +0200 Victor Stinner <vstinner@python.org> wrote:
I proposed bytes.byte earlier in this thread: https://mail.python.org/archives/list/python-dev@python.org/message/KBVVBJL2...
Gregory dislikes the name: "I don't *like* to argue over names (the last stage of anything) but I do need to point out how that sounds to read". https://mail.python.org/archives/list/python-dev@python.org/message/DGJWM3VM...
That's why I proposed: bytes.fromchar(). I still like bytes.byte() :-)
Well, the proposed function converts *from* an integer *to* a byte "character". But the term character is a bit unfortunate here as well, since characters in Python are Unicode. Regards Antoine.
It probably won't fly but why not bytes.frombyte? There's no such thing as a byte type in Python, only bytes, so I want to argue it makes it clear the argument is a number in the range 0..255 and the result is a bytes object containing this single byte value. Tentatively, Arnaud PS. But truly I feel like this method is superfluous. On Thu, 9 Sept 2021 at 11:11, Victor Stinner <vstinner@python.org> wrote:
On Fri, 10 Sep 2021, 12:32 am Arnaud Delobelle, <arnodel@gmail.com> wrote:
It probably won't fly but why not bytes.frombyte?
There's no such thing as a byte type in Python, only bytes, so I want to argue it makes it clear the argument is a number in the range 0..255 and the result is a bytes object containing this single byte value.
The getbyte() and iterbytes() methods added by the PEP make a length 1 bytes object the pseudo "byte" type, just as a length 1 string is the pseudo "character" (technically "code point") type, so "frombyte()" would have the type implications backwards (it produces a "byte", it doesn't really read one).

I think it's OK for "int.from_bytes", "int.to_bytes", "bytes.fromint" and "bytearray.fromint" to have closely related names, since they're closely related operations. The OverflowError raised for out of bounds values in the latter two methods could even mention int.to_bytes explicitly.

While the SC already accepted "fromint", the range limitation could be made more explicit by appending "byte" to give "bytes.fromintbyte" and "bytearray.fromintbyte" (thus naming a pseudo "int byte" type for integers in the range 0-255 inclusive). An advantage of this approach is to give a specific name to the kinds of values that regular indexing and iteration of bytes objects produce.

The SC has already rejected "fromord" as too obscure.

I'd be OK with bytes.bchr, but bytearray.bchr would look odd to me, so I don't like that option as a whole.

For "bytes.byte", I have 3 objections:

* I think Greg Smith's API stuttering concerns are valid
* I think it's potentially ambiguous as to whether it is an alternate constructor or an indexed access method (i.e. doing what getbyte() does in the PEP)
* I don't like it as a bytearray method name

Cheers, Nick.
On 9/9/21 1:55 AM, Nick Coghlan wrote:
`bytes.fromint` is still the inverse of `ord` for bytes objects, even without the `bchr` builtin alias. The spelling of the trio is just `ord`/`bytes.fromint`/`chr` rather than `ord`/`bchr`/`chr`. The fact the method throws an exception for integers that won't fit in a single byte is an input data validation feature, not an undesirable limitation.
I'm starting to think the name should be `bytes.bchr` -- it avoids any confusion with the `int.to_bytes` and `int.from_bytes` methods, and is an appropriate name for the target domain (where bytes are treated as characters). -- ~Ethan~
I fully admit serious bikeshedding here, but:

I'm starting to think the name should be `bytes.bchr` -- it avoids any confusion with the `int.to_bytes` and `int.from_bytes` methods,

are they so different? :-)

In [23]: x.to_bytes(1, 'little')
Out[23]: b'A'

In [27]: int.from_bytes(b'A', 'little')
Out[27]: 65

you can think of this as a useful specific case of the int methods.

(by the way, why do I need to specify the byte order for a 1 byte int? Yes, I know, it's always a required parameter -- though that's one reason to have the special case easily available)

and is an appropriate name for the target domain (where bytes are treated as characters).

Is that the target domain? Yes, it's an important use case, but certainly not the only one, and frankly kind of a specialized use case actually. If you are working with characters (text) in Python 3, you should be using the str type. Using bytes for general text (even if you know the text at hand is all ASCII) is not recommended. It is useful to use bytes for the specialized use case of mixed text and binary data (which, by the way, I have had to do) but I don't think we should say that particular use case is what bytes are targeted for. Anyone doing that should know what they are doing :-)

-CHB

-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 9/9/21 8:53 AM, Christopher Barker wrote:
On 9/9/21 7:25 AM, Ethan Furman wrote:
I'm starting to think the name should be `bytes.bchr` -- it avoids any confusion with the `int.to_bytes` and `int.from_bytes` methods,
are they so different? :-)
Yes, they are. Conceptually, one is working with integers, the other with bytestrings (which is either entirely ASCII encoded strings, or a mixture of the two).
and is an appropriate name for the target domain (where bytes are treated as characters).
Is that the target domain?
Yes. PEP 467*, and PEP 461 before it, are targeting the wire format protocol domain. -- ~Ethan~ * The `fromint/bchr` portion for sure, and the other changes certainly help there although they may have wider uses. PEP 461: https://www.python.org/dev/peps/pep-0461/
While I think int.to_bytes() is pretty obscure (I knew about it, forgot about it, and learned about it again!) I’m not so sure it’s any less obscure than a proposed bytes.fromint().

So why don’t we just relax int.to_bytes()’s signature to include natural default values:

int.to_bytes(length=1, byteorder=sys.byteorder, *, signed=False)

Then I ought to be able to just do

>>> (65).to_bytes()
b'A'

and if I try to convert an integer value greater than 255, I get the same OverflowError? Seems good enough to me.

-Barry
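The reasoning behind a default byte order for the single-byte case can be checked with the arguments passed explicitly, so this sketch does not depend on whether the proposed defaults exist in the running Python:

```python
import sys

# With length=1 the byte order cannot matter, so every spelling agrees.
results = {
    (65).to_bytes(1, "little"),
    (65).to_bytes(1, "big"),
    (65).to_bytes(1, sys.byteorder),
}
print(results)  # {b'A'}

# Values that do not fit in one unsigned byte are rejected.
try:
    (256).to_bytes(1, sys.byteorder)
except OverflowError as exc:
    print("OverflowError:", exc)
```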
On 9/9/21 9:37 AM, Barry Warsaw wrote:
While I think int.to_bytes() is pretty obscure (I knew about it, forgot about it, and learned about it again!) I’m not so sure it’s any less obscure than a proposed bytes.fromint().
So why don’t we just relax int.to_bytes()’s signature to include natural default values:
int.to_bytes(length=1, byteorder=sys.byteorder, *, signed=False)
Then I ought to be able to just do
>>> (65).to_bytes()
b'A'
That seems so much worse than
>>> bchr(65)
b'A'
;-)
-- ~Ethan~
On 9/9/2021 1:56 PM, Ethan Furman wrote:
On 9/9/21 9:37 AM, Barry Warsaw wrote:
While I think int.to_bytes() is pretty obscure (I knew about it, forgot about it, and learned about it again!) I’m not so sure it’s any less obscure than a proposed bytes.fromint().
So why don’t we just relax int.to_bytes()’s signature to include natural default values:
int.to_bytes(length=1, byteorder=sys.byteorder, *, signed=False)
Default arg values are one of Python's great features.
Then I ought to be able to just do
>>> (65).to_bytes()
b'A'
That seems so much worse than
>>> bchr(65)
b'A'
;-)
Except that .to_bytes already exists, and arguably should have had such defaults from the beginning, making any new function to do the same thing superfluous. -- Terry Jan Reedy
On 9/9/21 12:04 PM, Terry Reedy wrote:
Except that .to_bytes already exists, and arguably should have had such defaults from the beginning, making any new function to do the same thing superfluous.
New functions aren't always about new functionality; sometimes they are about increased usability. Everything in the PEP can already be accomplished, just not easily:

bstr = b'hello world'

bchr = bstr.getbyte(7)          # proposed
bchr = bstr[7:8]                # existing

for bchr in bstr.iterbytes():   # proposed
    ...
for num in bstr:                # existing
    bchr = bytes([num])

bchr = bytes.fromint(65)        # proposed
bchr = bytes([65])              # existing

-- ~Ethan~
On Sep 9, 2021, at 10:56, Ethan Furman <ethan@stoneleaf.us> wrote:
On 9/9/21 9:37 AM, Barry Warsaw wrote:
While I think int.to_bytes() is pretty obscure (I knew about it, forgot about it, and learned about it again!) I’m not so sure it’s any less obscure than a proposed bytes.fromint().
So why don’t we just relax int.to_bytes()’s signature to include natural default values:
int.to_bytes(length=1, byteorder=sys.byteorder, *, signed=False)
Then I ought to be able to just do
>>> (65).to_bytes()
b'A'
That seems so much worse than
>>> bchr(65)
b'A'
;-)
Maybe, but given that you can *already* do the equivalent of bchr() with:

>>> (65).to_bytes(1, sys.byteorder)
b'A'

it seems like a small stretch to make that more usable, and that would outweigh adding a difficult to understand new builtin. TOOWTDI.

In case you really want bchr():

>>> import sys
>>> def bchr(x):
...     return x.to_bytes(1, sys.byteorder)
...
>>> bchr(65)
b'A'

Cheers, -Barry
On 9/9/21 12:12 PM, Barry Warsaw wrote:
On Sep 9, 2021, at 10:56, Ethan Furman wrote:
On 9/9/21 9:37 AM, Barry Warsaw wrote:
While I think int.to_bytes() is pretty obscure (I knew about it, forgot about it, and learned about it again!) I’m not so sure it’s any less obscure than a proposed bytes.fromint().
So why don’t we just relax int.to_bytes()’s signature to include natural default values:
int.to_bytes(length=1, byteorder=sys.byteorder, *, signed=False)
Then I ought to be able to just do
>>> (65).to_bytes()
b'A'
That seems so much worse than
>>> bchr(65)
b'A'
;-)
Maybe, but given that you can *already* do the equivalent of bchr() with:
>>> (65).to_bytes(1, sys.byteorder)
b'A'
it seems like a small stretch to make that more usable, and that would outweigh adding a difficult to understand new builtin. TOOWTDI.
FWIW, bchr doesn't feel (much) like a new built-in, but more like a swap from unichr. At any rate, I know the built-in is not going to happen.

int.to_bytes() doesn't feel like the One Obvious Way to me, and it certainly doesn't do much for readability in the bytearray case.

Instead of `.fromord()` or `.fromint()` (or `int.to_bytes()`), my new favorite name, guaranteed not to change for at least the rest of the day, is:

bytes.chr()
bytearray.chr()

- this gets rid of the superfluous b in bchr (not needed as the method is on bytes/bytearray)
- has a nice symmetry with both the Python 3 chr(), and the answer to the "where did the Python 2 chr() go?" question

-- ~Ethan~
Adding default arguments to int.to_bytes() is both useful on its own merits and kind of too easy *not* to do, so... https://bugs.python.org/issue45155 https://github.com/python/cpython/pull/28265 -Barry
Ethan Furman writes:
`int.from_bytes` methods, and is an appropriate name for the target domain (where bytes are treated as characters).
The relevant domains treat bytes as bytes. It's frequently useful (and dare I say "Pythonic"?) for *programmers* to take advantage of the mnemonic of treating 95 of the bytes as the ASCII encoding of characters. It follows that it's good sense for protocol designers to restrict themselves to that alphabet for their magic numbers. But standards themselves make clear that these protocols handle *octets* (historically, "byte" was too ambiguous for network protocols!), not characters or ints that happen to fit into 8 bits. The "control characters" aren't really characters even when they're syntactically significant, some of them are not significant unless combined in a particular order (CRLF), and of course the bytes 0x80-0xFF are never treated as characters. As far as I can see, the "programmers' mnemonic" interpretation gets us everything we really want in this area. We should avoid the idea that "bytes are treated as characters" because they aren't, and because that way lies the madness that incited the upheaval of Python 3 in the first place.
On 8 Sep 2021, at 06:39, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Sep 07, 2021 at 08:09:33PM -0700, Barry Warsaw wrote:
I think Nick is on board with bytes.fromint() and no bchr(), and my sense of the sentiment here is that this would be an acceptable resolution for most folks. Ethan, can you reconsider?
I haven't been completely keeping up with the entire thread, so apologies if this has already been covered. I assume that the idea is that bytes.fromint should return a single byte, equivalent to chr() returning a single character.
To me, it sounds like it should be the opposite of int.from_bytes.
>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'
:>>> int.from_bytes(b'\x00\x00\x00\x01', byteorder='big')
1
:>>> bytes.from_int(1)
would return b'\x01'? Without a length it cannot return b'\x00\x00\x00\x01'
Barry
What am I missing? The integers between 0 and 255 map directly to a particular byte value. But any other integer could be expressed as a wide variety of multiple-byte combinations. The proposal here covers byte order, but what about 16 vs 32 vs 64 bits? Unsigned vs signed? I thought that’s what the struct module is for.
There is the byte representation of Python’s bignum, but is that consistent across platforms and implementations (MicroPython, PyPy, IronPython, Jython)? And even if so, is it useful?
NOTE: my objection to “bchr”, whether as a builtin or not, is not the functionality, it’s the name. Equating a byte with a character is a legacy of C (and Python 2); in Python 3, they are completely distinct concepts.
Yes, that is serious bike-shedding :-)
-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
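To make the width and signedness question concrete, the following sketch packs the same small integer several ways with the existing int.to_bytes() and struct.pack() calls. The variable names are illustrative only; nothing here is part of the PEP 467 proposal.

```python
import struct

n = 1

# The same integer, encoded at different widths and byte orders.
be16 = n.to_bytes(2, "big")      # b'\x00\x01'
le32 = n.to_bytes(4, "little")   # b'\x01\x00\x00\x00'

# Signedness changes how negative values are encoded (two's complement).
neg16 = (-1).to_bytes(2, "big", signed=True)  # b'\xff\xff'

# struct spells out width, signedness and byte order in the format string.
s_be16 = struct.pack(">H", n)  # unsigned 16-bit, big-endian
s_le32 = struct.pack("<i", n)  # signed 32-bit, little-endian
s_be64 = struct.pack(">q", n)  # signed 64-bit, big-endian

for value in (be16, le32, neg16, s_be16, s_le32, s_be64):
    print(value)
```

Every result is a valid "bytes from an int", which is why a conversion for arbitrary integers needs a length, a byte order, and a signedness to be unambiguous.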
On 9/8/21 1:21 PM, Christopher Barker wrote:
NOTE: my objection to “bchr”, whether as a builtin or not, is not the functionality, it’s the name. Equating a byte with a character is a legacy of C (and Python 2); in Python 3, they are completely distinct concepts.
No, they aren't. If you are working in a domain that uses ASCII encoding (such as many network protocols), then those bytes represent characters -- this is why, for example, %-interpolation was added back to bytes.
-- ~Ethan~
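As an illustration of the %-interpolation mentioned above, here is a small sketch that assembles an ASCII-based protocol header directly in bytes. The HTTP-style header and the variable names are made up for the example; only the bytes % operator itself (available since Python 3.5) is being shown.

```python
status = 200
reason = b"OK"
body = b"hello"

# %-formatting on bytes builds ASCII protocol text without a round trip
# through str: %d formats an int, %s inserts another bytes object.
head = b"HTTP/1.1 %d %s\r\nContent-Length: %d\r\n\r\n" % (status, reason, len(body))
message = head + body
print(message)
```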
On 08/09/2021 21:21, Christopher Barker wrote:
[snip] NOTE: my objection to “bchr”, whether as a builtin or not is not the functionality, it’s the name.
[snip] Why not byte() ?
I happened to need to convert an integer to a byte recently and I settled on bytes((i,)). I don't know if I missed a more elegant solution (suggestions welcome), but if I could write byte(i) that would feel more Pythonic to me.
Best wishes
Rob Cliffe
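For comparison, a short sketch of the spellings that work today for turning one int into a one-byte bytes object, alongside the bare-int call that does something else entirely. The names a, b, c and zeros are illustrative.

```python
i = 65

# Three spellings that all produce the one-byte object b'A'.
a = bytes((i,))            # from a 1-tuple, as in the message above
b = bytes([i])             # from a 1-element list
c = i.to_bytes(1, "big")   # via int.to_bytes
print(a, b, c)

# Passing a bare int is different: it creates that many zero bytes.
zeros = bytes(3)
print(zeros)  # b'\x00\x00\x00'
```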
On 2021-09-09 00:29, Rob Cliffe via Python-Dev wrote:
On 08/09/2021 21:21, Christopher Barker wrote:
[snip] NOTE: my objection to “bchr”, whether as a builtin or not is not the functionality, it’s the name.
[snip] Why not byte() ?
I happened to need to convert an integer to a byte recently and I settled on bytes((i,)). I don't know if I missed a more elegant solution (suggestions welcome), but if I could write byte(i) that would feel more Pythonic to me.
Well, I tend to see a byte as a value like an int. If you slice a bytestring, you'd expect to get a bytestring, and you do. If you subscript a bytestring, you expect to get a byte. You get an int, and that suggests that a byte is an int. (In Python 2 you got a bytestring, in Python 3 you get an int.) The name could be misleading as byte(i) would return a bytestring, not a byte/int.
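The subscript-versus-slice behaviour described above can be checked directly. This is only a sketch of current Python 3 semantics, with illustrative variable names.

```python
data = b"ABC"

first = data[0]      # subscripting yields an int
head = data[0:1]     # slicing yields a bytes object of length 1
print(first, head)   # 65 b'A'

# Iterating also yields ints.
print(list(data))    # [65, 66, 67]

# Rebuilding a one-byte bytes object from the int obtained by subscripting.
rebuilt = bytes([first])
print(rebuilt == head)  # True
```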
Steven D'Aprano wrote:
To me, it sounds like it should be the opposite of int.from_bytes.
>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'
If that's not the API being suggested, that's going to be confusing.
I'm a bit lost here... why are we convinced at all that we need a new way to do this? Hasn't this functionality already existed for years?
>>> x = int.from_bytes(b"*", "little")
>>> x
42
>>> x.to_bytes(1, "little")
b'*'
Brandt
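A sketch of how the round trip discussed in this subthread can be done with the existing methods. The helper int_to_min_bytes is hypothetical (it is not a proposed API); it picks the smallest length that fits, which also shows why leading zero bytes cannot be recovered unless the caller supplies a length.

```python
def int_to_min_bytes(n, byteorder):
    """Hypothetical helper: encode a non-negative int in the fewest bytes."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, byteorder)


n = int.from_bytes(b"Hello world", "little")
print(int_to_min_bytes(n, "little"))  # b'Hello world'

# Leading zero bytes are lost unless an explicit length is given.
m = int.from_bytes(b"\x00\x00\x00\x01", "big")
print(m)                           # 1
print(int_to_min_bytes(m, "big"))  # b'\x01'
print(m.to_bytes(4, "big"))        # b'\x00\x00\x00\x01'
```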
Hum, it seems like this is a confusion between converting a whole bytes *string* to/from an integer, and converting a single *character* to/from an integer.
I propose to rename PEP 467 method bytes.fromint(n) to => bytes.fromchar(n) <= to convert an integer to a single *character*: it fails if n is not in the [0; 255] range. "char" comes from "character", as "bchr()" means "bytes character".
For C programmers, the usage of the "char" type is common for a single *character*. The char type is not treated as an integer, but as part of a character string. All string functions take the "char*" type (strcpy, printf, etc.). Converting an integer to a "char" in C: "int x = 1; char ch = (char)x;".
I suggest to *not* add a builtin function bchr(), it's not common enough to justify to add it: it's trivial to create your own bchr() function:
bchr = bytes.fromchar
By the way, it's a little unfortunate that int methods have an underscore in their name (int.to_bytes, int.bit_length, int.as_integer_ratio), whereas bytes methods have no underscore in their name (bytes.removeprefix, bytes.islower). I guess that we should follow the trend of existing methods: so no underscore for bytes/bytearray methods.
Victor
-- Night gathers, and now my watch begins. It shall not end until my death.
On Thu, Sep 09, 2021 at 10:57:26AM +0200, Victor Stinner wrote:
I propose to rename PEP 467 method bytes.fromint(n) to => bytes.fromchar(n) <= to convert an integer to a single *character*: it fails if n is not in the [0; 255] range. "char" comes from "character", as "bchr()" means "bytes character".
Integers 0...255 are not characters. They are ints. `bytes.fromchar` would have to accept a string of length 1, as in:
bytes.fromchar('a') # returns b'a'
otherwise the name is completely inaccurate.
For C programmers,
We're Python programmers. To Python programmers, the int 20 is not a space character.
I suggest to *not* add a builtin function bchr(), it's not common enough to justify to add it
Agreed, having a builtin bchr() function doesn't seem to be justified. We can always add it in the future if needed, but using a bytes method should be fine. -- Steve
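To illustrate the int-versus-character distinction drawn above, this sketch uses only existing calls: a length-1 str becomes bytes through an encoding, while an int becomes bytes through the bytes constructor. The variable names are illustrative.

```python
# A character (a str of length 1) becomes bytes via an encoding.
from_char = "a".encode("ascii")   # b'a'

# An int becomes a one-byte bytes object via the bytes constructor.
from_int = bytes([97])            # b'a'

# chr() and ord() convert between ints and str characters, not bytes.
print(chr(97), ord("a"))          # a 97
print(from_char == from_int)      # True

# The types stay distinct: an int never compares equal to a str or bytes.
print(97 == "a", 97 == b"a")      # False False
```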
On Wed, Sep 08, 2021 at 05:06:08PM -0000, Brandt Bucher wrote:
Steven D'Aprano wrote:
To me, it sounds like it should be the opposite of int.from_bytes.
>>> int.from_bytes(b'Hello world', 'little')
121404708502361365413651784
>>> bytes.from_int(121404708502361365413651784, 'little')
# should return b'Hello world'
If that's not the API being suggested, that's going to be confusing.
I'm a bit lost here... why are we convinced at all that we need a new way to do this? Hasn't this functionality already existed for years?
>>> x = int.from_bytes(b"*", "little")
>>> x
42
>>> x.to_bytes(1, "little")
b'*'
TIL :-) How have I never noticed to_bytes until now? o_O -- Steve
Steven D'Aprano wrote:
TIL :-) How have I never noticed to_bytes until now? o_O
I’m going to go out on a limb here: because it’s rarely ever needed? I mean, the proposed bchr() functionality is crazy simple to implement yourself if you actually *do* need it. You can even get creative and use the dedicated “pistol” operator:
>>> b = b"*"
>>> i ,= b
>>> i
42
;) Brandt
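The "pistol" trick above is ordinary sequence unpacking. A sketch of the same idea in its more conventional spelling, showing one difference from b[0]: unpacking also checks that there is exactly one byte.

```python
b = b"*"

# Unpacking a bytes object of length 1 yields its single int value.
(i,) = b
print(i)  # 42

# Unlike b[0], unpacking rejects anything but exactly one byte.
try:
    (j,) = b"**"
except ValueError as exc:
    print(f"unpacking failed: {exc}")
```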
participants (21)
- Antoine Pitrou
- Arnaud Delobelle
- Barry Scott
- Barry Warsaw
- Brandt Bucher
- Brett Cannon
- Chris Angelico
- Christopher Barker
- Ethan Furman
- Gregory P. Smith
- Guido van Rossum
- Luciano Ramalho
- MRAB
- Nick Coghlan
- raymond.hettinger@gmail.com
- Rob Cliffe
- Stephen J. Turnbull
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Victor Stinner