On Tue, Aug 10, 2021 at 3:48 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Aug 10, 2021 at 3:00 PM <raymond.hettinger@gmail.com> wrote:
The history of bytes/bytearray is a dual-purpose view.  It can be used in a string-like way to emulate Python 2 string handling (hence all the usual string methods and a repr that displays in a string-like fashion).  It can also be used as an array of numbers, 0 to 255 (hence the list methods and having an iterator of ints).  ISTM that the authors of this PEP reject or want to discourage the latter use cases. 

I didn't read it that way, but if so, please no, I'd rather see the former use cases discouraged. ISTM that the Py2 string handling is still needed for working with mixed binary / text data -- but that should be a pretty specialized use case. Spelling the way to create a byte as byte() sure makes more sense in any other context.
... anything where a C programmer would use an array of unsigned chars).

or any programmer would use an array of unsigned 8-bit integers :-) numpy spells it: `np.uint8`, and the type in the C99 stdint.h is `uint8_t`. My point is that for anyone not an "old time" C programmer, or even a Python2 programmer, the "character is an unsigned 8 bit int" concept is alien and confusing, not a helpful mnemonic.
For example, creating a single byte with bytes([0x1f]) isn't pleasant, obvious, or fast.

no, though bytes([31]) isn't horrible ;-)   (despite coding for over four decades, I'm still not comfortable with hex notation)

I say it's not horrible, because bytes is a Sequence of bytes (or integer values between 0 and 255), initializing it with an iterable seems pretty reasonable, that's how we initialize most (all?) other sequences after all. And compatible with array.array and numpy arrays.

I consider the bytes([31]) notation to be horrible API design because a simple, easy-to-make typo (omitting the [], or using () and forgetting the tupleizing comma) turns it into a different valid call with an entirely different meaning:  bytes([31]) vs bytes((31)) vs bytes(31).
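
To make the typo hazard concrete, here is a small sketch of what each of those spellings produces with the current `bytes` constructor. The variable names are only for illustration.

```python
# Three near-identical spellings, two different meanings.
one_byte = bytes([31])      # iterable of ints -> a single byte with value 31
parens_only = bytes((31))   # (31) is just the int 31, so this is bytes(31)
zero_filled = bytes(31)     # int argument -> 31 zero bytes

print(one_byte)                    # b'\x1f'
print(parens_only == zero_filled)  # True
print(len(zero_filled))            # 31

# The tuple spelling only works with the trailing comma.
print(bytes((31,)) == one_byte)    # True
```

All three calls are valid, so nothing flags the mistake until the data turns out to be wrong.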

It's also ugly to anyone who thinks about what bytecode is generated and executed in order to do it.  An entire new list object with a single element referring to a tiny int is created and destroyed just to create a b'\037' object?  An optimizer pass to fix that up at the bytecode level isn't easy as it can only be done when it can prove that `bytes` has not been reassigned to something other than the builtin.  Near impossible in a lot of code.  bytes.fromint(31) isn't much better in the bytecode regard, but at least a temporary list is not being created.
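
One way to see the temporary list is to disassemble the expression. This is only a sketch: the exact opcode names and their order vary between CPython versions, so the full listing is printed rather than asserted, and only the presence of a list-building instruction is checked.

```python
import dis

# Collect the opcode names the compiler emits for the expression.
ops = [ins.opname for ins in dis.get_instructions("bytes([31])")]
print(ops)

# A list is built at run time before the call to bytes() is made.
builds_list = "BUILD_LIST" in ops
print(builds_list)
```

The name `bytes` is also looked up at run time, which is why the compiler cannot safely fold the whole expression into a constant.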

As much as I think that bytes(size: int) was a bad idea to have as an API - bytearray(size: int) is fine and useful as it is mutable - that ship sailed and getting rid of it would break some odd code.  It doesn't have much use, so adding a fromsize(size: int) method doesn't sound very compelling as it just adds yet another way to do the same thing.  We should just live with that specific wart.
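
A quick sketch of why the size-based constructor is more useful on the mutable type than on the immutable one (variable names are illustrative):

```python
buf = bytearray(4)        # four zero bytes, writable in place
buf[0] = 31
print(buf)                # bytearray(b'\x1f\x00\x00\x00')

frozen = bytes(4)         # four zero bytes, but immutable
try:
    frozen[0] = 31
except TypeError as exc:
    print("bytes is read-only:", exc)
```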

`bchr` as a builtin... I'm with the others on saying no to any new builtin that isn't expected to see frequent use.  bchr won't see frequent use.

`bytes.fromint` seems fine.  Others are proposing `bytes.byte` for that.  I don't like to argue over names (the last stage of anything) but I do need to point out how that sounds to read.  It falls victim to API stuttering.  "bytes dot byte" or "bytes byte" doesn't convey much to a reader in English as the difference is a subtle "s".  "bytes dot from int" or "bytes from int" is quite clear.  (Avoiding stuttering in API design was popularized by golang - it's a good thing to strive for in any language.)  It's times like this that I wish Python had chosen consistent camelCase, CapWords, or snake_case in all API names as conjoinedwords aren't great. But they are sadly consistent with our past sins.
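
For readers who want to picture the call, here is a hypothetical pure-Python stand-in. `fromint` below is a plain function written for this illustration, not an existing `bytes` method, and the choice of `int.to_bytes` (and therefore `OverflowError` for out-of-range values) is an assumption rather than anything the PEP specifies.

```python
def fromint(value: int) -> bytes:
    """Hypothetical stand-in for the proposed bytes.fromint()."""
    # int.to_bytes raises OverflowError for values outside 0..255
    # when asked for a single unsigned byte.
    return value.to_bytes(1, "big")


print(fromint(31))                 # b'\x1f'
print(fromint(31) == bytes([31]))  # True

try:
    fromint(256)
except OverflowError as exc:
    print("rejected:", exc)
```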

One thing never mentioned in the PEP:  if you expect a primary use of fromint (aka the bchr builtin that isn't going to happen) to be calls on constant values, why are we adding name lookups and function calls to this?  Why not address the elephant in the room and allow for decimal values to be written as an escape sequence within bytes literals?

b'\d31' for example to say "decimal byte 31".  Proposal: Only values 0-255 with no leading zero should be accepted when parsing such an escape.  (Do not bother adding the same feature for codepoints in unicode strs; leave that to later if someone shows actual demand).  This can't address the bytearray need, but that's been true of bytearray for ages; a common way to create them is via a copy from transient bytes objects.  bytearray(b'\d31') isn't much different than bytearray.fromint(31).  One less name lookup.
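
For comparison, these are the escape spellings that already exist for the same byte. The proposed decimal escape would be a third literal form alongside them. This sketch uses only current syntax.

```python
# Existing literal spellings of the single byte with value 31.
hex_escape = b'\x1f'
octal_escape = b'\037'

print(hex_escape == octal_escape == bytes([31]))  # True

# bytearray has no literal form, so it is built by copying a bytes object.
mutable_copy = bytearray(hex_escape)
print(mutable_copy)                               # bytearray(b'\x1f')
```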

Why not add a \d escape? Introducing a new escape is fraught with peril as existing \d's within b'' literals in code could change meaning.  Backwards compatibility fail.  But one that is easy to check for with a DeprecationWarning for a few releases...  The new literal parsing could be enabled per-file with a __future__ import.
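
The compatibility concern can be seen by evaluating such a literal today. This is a sketch under the assumption of a CPython version that still accepts unrecognized escapes with a warning. The warning category has changed between releases, and a future release could reject the literal outright.

```python
import warnings

source = r"b'\d31'"   # the literal as it would appear in a source file

with warnings.catch_warnings():
    # Unrecognized escapes currently trigger a compile-time warning; silence it here.
    warnings.simplefilter("ignore")
    current_value = eval(source)

print(current_value)        # b'\\d31'
print(len(current_value))   # 4
```

Today the result is four bytes (a backslash, `d`, `3`, `1`), whereas under the proposal the same source text would produce a single byte.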



Christopher Barker, PhD (Chris)

Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/RM4JHK4GIKYYWV7J5F6IQJ66KUIXWMMF/