I originally wrote this late last night but realized today that I only sent this reply to Terry Reedy, not to python-ideas. (Apologies, Terry – I didn't mean to single you out with my rant!)
I'm reposting it in full, below. Some of these ideas have already been raised by others and counter-arguments already posed. I still feel I have not seen some of these points directly addressed, namely, the unreasonableness of seeing bytes from floating point numbers as ASCII characters, and the sanity of the API I counter-propose.
Message now appears below:
On Wed, Sep 10, 2014 at 1:11 AM, Terry Reedy firstname.lastname@example.org wrote:
I agree with Chris Lasher's basic point, that the representation of bytes confusingly contradicts the idea that bytes are bytes. But it is not going to change.
Unless the printable representation of bytes objects appears as part of the language specification for Python 3, it's an implementation detail and therefore a candidate for change, especially if the BDFL wills it so. Consider me optimistic that we can change it, or I would have just posted yet another "Python 3 gets it all wrong" blog post to the web instead of writing this pre-proposal. :-)
On 9/10/2014 3:56 AM, Cory Benfield wrote:
On 10 September 2014 08:45, Nick Coghlan email@example.com wrote:
memoryview.cast can be a potentially useful tool for that :)
Sure, and so can binascii.hexlify (which is what I normally use).
See http://bugs.python.org/issue9951 to add bytes.hex or .tohex as more or less the inverse of bytes.fromhex or even have hex(bytes) work. This change *is* possible and I think we should pick one of the suggestions for 3.5.
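For reference, a minimal sketch of the hex tools mentioned above that already exist in the standard library: binascii.hexlify produces the hex digits and bytes.fromhex reverses them. The variable names are illustrative only.

```python
import binascii

data = b"Hello, World!"

# hexlify returns a bytes object holding two ASCII hex digits per input byte.
hex_digits = binascii.hexlify(data)

# bytes.fromhex takes a str, so decode the hex digits before converting back.
restored = bytes.fromhex(hex_digits.decode("ascii"))

print(hex_digits)
print(restored == data)
```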
Here's the API Issue 9951 is proposing:
>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
b'Hello, World!'
>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.tohex()
b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
>>> b'Hello, World!'
b'Hello, World!'
>>> b'Hello, World!'.tohex()
b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
I'll tell you what: here's the API of my counter-proposal:
>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
>>> b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'.asciify()
b'Hello, World!'
>>> b'Hello, World!'
b'\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21'
>>> b'Hello, World!'.asciify()
b'Hello, World!'
Here's the prose description of my counter-proposal: add a method to the bytes object called `.asciify`, that returns a printable representation of the bytes, where bytes mapping to printable ASCII characters are displayed as ASCII characters, and the remainder are given as hex codes. That is, .asciify() should round-trip a bytes literal. This frees up repr() to do what universally makes sense on a series of bytes: state the bytes!
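A rough sketch of the two representations described above, written as plain functions because methods cannot be added to the built-in bytes type. The names hex_repr and asciify are hypothetical, and escaping the quote and backslash characters is an assumption made here so the output stays a valid literal; the proposal does not spell that detail out.

```python
import ast


def hex_repr(data: bytes) -> str:
    """Hypothetical 'state the bytes' repr: every byte as a two-digit hex escape."""
    return "b'" + "".join("\\x{:02x}".format(byte) for byte in data) + "'"


def asciify(data: bytes) -> str:
    """Hypothetical stand-in for the proposed bytes.asciify() method."""
    parts = []
    for byte in data:
        if byte in (0x27, 0x5C):
            # Assumption: escape the single quote and the backslash so the
            # result can still be read back as a bytes literal.
            parts.append("\\" + chr(byte))
        elif 0x20 <= byte <= 0x7E:
            # Printable ASCII is shown as the character itself.
            parts.append(chr(byte))
        else:
            # Everything else is shown as a hex escape.
            parts.append("\\x{:02x}".format(byte))
    return "b'" + "".join(parts) + "'"


sample = b"Hello, World!"
print(hex_repr(sample))
print(asciify(sample))

# Both texts evaluate back to the original bytes object.
print(ast.literal_eval(hex_repr(sample)) == sample)
print(ast.literal_eval(asciify(sample)) == sample)
```

Both functions return text that evaluates back to the original bytes, so either one could serve as a round-trippable representation.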
Marc-Andre Lemburg said:
A definite -1 from me on making repr(b"Hello World") harder to read than necessary.
Okay, but a definite -1e6 from me on making my Python interpreter do this:
>>> import struct
>>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12, 1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955)
>>> my_packed_bytes
b'Why, Guido? Why?'
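The same effect can be checked from the other direction, as a sketch: start from the 16 ASCII bytes, read them as four 32-bit floats, and pack them again. The explicit little-endian format '<4f' is an assumption made here so the result does not depend on the host machine; the session above uses native byte order.

```python
import binascii
import struct

# Assumption: explicit little-endian layout, four 32-bit floats.
FLOAT_FORMAT = "<4f"

text_like = b"Why, Guido? Why?"

# Interpret the 16 bytes as four floats...
floats = struct.unpack(FLOAT_FORMAT, text_like)

# ...and pack those floats back into bytes.
repacked = struct.pack(FLOAT_FORMAT, *floats)

print(floats)
print(repr(repacked))              # the default repr shows ASCII where it can
print(binascii.hexlify(repacked))  # hex digits for the same 16 bytes
```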
I do understand the utility of peering into ASCII text, but as Cory Benfield stated earlier:
I'm saying that I don't get to do debugging with a simple print statement when using the bytes type to do actual binary work, while those who are doing sort-of binary work do.
Does the inconvenience of having to explicitly call the .asciify() method on a bytes object justify the current behavior for repr() on a bytes object? The privilege of being lazy is obstructing the right to see what we've actually got in the bytes object, and is jeopardizing the very argument that "bytes are not strings".
On Wed, Sep 10, 2014 at 10:51 AM, Cory Benfield firstname.lastname@example.org wrote:
On 10 September 2014 17:59, Stephen J. Turnbull email@example.com wrote:
So does 0xDEADBEEF, but actually that's *not* text, it's a 32-bit pointer, conveniently invalid on most 32-bit architectures and very obvious when it shows up in a backtrace. Do you see an impedance mismatch in the C community because of that?
In fact, *all* bytes "look like text", because *you can't see them until they're converted to text by repr()*! This is the key to the putative "impedance mismatch" -- it's perceived as such when people don't distinguish the map from the territory.
I apologise, I was insufficiently clear. I mean that interaction with the bytes type in Python has a lot of textual aspects to it. This is a *deliberate* decision (or at least the documentation makes it seem deliberate), and I can understand the rationale, but it's hard to be surprised that it leads developers astray.
Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer, it's a 32-bit something. Its type is undefined in that expression. It has a standard usage as a guard word, but still, let's not jump to conclusions here!
I accept your core point, however, which I consider to be this:
The issue that sometimes it's easier to read hex than ASCII mixed with other stuff (hex escapes or Latin-1) is true enough, though. But it's not about an impedance mismatch, it's a question of what does *this* developer consider to be the convenient repr for *that* task.
This is definitely true, which I believe I've already admitted in this thread. I do happen to believe that having it be hex would provide a better pedagogical position ("you know this isn't text because it looks like gibberish!"), but that ship sailed a long time ago.