On Wed, Sep 10, 2014 at 3:09 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

> In Python 3, "bytes" is still a hybrid type that can hold:
>
> * arbitrary binary data
> * binary data that contains ASCII segments

Let me be clear. Here are things this proposal does NOT include:

* Removing string-like methods from bytes
* Removing ASCII from bytes literals

Those have proven incredibly useful to the Python community. I appreciate that. This proposal does not take these behaviors away from bytes.

Here's what my proposal DOES include:

1. Adjust the behavior of repr() on a bytes instance such that only hexadecimal codes appear. The returned value would be the text displaying the bytes literal of hexadecimal codes that would reproduce the bytes instance.

2. Provide a method (suggested: "bytes.asciify") that returns a printable representation of bytes that replaces bytes whose values map to printable ASCII glyphs with the glyphs. The returned value would be the text displaying the bytes literal of ASCII glyphs and hexadecimal codes that would reproduce the bytes instance. If you liked the behavior of repr() on bytes in Python 3.0 through 3.4 (or 3.5), it's still available via this method call!

3. Optionally, provide a method (suggested: "bytes.hexlify") that creates the hexadecimal-only printable representation of the bytes, and have bytes.__repr__ call that method.
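To make items 1 to 3 concrete, here is a minimal sketch in pure Python. Because user code can't add methods to the built-in bytes type, hexlify() and asciify() are written as hypothetical free functions standing in for the proposed bytes.hexlify and bytes.asciify; the names come from this proposal, not from any existing API (binascii.hexlify already exists and does something different, returning b'4869' for b'Hi'). asciify() assumes the current CPython 3 repr rules: quote selection, \t, \n and \r escapes, and \xhh for everything outside printable ASCII.

```python
def hexlify(data: bytes) -> str:
    """Hypothetical stand-in for the proposed bytes.hexlify (items 1 and 3):
    a bytes literal in which every byte is written as a \\xhh escape."""
    return "b'" + "".join(f"\\x{b:02x}" for b in data) + "'"


def asciify(data: bytes) -> str:
    """Hypothetical stand-in for the proposed bytes.asciify (item 2):
    reproduces the text of today's CPython repr() for bytes."""
    # Prefer single quotes; switch to double quotes only if the data
    # contains a single quote and no double quote (as repr() does today).
    quote = '"' if (b"'" in data and b'"' not in data) else "'"
    out = []
    for b in data:
        ch = chr(b)
        if ch == quote or ch == "\\":
            out.append("\\" + ch)        # escape the delimiter and backslash
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif b < 0x20 or b >= 0x7F:
            out.append(f"\\x{b:02x}")    # non-printable: hex escape
        else:
            out.append(ch)               # printable ASCII: show the glyph
    return "b" + quote + "".join(out) + quote


sample = b"Hi\x00"
print(hexlify(sample))   # b'\x48\x69\x00'
print(asciify(sample))   # b'Hi\x00'
```

Under these assumptions both strings evaluate back to the original bytes; the only difference is whether printable bytes are shown as glyphs.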

> Both the default repr and the literal form assume the "binary data ASCII compatible segments", which aligns with the behaviour of the Python 2 str type. That isn't going to change in Python, especially since we actually *did* try it for a while (prior to the 3.0 release) and really didn't like it.

Yes, more specifically you said:
> Early (pre-release) versions of Python 3.0 didn't have this behaviour, and getting the raw integer dumps instead turned out to be *really* annoying in practice, so we decided the easier debugging justified the increased risk of creating incorrect mental models for users (especially those migrating from Python 2).

What you haven't said so far, however, and what I still don't know, is whether or not the core team has already tried providing a method on bytes objects à la the proposed .asciify() for projecting bytes as ASCII characters, and rejected that on the basis of it being too inconvenient for the vast majority of Python use cases.

Did the core team try this before deciding that repr() should automatically substitute printable ASCII characters for the hex values of bytes?

So far, I've heard a lot of requests to keep the behavior because it's convenient. But how inconvenient is it to call bytes.asciify()? Are those opposed to changing repr() really going to argue that saving ten characters of typing justifies making thousands of other programmers figure out why there are letters in their bytes, or rather, that there actually AREN'T letters in their bytes?
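The confusion is easy to reproduce in any current Python 3 interpreter: the repr shows letters, but the object holds only integers. The byte values 72 and 105 below are arbitrary examples that happen to fall in the printable ASCII range.

```python
data = bytes([72, 105])   # two arbitrary byte values
print(repr(data))         # b'Hi'       (rendered as ASCII glyphs)
print(data[0])            # 72          (indexing yields an int, not 'H')
print(list(data))         # [72, 105]
```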

And once again, we are talking about changing behavior that the Python 3 language specification leaves unspecified. Yet the language is gaining a reputation for confusing interpreter details with the language itself, as Armin Ronacher has written [1]:

> Python is definitely a language that is not perfect. However I think what frustrates me about the language are largely problems that have to do with tiny details in the interpreter and less the language itself. These interpreter details however are becoming part of the language and this is why they are important.

I feel passionately that this implicit ASCII-translation behavior should not propagate into further releases of CPython 3, and I don't want to see it become a de facto specification through calcification. We're talking about the next 10 to 15 years. Nobody has guaranteed the behavior of repr() so far. With the bytes.asciify() method (or whatever it may be called), we have a fair compromise, plus a more explicit specification of the behavior of bytes in Python 3.

In closing, I want to say that I appreciate you hearing me out, Nick. I have appreciated your answers, and certainly the historical background. And thanks to the others who have contributed here. I appreciate you taking the time to discuss this.

Chris L.

  [1] http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/