[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Thu Sep 11 01:23:58 CEST 2014

On Wed, Sep 10, 2014 at 3:09 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> In Python 3, "bytes" is still a hybrid type that can hold:
>
> * arbitrary binary data
> * binary data that contains ASCII segments
>
Let me be clear. Here are things this proposal does NOT include:

* Removing string-like methods from bytes
* Removing ASCII from bytes literals

Those have proven incredibly useful to the Python community. I appreciate
that. This proposal does not take these behaviors away from bytes.

Here's what my proposal DOES include:

1. Adjust the behavior of repr() on a bytes instance such that only
hexadecimal codes appear. The returned value would be the text displaying
the bytes literal of hexadecimal codes that would reproduce the bytes
instance.

2. Provide a method (suggested: "bytes.asciify") that returns a printable
representation of bytes that replaces bytes whose values map to printable
ASCII glyphs with the glyphs. The returned value would be the text
displaying the bytes literal of ASCII glyphs and hexadecimal codes that
would reproduce the bytes instance. If you liked the behavior of repr() on
bytes in Python 3.0 through 3.4 (or 3.5), it's still available via this
method call!

3. Optionally, provide a method (suggested: "bytes.hexlify") which
implements the code for creating the printable representation of the bytes
with hexadecimal values only, and call this method in bytes.__repr__.
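
To make items 1 through 3 concrete, here is a minimal sketch written as
plain functions. The names hex_repr and asciify below are hypothetical
stand-ins for the suggested methods; neither exists on bytes today, and
asciify simply delegates to the repr() that CPython 3 currently provides.

```python
import ast


def hex_repr(data: bytes) -> str:
    """Hex-only bytes literal: what items 1 and 3 suggest repr() would return."""
    return "b'" + "".join(f"\\x{byte:02x}" for byte in data) + "'"


def asciify(data: bytes) -> str:
    """ASCII-plus-hex bytes literal: item 2, which is today's repr() output."""
    return repr(bytes(data))


sample = b"GET /\r\n\x00\xff"
hex_text = hex_repr(sample)
ascii_text = asciify(sample)

# Both strings are valid bytes literals that reproduce the original value.
assert ast.literal_eval(hex_text) == sample
assert ast.literal_eval(ascii_text) == sample
```

Either form round-trips to the same bytes instance; the only difference is
which one repr() would produce by default.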

> Both the default repr and the literal form assume the "binary data with ASCII
> compatible segments", which aligns with the behaviour of the Python 2 str
> type. That isn't going to change in Python, especially since we actually
> *did* try it for a while (prior to the 3.0 release) and really didn't like
> it.
>
Yes, more specifically you said:

> Early (pre-release) versions of Python 3.0 didn't have this behaviour, and
> getting the raw integer dumps instead turned out to be *really* annoying
> in practice, so we decided the easier debugging justified the increased
> risk of creating incorrect mental models for users (especially those
> migrating from Python 2).

What you haven't said so far, however, and what I still don't know, is
whether or not the core team has already tried providing a method on bytes
objects à la the proposed .asciify() for projecting bytes as ASCII
characters, and rejected that on the basis of it being too inconvenient for
the vast majority of Python use cases.

Did the core team try this before deciding that repr() should
automatically display printable ASCII characters in place of hex values
for bytes?

So far, I've heard a lot of requests to keep the behavior because it's
convenient. But how inconvenient is it to call bytes.asciify()? Are those
not in favor of changing the behavior of repr() really going to stand behind
the argument that saving the effort of typing ten more characters justifies
making thousands of other programmers figure out why there are letters in
their bytes, or rather, why there are actually NOT letters in their bytes?
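
A short sketch of the confusion I mean, using only behavior that CPython 3
has today (the variable names are just for illustration): the repr() shows
letters, yet the elements themselves are integers.

```python
data = bytes([72, 105, 0, 255])

shown = repr(data)      # the text "b'Hi\\x00\\xff'": two of four bytes appear as letters
first = data[0]         # 72, an int, not the one-character string "H"
as_ints = list(data)    # [72, 105, 0, 255]

assert shown == "b'Hi\\x00\\xff'"
assert first == 72 and isinstance(first, int)
assert as_ints == [72, 105, 0, 255]
```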

And once again, we are talking about changing behavior that is unspecified
by the Python 3 language specification. The language is gaining a
reputation for blurring the line between specification and implementation
detail, however, as written by Armin Ronacher [1]:

> Python is definitely a language that is not perfect. However I think what
> frustrates me about the language are largely problems that have to do with
> tiny details in the interpreter and less the language itself. These
> interpreter details however are becoming part of the language and this is
> why they are important.

I feel strongly that this implicit ASCII-translation behavior should not
propagate into further releases of CPython 3, and I don't want to see it
become a de facto specification due to calcification. We're talking about
the next 10 to 15 years. Nobody guaranteed the behavior of repr() so far.
With the bytes.asciify() method (or whatever it may be called), we have a
fair compromise, plus a more explicit specification of behavior of bytes in
Python 3.

In closing on this message, I want to say that I appreciate you hearing me
out, Nick. I have appreciated your answers, and certainly the historical
background. And thanks to the others who have contributed here. I
appreciate you taking the time to discuss this.

Chris L.

  [1] http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see/