[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Wed Sep 10 22:29:13 CEST 2014

On Wed, Sep 10, 2014 at 12:27 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
> It strikes me that we should have both asciify and hexlify (or whatever we call them) so people can be explicit when debugging; the question then becomes which one repr calls.

Well said, and I agree both methods should be added. "Explicit is
better than implicit," here, to me, trumps "There should be one and
only one obvious way to do it." Using these methods should be
preferred when one needs to actually store the results.
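
As a rough sketch of what being explicit could look like, here are two
stand-alone helpers. The names hexlify_repr and asciify_repr are
hypothetical, chosen only for illustration; neither is an existing
bytes method, and the real methods, if added, may well look different.

```python
def hexlify_repr(data: bytes) -> str:
    """Show every byte as a \\xNN escape, with no ASCII shortcuts (hypothetical helper)."""
    return "b'" + "".join(f"\\x{b:02x}" for b in data) + "'"


def asciify_repr(data: bytes) -> str:
    """Show printable ASCII as characters, which is what repr() already does (hypothetical helper)."""
    return repr(data)


sample = b"\x05Hello"
print(hexlify_repr(sample))  # b'\x05\x48\x65\x6c\x6c\x6f'
print(asciify_repr(sample))  # b'\x05Hello'
```

With helpers like these, code that stores the text picks a format by
name instead of depending on whatever repr() happens to print.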

repr() is, to me, meant as a convenience function for the programmer
to inspect her data structure, and is not meant to be relied upon as a
shortcut to string representation in production code. But perhaps
others here disagree and think repr() can and should be used in
production code.

>
> The argument in favor of "hexlify" is that the hex representation is more purist.
>
> The argument in favor of "asciify" is that it makes Python 3.6 do the same thing as 3.0-3.5, and in fact 1.0-2.7 as well; people have had a few decades to get used to being lazy with mostly-ASCII protocols, while people have had a few decades to get used to being explicit with pure-binary protocols.

Again, very well said!

> But maybe there's another potential concern that can help decide. A lot of novices using bytes get confused when they see b'\x05Hello'

I guess I wasn't clear: this is precisely why I've raised this issue.
I promise I'm not trying to make life harder for folks using Python 3
to work with HTTP/1.1! I'm trying to lower the barrier of
comprehension to those who have not used Python 3, and especially
those who have never programmed before in their life. I teach
these people, in my local Python meetup group, in Software Carpentry
courses, and one-on-one with junior developers in my company.

Put yourself in the shoes of a beginner.

Suppose Python did this:

    >>> bytes(range(15))
    b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e'

To understand this, you have to learn just two things:

1. A bytes object is a sequence of integers in the range 0 to 255.
2. How to translate base-10 integers into hexadecimal.

Now look at what Python actually does, through the eyes of a beginner:

    >>> bytes(range(15))
    b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e'

Now you have four things to explain!

1. A bytes object is a sequence of integers in the range 0 to 255.
2. How to translate base-10 integers into hexadecimal.
3. How ASCII provides a mapping between some integers and English characters.
4. The conditions under which you'll see an ASCII character in place
of a hexadecimal value versus the hexadecimal value itself
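
To make point 4 concrete, here is a simplified sketch of the per-byte
rule as I understand it. The function name explain_byte is made up for
this example, and the sketch deliberately leaves out how repr() picks
and escapes quote characters, so treat it as an approximation rather
than the actual CPython implementation.

```python
def explain_byte(value: int) -> str:
    """Approximate how repr() of a bytes object displays one byte value.

    Quote handling (b'...' versus b"...") is intentionally omitted.
    """
    # Tab, newline, carriage return and backslash get short escapes.
    short_escapes = {9: "\\t", 10: "\\n", 13: "\\r", 92: "\\\\"}
    if value in short_escapes:
        return short_escapes[value]
    # Printable ASCII (space through tilde) is shown as the character itself.
    if 32 <= value < 127:
        return chr(value)
    # Everything else falls back to a two-digit hex escape.
    return f"\\x{value:02x}"


print("".join(explain_byte(v) for v in range(15)))
# \x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e
```

That is three separate display rules a beginner has to hold in mind
just to read one line of REPL output.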

It's easier to teach a student how to decode bytes into ASCII
characters when the student can see the bytes, then the resulting
ASCII characters in the string, in a one-to-one fashion. It is deeply
confusing when they inspect the bytes in the REPL and already see the
ASCII characters. The natural question is, "But I already see the
character, so why do I have to decode it?!"
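
A small sketch of that one-to-one view, using only built-ins (the
sample value is the one from the quoted message above):

```python
data = b"\x05Hello"

# A bytes object is a sequence of integers, whatever repr() shows.
print(list(data))  # [5, 72, 101, 108, 108, 111]

# Decoding turns those integers into a str of characters.
text = data.decode("ascii")

# Pair each byte with the character it decodes to.
for byte, char in zip(data, text):
    print(f"0x{byte:02x} -> {char!r}")
```

Indexing shows the same split: data[0] is the int 5, while text[0] is
a one-character str.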

The current behavior of repr() on bytes puts an unfair cognitive
burden on novices (and those of us working with "pure binary" files)
relative to what it gains advanced programmers, who can already
comprehend the mapping of bytes to ASCII characters and manage the
mixture of the two.

Think of the children! :-)