On Wed, Sep 10, 2014 at 12:27 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
It strikes me that we should have both asciify and hexlify (or whatever we call them) so people can be explicit when debugging; the question then becomes which one repr calls.
Well said, and I agree both methods should be added. "Explicit is better than implicit" here, to me, trumps "There should be one and only one obvious way to do it." These methods should be preferred whenever one needs to actually store the results.
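To make the two proposals concrete, here is a minimal sketch of what they might produce. The names hexlify_repr and asciify_repr are hypothetical stand-ins for the methods being discussed; neither exists on bytes today, and the hex version should not be confused with binascii.hexlify, which returns bare hex digits.

```python
def hexlify_repr(data: bytes) -> str:
    """Hypothetical: show every byte as a \\xNN escape, no ASCII shortcuts."""
    return "b'" + "".join(f"\\x{b:02x}" for b in data) + "'"


def asciify_repr(data: bytes) -> str:
    """Hypothetical: show printable ASCII as characters (what repr() does now)."""
    return repr(data)


sample = b"\x05Hello"
print(hexlify_repr(sample))  # b'\x05\x48\x65\x6c\x6c\x6f'
print(asciify_repr(sample))  # b'\x05Hello'
```

Either way the programmer says which view she wants, and the only open question is which one repr() picks by default.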
repr() is, to me, meant as a convenience function for the programmer to inspect her data structure, and is not meant to be relied upon as a shortcut to string representation in production code. But perhaps others here disagree and think repr() can and should be used in production code.
The argument in favor of "hexlify" is that the hex representation is more purist.
The argument in favor of "asciify" is that it makes Python 3.6 do the same thing as 3.0-3.5, and in fact 1.0-2.7 as well; people have had a few decades to get used to being lazy with mostly-ASCII protocols, while people have had a few decades to get used to being explicit with pure-binary protocols.
Again, very well said!
But maybe there's another potential concern that can help decide. A lot of novices using bytes get confused when they see b'\x05Hello'
I guess I wasn't clear: this is precisely why I've raised this issue. I promise I'm not trying to make life harder for folks using Python 3 to work with HTTP/1.1! I'm trying to lower the barrier to comprehension for those who have not used Python 3, and especially for those who have never programmed before in their lives. I teach these people in my local Python meetup group, in Software Carpentry courses, and one-on-one with junior developers at my company.
Put yourself in the shoes of a beginner.
If Python did this:
>>> bytes(range(15))
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e'
then to understand it, you would have to learn just two things:
1. A bytes object is a sequence of integers in the range 0 to 255.
2. How to translate base-10 integers into hexadecimal.
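Those two points can be shown to a student directly in the REPL. This is only a small illustration using standard bytes behavior (bytes.hex() needs Python 3.5 or later):

```python
data = bytes(range(15))

# 1. A bytes object is a sequence of integers in the range 0 to 255.
print(list(data))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print(data[10])     # 10

# 2. Each integer can be written as two hexadecimal digits.
print(f"{data[10]:02x}")  # 0a
print(data.hex())         # 000102030405060708090a0b0c0d0e
```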
Now look, through the eyes of a beginner, at what Python actually does:
>>> bytes(range(15))
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e'
Now you have four things to explain!
1. A bytes object is a sequence of integers in the range 0 to 255.
2. How to translate base-10 integers into hexadecimal.
3. How ASCII provides a mapping between some integers and English characters.
4. The conditions under which you'll see an ASCII character in place of a hexadecimal value, versus the hexadecimal value itself.
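Point 4 is the hard one, because the rule is more involved than it looks. Below is a rough sketch of that rule as I understand it from CPython's behavior; the function name explain_repr is made up for illustration, and the sketch simplifies by always using single quotes (the real repr() switches to double quotes when the data contains a single quote but no double quote).

```python
def explain_repr(data: bytes) -> str:
    """Approximate how repr() decides what to show for each byte."""
    short_escapes = {9: "\\t", 10: "\\n", 13: "\\r"}
    parts = []
    for b in data:
        if b in short_escapes:
            parts.append(short_escapes[b])   # tab, newline, carriage return
        elif b in (ord("'"), ord("\\")):
            parts.append("\\" + chr(b))      # quote and backslash get escaped
        elif 0x20 <= b < 0x7F:
            parts.append(chr(b))             # printable ASCII shown as-is
        else:
            parts.append(f"\\x{b:02x}")      # everything else shown as hex
    return "b'" + "".join(parts) + "'"


print(explain_repr(bytes(range(15))))
print(explain_repr(bytes(range(15))) == repr(bytes(range(15))))  # True
```

That is four branches a novice has to hold in her head just to read the output.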
It's easier to teach a student how to decode bytes into ASCII characters when the student can see the bytes, then the resulting ASCII characters in the string, in a one-to-one fashion. It is deeply confusing when they inspect the bytes in the REPL and already see the ASCII characters. The natural question is, "But I already see the character, so why do I have to decode it?!"
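Here is the kind of one-to-one walkthrough I have in mind, as a small sketch using only standard bytes and str behavior:

```python
raw = bytes([72, 101, 108, 108, 111])

# Show each byte as an integer, as hex, and as the ASCII character it maps to.
for b in raw:
    print(b, f"0x{b:02x}", chr(b))

# Decoding is the explicit step that turns the bytes into a string.
text = raw.decode("ascii")
print(text)  # Hello
```

When the student sees the numbers first and the characters only after decode(), the decoding step has an obvious purpose.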
The current behavior of repr() on bytes puts an unfair cognitive burden on novices (and those of us working with "pure binary" files) compared to the gains to advanced programmers who already can comprehend the mapping of bytes to ASCII characters and can manage the mixture of the two.
Think of the children! :-)