On 10 September 2014 08:04, Chris Lasher <chris.lasher@gmail.com> wrote:
> Why did the CPython core developers decide to force the display of ASCII characters in the printable representation of bytes objects in CPython 3?
I'd argue this is symptomatic of something that got mentioned in the lengthy discussions around PEP 461: namely, that Python's bytestrings are really still very stringy. For example, they retain their 'upper' method, which is so totally bizarre in the context of bytes that it causes me to mentally segfault every time I see it:
>>> a = b'hi there'
>>> a.upper()
b'HI THERE'
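
To make the stringiness a bit more concrete, here is a small sketch (the sample word is my own choice, not something from the thread). As far as I understand it, the bytes methods only know about ASCII letters, so any byte outside that range passes through untouched, whereas the str method is Unicode-aware:

```python
# Illustrative only: 'caf\u00e9' is an arbitrary sample string.
text = 'caf\u00e9'
data = text.encode('utf-8')            # b'caf\xc3\xa9'

# bytes.upper() only touches the ASCII letters a-z.
upper_bytes = data.upper()             # b'CAF\xc3\xa9'

# str.upper() uppercases the accented letter too, so the
# encoded result differs in its final byte.
upper_text = text.upper().encode('utf-8')   # b'CAF\xc3\x89'

print(upper_bytes)
print(upper_text)
```

So the method is really "uppercase the bytes that happen to be ASCII letters", which only makes sense if you are treating the bytes as ASCII-ish text.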
As Nick mentioned, this is fundamentally because of protocols like HTTP/1.1, which are a weird hybrid of text and binary that is only simple to handle if you assume ASCII everywhere. (Of course, HTTP does not assume ASCII everywhere, but that's because it's wildly underspecified.)
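
As a rough sketch of why those stringy methods are convenient for this kind of protocol, here is a toy header parser that never leaves bytes. The message and the variable names are made up for illustration, and this is nowhere near a complete HTTP/1.1 parser:

```python
# A made-up HTTP/1.1-style message: ASCII header block, then an opaque body.
raw = b"Content-Length: 5\r\nHOST: example.com\r\n\r\nhello"

# Split the ASCII-ish header block from the body at the blank line.
head, _, body = raw.partition(b"\r\n\r\n")

headers = {}
for line in head.split(b"\r\n"):
    name, _, value = line.partition(b":")
    # Header names are case-insensitive, so normalise them with
    # bytes.lower() rather than decoding to str first.
    headers[name.strip().lower()] = value.strip()

print(headers)
print(body)
```

The header block gets treated as text and the body as an opaque blob, all without a single decode() call.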
I doubt you'll get far with this proposal on this list, which is a shame because I think you have a point. There is an impedance mismatch between the Python community saying "Bytes are not text" and the fact that, wow, they really do look like they are sometimes!
For what it's worth, Nick has made this comment:
Primarily because it's incredibly useful for debugging ASCII based binary formats (which covers many network protocols and file formats).
This is true, but it goes both ways: it makes it a lot *harder* to debug pure-binary network formats (like HTTP/2). I basically have to have an ASCII codepage in front of me to debug using the printed representation of a bytestring because I keep getting characters thrown into my nice hex output. Sadly, you can't please everyone.
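
To illustrate what I mean, here is a minimal sketch using nine arbitrary bytes (made-up values, only loosely in the spirit of a binary frame header). Note that bytes.hex() only exists in Python 3.5 and later; on older versions binascii.hexlify() does a similar job:

```python
# Nine arbitrary bytes; a few of them happen to be printable ASCII.
frame_header = bytes([0x00, 0x00, 0x41, 0x01, 0x24, 0x00, 0x00, 0x00, 0x33])

# The default repr mixes hex escapes with ASCII characters.
mixed = repr(frame_header)
print(mixed)        # b'\x00\x00A\x01$\x00\x00\x003'

# A uniform hex dump of the same data (Python 3.5+).
plain_hex = frame_header.hex()
print(plain_hex)    # 000041012400000033

# One byte per group, which is easier to line up against a spec.
spaced = " ".join("{:02x}".format(b) for b in frame_header)
print(spaced)       # 00 00 41 01 24 00 00 00 33
```

The "A", "$" and trailing "3" in the repr are the bytes 0x41, 0x24 and 0x33, and you need the ASCII table to read them back as numbers.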