[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Nick Coghlan ncoghlan at gmail.com
Thu Sep 11 04:35:29 CEST 2014


On 11 September 2014 11:57, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Folks should keep in mind that when we talk about "hybrid ASCII binary
> data", we're not just talking about things like SMTP and HTTP 1.1 and
> debugging network protocol traffic, we're also talking about things
> like URLs, filesystem paths, email addresses, environment variables,
> command line arguments, process names, passing UTF-8 encoded data to
> GUI frameworks, etc that are often both ASCII compatible and human
> readable *by design*.
>
> Note the error message produced here with my modified build:
>
> $ ./python -c 'import os; print(os.listdir(b"foo"))'
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
> FileNotFoundError: [Errno 2] No such file or directory: b'Zfoo'
>
> And this directory listing:
>
> $ ./python -c 'import os; print(os.listdir(b"Mac"))'
> [b'ZIDLE', b'ZMakefile.in', b'ZTools', b'ZREADME.orig',
> b'ZPythonLauncher', b'ZIcons', b'ZREADME', b'ZExtras.install.py',
> b'ZBuildScript', b'ZResources']

After posting that version, I realised that actually making the
proposed change would be similarly straightforward, and would better
illustrate the core problem with the idea:

$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'\x66\x6f\x6f'
$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'\x49\x44\x4c\x45', b'\x4d\x61\x6b\x65\x66\x69\x6c\x65\x2e\x69\x6e',
b'\x54\x6f\x6f\x6c\x73',
b'\x52\x45\x41\x44\x4d\x45\x2e\x6f\x72\x69\x67',
b'\x50\x79\x74\x68\x6f\x6e\x4c\x61\x75\x6e\x63\x68\x65\x72',
b'\x49\x63\x6f\x6e\x73', b'\x52\x45\x41\x44\x4d\x45',
b'\x45\x78\x74\x72\x61\x73\x2e\x69\x6e\x73\x74\x61\x6c\x6c\x2e\x70\x79',
b'\x42\x75\x69\x6c\x64\x53\x63\x72\x69\x70\x74',
b'\x52\x65\x73\x6f\x75\x72\x63\x65\x73']

vs

$ python3 -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'foo'
$ python3 -c 'import os; print(os.listdir(b"Mac"))'
[b'IDLE', b'Makefile.in', b'Tools', b'README.orig', b'PythonLauncher',
b'Icons', b'README', b'Extras.install.py', b'BuildScript',
b'Resources']

It's more than just a matter of backwards compatibility. It's also a
matter of the asymmetry of impact when either of the two possible
design choices turns out to be the wrong one:

* Using a hex-based repr when an ASCII-based repr is more appropriate
is utterly unreadable
* Using an ASCII-based repr when a hex-based repr is more appropriate
is somewhat confusing
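
As a rough illustration of that asymmetry, here is a minimal sketch
that puts the two styles side by side in stock Python 3. Note that
`hex_repr` is a hypothetical helper written only for this comparison;
it is not part of CPython and is not the code used in the modified
build above.

```python
def hex_repr(data):
    """Hypothetical helper: show every byte as a two-digit hex escape."""
    # Iterating over a bytes object yields integers in range(256).
    escaped = "".join("\\x%02x" % byte for byte in data)
    return "b'" + escaped + "'"


# Mostly-ASCII data (a filename): the default repr stays readable,
# while the all-hex form hides the text completely.
print(repr(b"foo"))      # b'foo'
print(hex_repr(b"foo"))  # b'\x66\x6f\x6f'

# Mixed data (a binary byte followed by ASCII letters): the default
# repr interleaves an escape with literal characters, which is the
# "somewhat confusing" case; the all-hex form is uniform.
print(repr(b"\x89PNG"))
print(hex_repr(b"\x89PNG"))
```

The all-hex form can always be generated from the data when it is
wanted, whereas a reader cannot easily decode a screenful of escapes
back into filenames by eye.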

This kind of thing is why the original "binary representation by
default" design didn't survive the Python 3.0 development cycle - once
people started trying it out, it quickly became evident that it was
the wrong approach to take (if I remember the original implementation
correctly, the repr was along the lines of "bytes([1, 2, 3, 4])" since
there wasn't a bytes literal until after PEP 3137 was implemented).

Making hex representations of binary data easier to produce is still a
good idea, though.
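
For reference, one way to get a hex view of binary data today is the
standard library's `binascii` module. This is only a sketch of an
existing option, not the easier mechanism suggested above, and the
sample data is made up for the example.

```python
import binascii

# Made-up sample: one non-ASCII byte, three ASCII letters, then CR LF.
data = b"\x89PNG\r\n"

# hexlify() returns bytes containing hex digits; decode to get a str.
hex_text = binascii.hexlify(data).decode("ascii")
print(hex_text)  # 89504e470d0a

# unhexlify() reverses the conversion.
assert binascii.unhexlify(hex_text) == data
```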

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

