On 11 September 2014 11:57, Nick Coghlan wrote:
Folks should keep in mind that when we talk about "hybrid ASCII binary data", we're not just talking about things like SMTP and HTTP 1.1 and debugging network protocol traffic, we're also talking about things like URLs, filesystem paths, email addresses, environment variables, command line arguments, process names, passing UTF-8 encoded data to GUI frameworks, etc that are often both ASCII compatible and human readable *by design*.
Note the error message produced here with my modified build:
$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'Zfoo'
And this directory listing:
$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'ZIDLE', b'ZMakefile.in', b'ZTools', b'ZREADME.orig', b'ZPythonLauncher', b'ZIcons', b'ZREADME', b'ZExtras.install.py', b'ZBuildScript', b'ZResources']
After posting that version, I realised actually making the proposed change would be similarly straightforward, and better illustrate the core problem with the idea:

$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'\x66\x6f\x6f'

$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'\x49\x44\x4c\x45', b'\x4d\x61\x6b\x65\x66\x69\x6c\x65\x2e\x69\x6e', b'\x54\x6f\x6f\x6c\x73', b'\x52\x45\x41\x44\x4d\x45\x2e\x6f\x72\x69\x67', b'\x50\x79\x74\x68\x6f\x6e\x4c\x61\x75\x6e\x63\x68\x65\x72', b'\x49\x63\x6f\x6e\x73', b'\x52\x45\x41\x44\x4d\x45', b'\x45\x78\x74\x72\x61\x73\x2e\x69\x6e\x73\x74\x61\x6c\x6c\x2e\x70\x79', b'\x42\x75\x69\x6c\x64\x53\x63\x72\x69\x70\x74', b'\x52\x65\x73\x6f\x75\x72\x63\x65\x73']

vs

$ python3 -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'foo'

$ python3 -c 'import os; print(os.listdir(b"Mac"))'
[b'IDLE', b'Makefile.in', b'Tools', b'README.orig', b'PythonLauncher', b'Icons', b'README', b'Extras.install.py', b'BuildScript', b'Resources']

It's more than just a matter of backwards compatibility, it's a matter of asymmetry of impact when the two possible design choices are wrong:

* Using a hex based repr when an ASCII based repr is more appropriate is utterly unreadable
* Using an ASCII based repr when a hex based repr is more appropriate is somewhat confusing

This kind of thing is why the original "binary representation by default" design didn't survive the Python 3.0 development cycle - once people started trying it out, it quickly became evident that it was the wrong approach to take (if I remember the original implementation correctly, the repr was along the lines of "bytes([1, 2, 3, 4])" since there wasn't a bytes literal until after PEP 3137 was implemented).
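For anyone who wants to see the asymmetry without patching the interpreter, here's a rough sketch in plain Python. Note that hex_escaped_repr is a hypothetical helper I'm making up purely for illustration (it isn't in the stdlib and isn't how the modified build does it); it just imitates the all-hex output shown above. binascii.hexlify is the existing stdlib way to ask for a hex dump explicitly.

```python
import binascii


def hex_escaped_repr(data):
    """Hypothetical helper: imitate an all-hex bytes repr.

    Every byte is rendered as a \\xNN escape, even when it is
    printable ASCII, mirroring the patched-build output above.
    """
    # Iterating over a bytes object in Python 3 yields integers.
    return "b'" + "".join("\\x%02x" % byte for byte in data) + "'"


name = b"README.orig"

# The existing ASCII based repr: readable at a glance for path-like data.
print(repr(name))

# The hex based repr: same information, but unreadable for this kind of data.
print(hex_escaped_repr(name))

# Opting in to hex explicitly when the data really is binary.
print(binascii.hexlify(name))
```

The point of the sketch is only that the hex form is always available on request, while a hex based default would take the readable form away from the common ASCII compatible cases.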
Making hex representations of binary data easier to produce is still a good idea, though.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia