[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Nick Coghlan ncoghlan at gmail.com
Thu Sep 11 03:57:35 CEST 2014


On 11 September 2014 10:42, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Sep 11, 2014 at 4:35 AM, Chris Lasher <chris.lasher at gmail.com> wrote:
>> Unless printable representation of bytes objects appears as part of
>> the language specification for Python 3, it's an implementation
>> detail, thus, it is a candidate for change, especially if the BDFL
>> wills it so.
>
> So this is all about the output of repr(), right? The question then
> is: How important is backward compatibility with repr? Will there be
> code breakage?

To quantify the impact (I belatedly realised its magnitude may not be
obvious to everyone), I changed PyBytes_Repr to inject a 'Z' after the
opening quote and ran CPython's own regression test suite to see just
how extensive the damage would be:

355 tests OK.
17 tests failed:
    test_base64 test_bytes test_configparser test_ctypes test_doctest
    test_file_eintr test_hash test_io test_pdb test_pickle
    test_pickletools test_re test_smtpd test_subprocess test_sys
    test_telnetlib test_tools
1 test altered the execution environment:
    test_warnings
17 tests skipped:
    test_curses test_devpoll test_kqueue test_msilib test_ossaudiodev
    test_smtpnet test_socketserver test_startfile test_timeout test_tk
    test_ttk_guionly test_urllib2net test_urllibnet test_winreg
    test_winsound test_xmlrpc_net test_zipfile64

I ran those tests without enabling *any* of the optional resources
(and the Windows-specific tests won't run on my machine).

Folks should keep in mind that when we talk about "hybrid ASCII binary
data", we're not just talking about things like SMTP and HTTP 1.1 and
debugging network protocol traffic, we're also talking about things
like URLs, filesystem paths, email addresses, environment variables,
command line arguments, process names, passing UTF-8 encoded data to
GUI frameworks, etc., that are often both ASCII compatible and human
readable *by design*.
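
As a rough illustration of that point, the sketch below prints a few
such values both ways: with the current repr() and as plain hex digits.
The sample values are invented for the example, and bytes.hex() assumes
Python 3.5 or later (binascii.hexlify would do the same job on older
versions).

```python
import os

# Hypothetical sample values, chosen only for illustration.
samples = [
    b"GET /index.html HTTP/1.1",    # HTTP request line
    b"https://www.python.org/",     # URL
    os.fsencode("/usr/local/bin"),  # filesystem path as bytes
    b"user@example.com",            # email address
]

for value in samples:
    # repr() keeps printable ASCII readable; hex() shows the same
    # bytes as two hex digits per byte.
    print(repr(value))
    print(value.hex())
```

The hex form carries exactly the same information, but a reader has to
decode it mentally before recognising a URL or a path.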

Note the error message produced here with my modified build:

$ ./python -c 'import os; print(os.listdir(b"foo"))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: b'Zfoo'

And this directory listing:

$ ./python -c 'import os; print(os.listdir(b"Mac"))'
[b'ZIDLE', b'ZMakefile.in', b'ZTools', b'ZREADME.orig',
b'ZPythonLauncher', b'ZIcons', b'ZREADME', b'ZExtras.install.py',
b'ZBuildScript', b'ZResources']
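
The effect can be approximated in pure Python without rebuilding the
interpreter. This is only a sketch: z_repr is a hypothetical helper
that post-processes the normal repr(), not the actual change to
PyBytes_Repr, but it shows why any test comparing against a hard-coded
bytes repr breaks.

```python
def z_repr(data):
    """Mimic the experimental build: insert 'Z' after the opening quote."""
    normal = repr(data)  # e.g. b'foo', or b"it's" when the data contains '
    # normal[0] is the 'b' prefix and normal[1] is the quote character
    # that repr() chose, so the 'Z' goes right after those two.
    return normal[:2] + "Z" + normal[2:]


names = [b"IDLE", b"Makefile.in", b"README"]
print([z_repr(name) for name in names])

# A doctest-style check against the documented repr no longer matches.
expected = "b'README'"
print(z_repr(b"README") == expected)
```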

Python 3 carved out a whole lot of text processing operations and said
"these are clearly and unambiguously working with text data, we
shouldn't confuse them with binary data manipulation". The remaining
ambiguity in the behaviour of the Python 3 bytes type is largely
inherent in the way computers currently work - there's no getting away
from it.
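
A small sketch of that split, using an invented HTTP-style header as
the sample data: bytes still supports ASCII-oriented operations such as
split() and lower(), while indexing yields integers and mixing str with
bytes is rejected outright.

```python
header = b"Content-Type: text/html"  # hypothetical sample data

# ASCII-aware methods remain available on bytes.
name, value = header.split(b": ", 1)
print(name.lower())

# Indexing a bytes object gives an int, not a length-1 bytes object.
print(header[0])

# Text and binary data do not mix implicitly.
try:
    "charset=" + value
except TypeError as exc:
    print("TypeError:", exc)
```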

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

