[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Nick Coghlan ncoghlan at gmail.com
Wed Sep 17 12:57:29 CEST 2014

On 17 September 2014 06:09, Andrew Barnert
<abarnert at yahoo.com.dmarc.invalid> wrote:
> No, you're mixing up `format`, an explicit method on str that no one is suggesting adding to bytes, and `__format__`, a dunder method on every type that's used by `str.format` and `format`; the proposal is to extend `bytes.__format__` in some way that I don't think is entirely decided yet, but it would look something like this:
>     u'Hello, {:a}'.format(some_bytes_var)  --> u'Hello, <whatever>'
> Or:
>     u'Hello, {:#x}'.format(some_bytes_var) --> u'Hello, \\x2d\\x78\\x68\\x61...'

Ignoring the specifics of the minilanguage, here are the examples I
posted to http://bugs.python.org/issue22385:

    format(b"xyz", "x") -> '78797a'
    format(b"xyz", "X") -> '78797A'
    format(b"xyz", "#x") -> '0x78797a'

    format(b"xyz", ".1x") -> '78 79 7a'
    format(b"abcdwxyz", ".4x") -> '61626364 7778797a'
    format(b"abcdwxyz", "#.4x") -> '0x61626364 0x7778797a'

    format(b"xyz", ",.1x") -> '78,79,7a'
    format(b"abcdwxyz", ",.4x") -> '61626364,7778797a'
    format(b"abcdwxyz", "#,.4x") -> '0x61626364,0x7778797a'
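
To make those examples concrete, here is a rough sketch of a helper that reproduces them. It is not the patch on the tracker: the name `format_bytes_std`, the grammar `[#][,][.N](x|X)`, and the treatment of cases the examples don't cover (no chunk size given, "#" combined with "X") are all assumptions for illustration.

```python
import re

# Hypothetical grammar inferred from the examples above: [#][,][.N](x|X)
_STD_SPEC = re.compile(
    r"(?P<alt>#?)(?P<comma>,?)(?:\.(?P<size>\d+))?(?P<case>[xX])"
)


def format_bytes_std(data, spec):
    """Render bytes as hex in the style of the examples (illustrative only)."""
    m = _STD_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("unsupported format spec: %r" % (spec,))

    digits = data.hex()
    if m["case"] == "X":
        digits = digits.upper()

    if m["size"] is None:
        # Assumption: with no chunk size, the whole value is a single chunk.
        chunks = [digits]
    else:
        size = int(m["size"])
        if size < 1:
            raise ValueError("chunk size must be at least 1")
        step = 2 * size  # two hex digits per byte
        chunks = [digits[i:i + step] for i in range(0, len(digits), step)]

    # Assumption: the prefix is always "0x", even with uppercase digits.
    prefix = "0x" if m["alt"] else ""
    sep = "," if m["comma"] else " "
    return sep.join(prefix + chunk for chunk in chunks)
```

With this, `format_bytes_std(b"abcdwxyz", "#,.4x")` gives `'0x61626364,0x7778797a'`, matching the last example. Note how the precision field ends up meaning "bytes per chunk", which is the kind of stretch of the standard syntax in question.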

The point made on the issue tracker was that while this is a good way
to obtain the flexibility, adhering too closely to the "standard format
syntax" as I did likely isn't a good idea. Instead, we'd be better off
going for the strftime model, where a type-specific format (e.g. as an
argument to the new *.hex() methods being discussed in
http://bugs.python.org/issue) is *also* supported via __format__.
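
For reference, this is how the strftime model already behaves for dates: the type-specific method and the generic formatting protocol accept the same format string. The date used here is just an example value.

```python
from datetime import date

d = date(2014, 9, 17)

# The type-specific method and the generic protocol take the same spec,
# because date.__format__ hands a non-empty spec to strftime().
assert d.strftime("%Y-%m-%d") == "2014-09-17"
assert format(d, "%Y-%m-%d") == "2014-09-17"
assert "{:%Y-%m-%d}".format(d) == "2014-09-17"
```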

For example, inspired directly by the way hex editors work, you could
envision an approach where you had a base format character (chosen to
be orthogonal to the default format characters):

    "h": lowercase hex
    "H": uppercase hex
    "A": ASCII (using "." for unprintable & extended ASCII)

    format(b"xyz", "A") -> 'xyz'
    format(b"xyz", "h") -> '78797a'
    format(b"xyz", "H") -> '78797A'

Followed by a separator and "chunk size":

    format(b"xyz", "h 1") -> '78 79 7a'
    format(b"abcdwxyz", "h 4") -> '61626364 7778797a'

    format(b"xyz", "h,1") -> '78,79,7a'
    format(b"abcdwxyz", "h,4") -> '61626364,7778797a'

    format(b"xyz", "h:1") -> '78:79:7a'
    format(b"abcdwxyz", "h:4") -> '61626364:7778797a'

In the "h" and "H" cases, you could request a preceding "0x" on the chunks:

    format(b"xyz", "h#") -> '0x78797a'
    format(b"xyz", "h# 1") -> '0x78 0x79 0x7a'
    format(b"abcdwxyz", "h# 4") -> '0x61626364 0x7778797a'
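
A minimal sketch of that hex-editor-style mini-language, again purely illustrative: the function name `format_bytes_hexdump`, the exact grammar `(h|H|A)[#][<sep><N>]`, the restriction of separators to the three shown (space, comma, colon), and the decision to allow chunking in the "A" case are assumptions, not something settled here.

```python
import re

# Hypothetical grammar inferred from the examples: (h|H|A)[#][<sep><N>]
_HEX_SPEC = re.compile(
    r"(?P<base>[hHA])(?P<alt>#?)(?:(?P<sep>[ ,:])(?P<size>\d+))?"
)


def format_bytes_hexdump(data, spec):
    """Render bytes using the h/H/A mini-language (illustrative only)."""
    m = _HEX_SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("unsupported format spec: %r" % (spec,))
    base = m["base"]
    if base == "A" and m["alt"]:
        raise ValueError('"#" only applies to "h" and "H"')

    if m["size"] is None:
        groups = [data]  # no chunk size: the whole value is one chunk
    else:
        size = int(m["size"])
        if size < 1:
            raise ValueError("chunk size must be at least 1")
        groups = [data[i:i + size] for i in range(0, len(data), size)]

    def render(group):
        if base == "A":
            # "." stands in for unprintable and extended ASCII bytes.
            return "".join(
                chr(b) if 0x20 <= b <= 0x7E else "." for b in group
            )
        digits = group.hex()
        if base == "H":
            digits = digits.upper()
        return ("0x" if m["alt"] else "") + digits

    sep = m["sep"] or ""
    return sep.join(render(group) for group in groups)
```

Here `format_bytes_hexdump(b"xyz", "h# 1")` gives `'0x78 0x79 0x7a'`, and the chunk size is counted in bytes regardless of the base character.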

The section before the format character would use the standard string
formatting rules: alignment, fill character, width, precision.
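
As a rough illustration of that last point, a bytes subclass could render itself and then hand the leading part of the spec to ordinary str formatting. `HexBytes` is a hypothetical class made up for this sketch, and to keep it short it only handles the bare base characters (no "#", separator, or chunk size).

```python
class HexBytes(bytes):
    """Hypothetical bytes subclass used only to illustrate the idea."""

    def __format__(self, spec):
        if not spec or spec[-1] not in "hHA":
            # Anything else falls back to the default behaviour.
            return super().__format__(spec)

        base, str_spec = spec[-1], spec[:-1]
        if base == "A":
            text = "".join(
                chr(b) if 0x20 <= b <= 0x7E else "." for b in self
            )
        else:
            text = self.hex().upper() if base == "H" else self.hex()

        # Alignment, fill, width and precision are applied by str itself.
        return format(text, str_spec)


assert format(HexBytes(b"xyz"), ">10h") == "    78797a"
assert "{:*^10H}".format(HexBytes(b"xyz")) == "**78797A**"
```

Because the rendered text goes through str formatting unchanged, a precision would truncate it just as it does for any other string.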


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
