[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Steven D'Aprano steve at pearwood.info
Thu Sep 11 09:30:46 CEST 2014


On Wed, Sep 10, 2014 at 03:37:03PM +0100, Paul Moore wrote:
> On 10 September 2014 15:24, Ian Cordasco <graffatcolmingov at gmail.com> wrote:
> >>> b'Abc'.decode('hexescapes') --> '\x41\x62\x63'
> >>
> >>
> >> This, OTOH, looks elegant (avoids a new method) and clear (no doubt about
> >> the returned type) to me.
> >> +1
> >
> > Another +0.5 for me. I think this is quite elegant and reasonable. I'm
> > not sure it needs to be unicode though. Perhaps it's too early for me,
> > but does turning that into a unicode string make sense?

repr() returns a unicode string. hex(), oct() and bin() return unicode 
strings. The intent is to return a human-readable representation of a 
binary object, that is, a string from a bytes object. So, yes, a unicode 
string makes sense.
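There is no 'hexescapes' codec in CPython; it is only being proposed here. Purely as an illustration of what the proposal would mean, here is a rough sketch that registers a hypothetical codec of that name through codecs.register(). The codec name and the helper functions are made up for this example. Its decoder returns a str, just as repr() and hex() do.

```python
import codecs


def _hexescapes_decode(data, errors='strict'):
    # bytes.decode() may hand the codec a memoryview, so normalise to bytes.
    data = bytes(data)
    # Iterating bytes yields ints; render each one as a \xNN escape.
    text = ''.join('\\x{:02x}'.format(byte) for byte in data)
    return text, len(data)


def _hexescapes_encode(text, errors='strict'):
    # Inverse direction, assuming well-formed input such as '\x41\x62'.
    raw = bytes.fromhex(text.replace('\\x', ''))
    return raw, len(text)


def _search(name):
    # Answer only for the hypothetical 'hexescapes' name; returning None
    # leaves every other encoding to the built-in codec search functions.
    if name == 'hexescapes':
        return codecs.CodecInfo(_hexescapes_encode, _hexescapes_decode,
                                name='hexescapes')
    return None


codecs.register(_search)

print(b'Abc'.decode('hexescapes'))  # \x41\x62\x63
```

Because the decoder returns a str, bytes.decode() accepts it like any other text codec, and the result type is never in doubt.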


> It's easy enough to do by hand:
> 
> >>> print(''.join("\\x{:02x}".format(c) for c in b'Abc'))
> \x41\x62\x63
> 
> And you get any other format you like, just by changing the format
> string in there, or the string you join on:
> 
> >>> print(':'.join("{:02x}".format(c) for c in b'Abc'))
> 41:62:63
> 
> Not every one-liner needs to be a builtin...

Until your post just now, there has probably never been anyone anywhere 
who wanted to display b'Abc' as "41:62:63", and there probably never 
will be again. For such a specialised use-case, it's perfectly justified 
to reject a request for such a colon-delimited hex function with "not 
every one-liner...".

But displaying bytes as either "0x416263" or "\x41\x62\x63" hex format 
is not so obscure, especially if you consider pedagogical uses. For 
that, your one-liner is hardly convenient: you have to manually
walk the bytes object, extracting one byte at a time, format each byte,
debug the inevitable mistake in the formatting code *wink*, then join
all the substrings. The complexity of the code (little as it is for an
expert) is enough to distract from the pedagogical message, and the code
is not quite trivial to get right if you aren't a heavy user of string
formatting codes.
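For comparison, here is a sketch of what a learner currently has to write to get the two formats mentioned above. The function names hex_literal and hex_escapes are hypothetical and exist only for this example. Note the easy-to-miss trap with the "obvious" int-based route, which silently drops leading zero bytes.

```python
import binascii


def hex_literal(data: bytes) -> str:
    """Render bytes as a single hex number, e.g. b'Abc' -> '0x416263'."""
    # hexlify keeps leading zero bytes; hex(int.from_bytes(b'\x00A', 'big'))
    # would give '0x41' and lose the first byte.
    return '0x' + binascii.hexlify(data).decode('ascii')


def hex_escapes(data: bytes) -> str:
    """Render every byte as a \\xNN escape, e.g. b'Abc' -> '\\x41\\x62\\x63'."""
    return ''.join('\\x{:02x}'.format(byte) for byte in data)


print(hex_literal(b'Abc'))  # 0x416263
print(hex_escapes(b'Abc'))  # \x41\x62\x63
```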

Converting byte strings to a hex representation is a common task, as 
witnessed by the (at least) five different ways to do it:

http://bugs.python.org/msg226731

none of which are really obvious or convenient. Hence the 
long-outstanding request for this. (At least four years now.)
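To illustrate the point, here is a sketch of a few standard-library approaches that work in Python 3. They are not necessarily the same five listed in the linked tracker message. Each one needs an extra decode or join step before you end up with a plain str.

```python
import binascii
import codecs

data = b'Abc'

# 1. binascii.hexlify returns bytes, so decode to get a str.
via_binascii = binascii.hexlify(data).decode('ascii')

# 2. The bytes-to-bytes 'hex_codec' is only reachable through codecs.encode().
via_codecs = codecs.encode(data, 'hex_codec').decode('ascii')

# 3. str.format per byte (iterating bytes yields ints in Python 3).
via_format = ''.join('{:02x}'.format(byte) for byte in data)

# 4. printf-style formatting per byte.
via_percent = ''.join('%02x' % byte for byte in data)

print(via_binascii, via_codecs, via_format, via_percent)
```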


-- 
Steven

