[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3
Steven D'Aprano
steve at pearwood.info
Thu Sep 11 09:30:46 CEST 2014
On Wed, Sep 10, 2014 at 03:37:03PM +0100, Paul Moore wrote:
> On 10 September 2014 15:24, Ian Cordasco <graffatcolmingov at gmail.com> wrote:
> >>> b'Abc'.decode('hexescapes') --> '\x41\x62\x63'
> >>
> >>
> >> This, OTOH, looks elegant (avoids a new method) and clear (no doubt about
> >> the returned type) to me.
> >> +1
> >
> > Another +0.5 for me. I think this is quite elegant and reasonable. I'm
> > not sure it needs to be unicode though. Perhaps it's too early for me,
> > but does turning that into a unicode string make sense?
repr() returns a unicode string. hex(), oct() and bin() return unicode
strings. The intent is to return a human-readable representation of a
binary object, that is, a string from a bytes object. So, yes, a unicode
string makes sense.
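
A quick way to confirm the return types, as a minimal sketch assuming
Python 3 (where str is the unicode string type); the `results` dict is
only there for illustration:

```python
# Each of these builtins returns str (unicode text in Python 3),
# whether it is handed a bytes object or an integer.
results = {
    "repr": repr(b'Abc'),
    "hex": hex(65),
    "oct": oct(65),
    "bin": bin(65),
}

for name, value in results.items():
    # type(value).__name__ is 'str' in every case
    print(name, type(value).__name__, value)
```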
> It's easy enough to do by hand:
>
> >>> print(''.join("\\x{:02x}".format(c) for c in b'Abc'))
> \x41\x62\x63
>
> And you get any other format you like, just by changing the format
> string in there, or the string you join on:
>
> >>> print(':'.join("{:02x}".format(c) for c in b'Abc'))
> 41:62:63
>
> Not every one-liner needs to be a builtin...
Until your post just now, there has probably never been anyone anywhere
who wanted to display b'Abc' as "41:62:63", and there probably never
will be again. For such a specialised use-case, it's perfectly justified
to reject a request for such a colon-delimited hex function with "not
every one-liner...".
But displaying bytes in either "0x416263" or "\x41\x62\x63" hex format
is not so obscure, especially if you consider pedagogical uses. For
that, your one-liner is hardly convenient: you have to manually walk
the bytes object, extract one byte at a time, format it, debug the
inevitable mistake in the formatting code *wink*, then join all the
substrings. The complexity of the code (little as it is for an expert)
is enough to distract from the pedagogical message, and it is not
trivially simple to get right if you aren't a heavy user of string
formatting codes.
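
To make those steps concrete, here is a rough sketch of the two formats
wrapped up as helpers. The names hex_escapes() and hex_literal() are
made up for this illustration; they are not existing or proposed
builtins.

```python
def hex_escapes(data):
    # Hypothetical helper: walk the bytes object (iteration yields ints
    # in Python 3), format each byte as a backslash-x escape, then join.
    return ''.join('\\x{:02x}'.format(byte) for byte in data)


def hex_literal(data):
    # Hypothetical helper: same walk, but emit bare two-digit hex pairs
    # behind a single '0x' prefix.
    return '0x' + ''.join('{:02x}'.format(byte) for byte in data)


print(hex_escapes(b'Abc'))
print(hex_literal(b'Abc'))
```

Forgetting the "02" width in the format spec is the sort of mistake
that is easy to make here: bytes below 0x10 would then come out as a
single hex digit.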
Converting byte strings to a hex representation is quite a common thing
to do, as witnessed by the (at least) five different ways to do it:
http://bugs.python.org/msg226731
none of which are really obvious or convenient. Hence the long-
outstanding request for this. (At least four years now.)
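
For a flavour of what that looks like, here are a few standard-library
routes that all give the same hex digits. This is only a sketch, and
not necessarily the same set listed in the linked message.

```python
import binascii
import codecs

data = b'Abc'

# binascii.hexlify() returns bytes, so a decode step is still needed
# to end up with a str.
via_binascii = binascii.hexlify(data).decode('ascii')

# The bytes-to-bytes 'hex_codec' is only reachable through
# codecs.encode(), not through bytes.encode().
via_codecs = codecs.encode(data, 'hex_codec').decode('ascii')

# Manual formatting, one byte at a time.
via_format = ''.join('{:02x}'.format(byte) for byte in data)

# Old-style % formatting, again byte by byte.
via_percent = ''.join('%02x' % byte for byte in data)

print(via_binascii, via_codecs, via_format, via_percent)
```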
--
Steven