[Python-ideas] Stop displaying elements of bytes objects as printable ASCII characters in CPython 3

Stephen J. Turnbull stephen at xemacs.org
Thu Sep 11 03:50:15 CEST 2014


Chris Lasher writes:

 > Okay, but a definite -1e6 from me on making my Python interpreter do this:
 > 
 >     >>> my_packed_bytes = struct.pack('ffff', 3.544294848931151e-12,
 >     ...     1.853266900760489e+25, 1.6215185358725202e-19, 0.9742483496665955)
 >     >>> my_packed_bytes
 >     b'Why, Guido? Why?'

If you actually have a struct, why aren't you wrapping
your_packed_bytes in a class that validates the struct and displays it
nicely formatted?  Or, alternatively, simply replaces __repr__?
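
A minimal sketch of that wrapper idea, for illustration only.  The
class name PackedFloats and the explicit little-endian layout '<ffff'
are assumptions made here (the quoted example used native byte order);
nothing like this exists in the standard library.

```python
import struct


class PackedFloats:
    """Hypothetical wrapper around four packed 32-bit floats."""

    # Assumed layout: little-endian, four single-precision floats.
    _layout = struct.Struct('<ffff')

    def __init__(self, raw):
        # Validate up front: the buffer must match the struct size exactly.
        if len(raw) != self._layout.size:
            raise ValueError('expected %d bytes, got %d'
                             % (self._layout.size, len(raw)))
        self.raw = bytes(raw)

    @property
    def values(self):
        # Decode the raw buffer into a tuple of four Python floats.
        return self._layout.unpack(self.raw)

    def __repr__(self):
        # Show the decoded fields rather than ASCII-looking bytes.
        fields = ', '.join('%r' % v for v in self.values)
        return 'PackedFloats(%s)' % fields


packed = PackedFloats(b'Why, Guido? Why?')
print(repr(packed))
```

With a wrapper like this, the interactive prompt shows four floats
instead of an accidental English sentence, and the repr of plain bytes
is left alone for everybody else.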

 > I do understand the utility of peering in to ASCII text, but like Cory
 > Benfield stated earlier:
 > 
 > > I'm saying that I don't get to do debugging with a simple
 > > print statement when using the bytes type to do actual binary work,
 > > while those who are doing sort-of binary work do.
 >
 > Does the inconvenience of having to explicitly call the .asciify()
 > method on a bytes object justify the current behavior for repr() on a
 > bytes object?

Yes.  A choice must be made, because a type has only one repr, and
there's no syntax for choosing it.  It's a question of whose use case
is going to become more convenient and whose becomes less so, and
either choice is *justified*.  Which is *preferred* is a judgment call.
Your judgment doesn't rule, and it definitely doesn't have a weight of
1e6.  At this point even Guido's judgment is likely to be dominated by
backward compatibility, no matter how much he regrets the necessity.
(But I would bet he doesn't regret it at all.)
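
To make the "only one repr" point concrete, here is a rough sketch of
asking for other views explicitly.  binascii.hexlify is a real stdlib
function; hex_repr is a hypothetical helper written for this example,
not an existing or proposed bytes method.

```python
import binascii

data = b'Why, Guido? Why?'

# The one built-in repr: printable ASCII bytes are shown as characters.
ascii_view = repr(data)

# An explicit hex view, requested by the caller instead of built into repr().
hex_view = binascii.hexlify(data).decode('ascii')


def hex_repr(b):
    """Hypothetical helper: escape every byte, printable or not."""
    return "b'" + ''.join('\\x%02x' % byte for byte in b) + "'"


print(ascii_view)
print(hex_view)
print(hex_repr(data[:4]))
```

Whichever view repr() gives by default, the other one costs an
explicit call; the argument is only about which camp pays that cost.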

 > The privilege of being lazy is obstructing the right to see what
 > we've actually got in the bytes object, and is jeopardizing the
 > very argument that "bytes are not strings".

It does not jeopardize the *fact* that bytes are not strings.  People
who don't understand that have a fundamental confusion, and they're
going to want bytes to DWIM when mixed with str in their applications.
And they'll complain when their bytes don't DWIM, and they'll complain
even more when the repr "obstructs the right to see what they've
actually got in the bytes object", which (in their applications) is a
stream containing tokens borrowed from English using the ASCII coded
character set.

I agree with you that they're wrong.  My point is that they're wrong
in such a way that they won't understand that bytes aren't text
strings any better merely because they become harder to read.  They
*know* that there's a text string in there because they put it there!

Cory Benfield wrote and Chris Lasher quoted:

 > > Also, while I'm being picky, 0xDEADBEEF is not a 32-bit pointer,
 > > it's a 32-bit something. Its type is undefined. It has a
 > > standard usage as a guard word, but still, let's not jump to
 > > conclusions here!

I was not jumping to conclusions.  I was setting up a scenario.  The
actual use case is something like "int *pi = 0xDEADBEEF;".  The point
is that C programmers are deliberately choosing a guard word that is
readable when printed as hexadecimal, and also satisfies certain
restrictions when those bytes are used as a pointer.  That doesn't
mean that they are confusing text with pointers.  The same is true for
Python's repr for bytes.
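
As a small illustration of the guard-word scenario from the Python
side (the big-endian packing is an assumption chosen so the byte order
matches the way the word is written):

```python
import struct

GUARD = 0xDEADBEEF  # the guard word from the scenario above

# Pack as an unsigned 32-bit integer; big-endian is assumed here.
raw = struct.pack('>I', GUARD)

# Printed as hexadecimal, the value is deliberately readable.
hex_text = '%08X' % GUARD

print(hex_text)
print(repr(raw))
```

None of these four bytes is printable ASCII, so the bytes repr shows
them all as \x escapes.  The word is readable in hex by design, and
nobody concludes from that that the pointer is text.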

 > > I do happen to believe that having it be hex would provide a
 > > better pedagogical position ("you know this isn't text because it
 > > looks like gibberish!"), but that ship sailed a long time ago.

I don't think a gibberish repr will change the minds of people who
think that bytes are text in their application.  They'll just get more
peeved at Python 3, because they know that there's readable text in
there, and Python 3 "obstructs their right to see what's actually in
the bytes object".

Regards,





