Make bytes __repr__ and __str__ representation different?

Currently, the __repr__ and __str__ representations of bytes are the same. Perhaps it is worth making them different; this would make it easier to visually perceive bytes as a container of integers from 0 to 255, instead of a mixture of printable and non-printable ASCII characters. It is proposed:
a) __str__ - leave unchanged
b) __repr__ - represent as a sequence of escaped hex
>>> a = bytes([42,43,44,45,46])
>>> a  # Current
b'*+,-.'
>>> a  # Proposed
b'\x2a\x2b\x2c\x2d\x2e'
As you can see, the second example is more easily perceived as a sequence, in which '\' is perceived the same way as ',' in a list or tuple. In addition, 2020 is close, and this would help new Pythonistas not to take bytes for strings that are a mixture of ASCII.
With kind regards, -gdg

On Tue, Nov 21, 2017 at 05:37:36PM +0300, Kirill Balunov wrote:
> Currently, the __repr__ and __str__ representations of bytes are the same. Perhaps it is worth making them different; this would make it easier to visually perceive bytes as a container of integers from 0 to 255, instead of a mixture of printable and non-printable ASCII characters. It is proposed:
> a) __str__ - leave unchanged
> b) __repr__ - represent as a sequence of escaped hex
> >>> a = bytes([42,43,44,45,46])
> >>> a  # Current
> b'*+,-.'
> >>> a  # Proposed
> b'\x2a\x2b\x2c\x2d\x2e'
I'd rather leave __str__ and __repr__ alone. Changing them will have huge backwards compatibility implications. I'd rather give bytes a hexdump() method that returns a string:
'2a 2b 2c 2d 2e'
(possibly with optional arguments to specify the formatting).
> As you can see, the second example is more easily perceived as a sequence, in which '\' is perceived the same way as ',' in a list or tuple.
I disagree. And if you perceive \ as a separator, why does the sequence start with a separator? And why are there so many x characters?
> In addition, 2020 is close, and this would help new Pythonistas not to take bytes for strings that are a mixture of ASCII.
The special role of ASCII is far too important for us to ever completely discard it.
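
For illustration, a space-separated hex dump like the one suggested above can already be built from existing pieces. This is only a sketch: the helper name hexdump is hypothetical (bytes has no such method), and the round trip relies on bytes.fromhex() skipping spaces.

```python
def hexdump(data: bytes, sep: str = " ") -> str:
    """Hypothetical helper: two-digit hex values of data, joined by sep."""
    # Iterating over bytes yields ints, so each one can be formatted as hex.
    return sep.join(f"{b:02x}" for b in data)


a = bytes([42, 43, 44, 45, 46])
dump = hexdump(a)

# bytes.fromhex() ignores the spaces, so the dump converts back to the original.
restored = bytes.fromhex(dump)
```

The result is still a str, not bytes, so this only helps with display; it does not change what repr() shows.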

Hi,
While it may shock you, using bytes for "text" makes sense in some areas. Please read the Motivation section of PEP 461: https://www.python.org/dev/peps/pep-0461/#motivation
Victor

On Tue, Nov 21, 2017 at 6:37 AM, Kirill Balunov kirillbalunov@gmail.com wrote:
> Currently, the __repr__ and __str__ representations of bytes are the same. Perhaps it is worth making them different; this would make it easier to visually perceive bytes as a container of integers from 0 to 255, instead of a mixture of printable and non-printable ASCII characters. It is proposed:
> a) __str__ - leave unchanged
> b) __repr__ - represent as a sequence of escaped hex
> >>> a = bytes([42,43,44,45,46])
> >>> a  # Current
> b'*+,-.'
> >>> a  # Proposed
> b'\x2a\x2b\x2c\x2d\x2e'
Supposedly, __repr__ should give an eval-able version, which your proposal does. But the way you wrote your example suggests that:
bytes((42, 43, 44, 45, 46))
would be an even better __repr__, if the goal is to make it clear and easy to see that it is a "container of integers from 0 to 255".
I've been programming for quite some time, and hex has NEVER come naturally to me :-)
But backward compatibility and all that :-(
-CHB

2017-11-21 20:22 GMT+03:00 Chris Barker chris.barker@noaa.gov wrote:
> But the way you wrote your example suggests that:
> bytes((42, 43, 44, 45, 46))
> would be an even better __repr__, if the goal is to make it clear and easy to see that it is a "container of integers from 0 to 255".
> I've been programming for quite some time, and hex has NEVER come naturally to me :-)
Yes, it is better, but it seemed too radical to me:)
2017-11-21 18:16 GMT+03:00 Steven D'Aprano steve@pearwood.info wrote:
> I'd rather give bytes a hexdump() method that returns a string:
> '2a 2b 2c 2d 2e'
> (possibly with optional arguments to specify the formatting).
Since Python 3.5, bytes has a .hex() method, the same as your .hexdump() but without spaces. But the result is still a string.
2017-11-21 18:38 GMT+03:00 Victor Stinner victor.stinner@gmail.com:
> While it may shock you, using bytes for "text" makes sense in some areas. Please read the Motivation section of PEP 461: https://www.python.org/dev/peps/pep-0461/#motivation
It does not shock me, because it is a really useful feature. But rather, ASCII was made so that it would fit into a byte, and not vice versa.
Nevertheless, bytes is the strangest object in Python. It looks like a string (one that contains only ASCII), but it is not a string, because indexing does not return a bytes object: bytes(b'123'[0]) != bytes(b'1'). Maybe it is closer to a tuple, since it is also immutable, but bytes(3) creates the sequence b'\x00\x00\x00', while tuple does not (and what the hell is b'\x00\x00\x00'?). Maybe it has some relationship to integers, but int(b'1') == 1 while bytes([int(49)]) == b'1', i.e. it is not a friend of integers either.
It is bytes...
With kind regards, -gdg
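
The behaviours listed above can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```python
data = b"123"

first = data[0]           # indexing yields an int (the byte value), not bytes
first_slice = data[0:1]   # slicing yields a bytes object of length 1

zeros = bytes(3)          # an int argument creates that many zero bytes
from_list = bytes([49])   # an iterable of ints creates bytes with those values

as_int = int(b"1")        # int() parses the ASCII digits as a decimal number

# bytes(first) is bytes(49): 49 zero bytes, which is why it differs from b"1"
padded = bytes(first)
```

Slicing with data[0:1] is the usual way to get a one-byte bytes object instead of an int.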

On Tue, Nov 21, 2017 at 11:22 AM, Chris Barker chris.barker@noaa.gov wrote:
> Supposedly, __repr__ should give an eval-able version, which your proposal does. But the way you wrote your example suggests that:
> bytes((42, 43, 44, 45, 46))
> would be an even better __repr__, if the goal is to make it clear and easy to see that it is a "container of integers from 0 to 255".
I wonder whether, for repr-synonyms, a format specifier to `repr()` that toggles how the object chooses to display itself would be handy:
x = b'*+,-.'
repr(x)                         # b'*+,-.'
repr(x, bytes.REPR_HEX_STRING)  # b'\x2a\x2b\x2c\x2d\x2e'
repr(x, bytes.REPR_BYTES)       # bytes([42, 43, 44, 45, 46])
repr(x, bytes.REPR_HEX_BYTES)   # bytes([0x2A, 0x2B, 0x2C, 0x2D, 0x2E])
Kinda like `format()` but such that all of `eval(repr(x, <whatever>))` are equal.
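
Since repr() takes no second argument, the idea above can only be approximated today with an ordinary function. A rough sketch: the name bytes_repr and the style strings are made up for illustration and are not part of any existing API.

```python
def bytes_repr(data: bytes, style: str = "default") -> str:
    """Hypothetical helper: an eval-able representation of data in a given style."""
    if style == "default":
        return repr(data)
    if style == "hex_string":
        # Every byte as an escaped hex pair, even printable ones
        return "b'" + "".join(f"\\x{b:02x}" for b in data) + "'"
    if style == "ints":
        return f"bytes({list(data)!r})"
    if style == "hex_ints":
        return "bytes([" + ", ".join(f"0x{b:02X}" for b in data) + "])"
    raise ValueError(f"unknown style: {style!r}")


x = b"*+,-."
styles = ("default", "hex_string", "ints", "hex_ints")
reprs = {s: bytes_repr(x, s) for s in styles}

# Each variant should evaluate back to an equal bytes object
round_trips = all(eval(r) == x for r in reprs.values())
```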

On Wed, Nov 22, 2017 at 9:49 AM, Nick Timkovich prometheus235@gmail.com wrote:
> On Tue, Nov 21, 2017 at 11:22 AM, Chris Barker chris.barker@noaa.gov wrote:
>> Supposedly, __repr__ should give an eval-able version, which your proposal does. But the way you wrote your example suggests that:
>> bytes((42, 43, 44, 45, 46))
>> would be an even better __repr__, if the goal is to make it clear and easy to see that it is a "container of integers from 0 to 255".
> I wonder whether, for repr-synonyms, a format specifier to `repr()` that toggles how the object chooses to display itself would be handy:
> x = b'*+,-.'
> repr(x)                         # b'*+,-.'
> repr(x, bytes.REPR_HEX_STRING)  # b'\x2a\x2b\x2c\x2d\x2e'
> repr(x, bytes.REPR_BYTES)       # bytes([42, 43, 44, 45, 46])
> repr(x, bytes.REPR_HEX_BYTES)   # bytes([0x2A, 0x2B, 0x2C, 0x2D, 0x2E])
> Kinda like `format()` but such that all of `eval(repr(x, <whatever>))` are equal.
Methods are usually the best for that. Possibly with class methods to perform the reconstruction - which in this case you have:
>>> b"asdf".hex()
'61736466'
>>> bytes.fromhex(_)
b'asdf'
ChrisA
participants (6): Chris Angelico, Chris Barker, Kirill Balunov, Nick Timkovich, Steven D'Aprano, Victor Stinner