Re: [Python-Dev] Python 3.x and bytes

May 17, 2011

On Wed, May 18, 2011 at 3:13 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
...
On Wed, May 18, 2011 at 8:27 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
...
On the one hand we have the 'bytes are ascii data' type interface, and on
the other we have the 'bytes are a list of integers between 0 - 255'
interface.

No. Bytes are a list of integers between 0-255. End of story. Using
them to represent text as well was precisely the problem with 2.x
8-bit strings, since the boundaries got blurred.
However, as a matter of practicality, many byte-oriented protocols use
ASCII to make elements of the protocol readable by humans. The
"text-like" elements of the bytes and bytearray types are a concession
to the existence of those protocols. However, that doesn't make them
text - they're still binary data streams. If you want to treat them as
text, convert them to "str" objects first (e.g. that's what
urllib.parse does internally in order to operate on bytes and
bytearray instances).
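
The distinction drawn above can be sketched in a few lines. This is only an illustration: the request line is made up, and the variable names are hypothetical rather than taken from any library.

```python
# Illustrative sketch; the sample request line is invented for this example.
data = b"GET /index.html HTTP/1.1"

# Indexing and iteration yield integers in range(256), not characters.
first = data[0]            # 71, the code for ASCII 'G'
values = list(data[:3])    # [71, 69, 84]

# The "text-like" methods still take and return bytes.
method = data.split(b" ")[0]   # b'GET'
lowered = method.lower()       # b'get'

# To treat the data as text, decode it to str explicitly.
text = data.decode("ascii")
path = text.split(" ")[1]      # '/index.html'
```

Note that the bytes methods require bytes arguments (b" ", not " "); mixing the two types raises TypeError rather than converting silently.
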

This is not a useful argument - it's an implementation choice in
Python 3, and urlparse converting bytes to 'str' to operate on them is
at best a kludge - you're forcing 5 times the storage (the original
bytes + 4 bytes-per-byte when it's decoded into unicode) to work on
something which is defined as a BNF *that uses ascii*.
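
The arithmetic behind the "5 times" figure can be reproduced roughly as follows. This sketch assumes a string layout of 4 bytes per code point and emulates it with a UTF-32 encoding; it does not measure the interpreter's real internal representation, which depends on the build and version. The URL is a hypothetical example.

```python
# Hypothetical ASCII-only input, as a URL parser might receive it.
raw = b"http://example.com/a/b?x=1"

text = raw.decode("ascii")

# Assumption: 4 bytes per code point, emulated here with UTF-32 (no BOM).
wide = text.encode("utf-32-le")

original_size = len(raw)     # 1 byte per ASCII character
decoded_size = len(wide)     # 4 bytes per character under the assumption

# Original bytes kept alongside the decoded copy.
ratio = (original_size + decoded_size) / original_size
```

Under that assumption the decoded copy is four times the size of the input, so holding both costs five times the original storage.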

The Python 2 confusion was deplorable, but it doesn't make the Python
3 situation better: it's different, but it is still very awkward for
people to write code that is both correct and fast.

It's probably too late to change, but please don't try to argue that
it's correct: the continued confusion of folk running into this is
evidence that confusion *is happening*. Treat that as evidence and
think about how to fix it going forward.

_Rob

Robert Collins