[Python-Dev] Python 3.x and bytes

Wed May 18 05:23:07 CEST 2011

On Wed, May 18, 2011 at 3:13 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, May 18, 2011 at 8:27 AM, Ethan Furman <ethan at stoneleaf.us> wrote:
>> On the one hand we have the 'bytes are ascii data' type interface, and on
>> the other we have the 'bytes are a list of integers between 0 and 255'
>> interface.
>
> No. Bytes are a list of integers between 0 and 255. End of story. Using
> them to represent text as well was precisely the problem with 2.x
> 8-bit strings, since the boundaries got blurred.
>
> However, as a matter of practicality, many byte-oriented protocols use
> ASCII to make elements of the protocol readable by humans. The
> "text-like" elements of the bytes and bytearray types are a concession
> to the existence of those protocols. However, that doesn't make them
> text - they're still binary data streams. If you want to treat them as
> text, convert them to "str" objects first (e.g. that's what
> urllib.parse.urlparse does internally in order to operate on bytes and
> bytearray instances).

This is not a useful argument - it's an implementation choice in
Python 3, and urlparse converting bytes to 'str' to operate on them is
at best a kludge - you're forcing 5 times the storage (the original
bytes + 4 bytes-per-byte when it's decoded into unicode) to work on
something which is defined as a BNF *that uses ASCII*.
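
A rough way to check the storage overhead on a given interpreter is
sketched below. It assumes CPython and sys.getsizeof; the helper name
bytes_per_unit and the example URL are made up for illustration. The
ratio depends on the interpreter and on how it stores str internally,
so no expected output is shown.

```python
import sys

def bytes_per_unit(obj):
    """Estimate storage per element by differencing the sizes of two lengths.

    Subtracting cancels the fixed object header, leaving only payload growth.
    """
    return (sys.getsizeof(obj * 2) - sys.getsizeof(obj)) / len(obj)

# Hypothetical ASCII-only input, repeated so the payload dominates the header.
raw = b"http://example.com/a/b?q=1" * 100
text = raw.decode("ascii")  # the str copy a str-based parser would work on

print("bytes:", bytes_per_unit(raw), "bytes of storage per byte")
print("str:  ", bytes_per_unit(text), "bytes of storage per character")
print("both alive at once:", sys.getsizeof(raw) + sys.getsizeof(text), "bytes")
```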

The Python 2 confusion was deplorable, but that doesn't make the Python
3 situation better: it's different, but it is still very awkward for
people to write code in that is both correct and fast.
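
To make the two styles concrete, here is a minimal sketch that splits an
ASCII protocol line both ways: directly on the bytes, and via a decode to
str and a re-encode. The request line and both function names are
invented for this example; it is not how urlparse itself is written.

```python
def split_request_line_bytes(line):
    # Work on the bytes directly, using the "text-like" bytes methods.
    method, target, version = line.rstrip(b"\r\n").split(b" ", 2)
    return method, target, version

def split_request_line_str(line):
    # Decode to str, operate on text, then encode the pieces back to bytes.
    text = line.decode("ascii")
    method, target, version = text.rstrip("\r\n").split(" ", 2)
    return tuple(part.encode("ascii") for part in (method, target, version))

line = b"GET /index.html HTTP/1.1\r\n"  # hypothetical input

print(split_request_line_bytes(line))
print(split_request_line_str(line))

# The same data exposes both interfaces: indexing gives an integer,
# while slicing and methods such as split() give bytes back.
print(line[0], line[0:1])
```

Both functions return the same three bytes objects; the str version
simply pays for an extra decoded copy and a round trip on the way.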

It's probably too late to change, but please don't try to argue that
it's correct: the continued stream of folk running into this is
evidence that confusion *is happening*. Treat that as evidence and
think about how to fix it going forward.

_Rob