[Python-Dev] Python 3.x and bytes

Thu May 19 10:05:04 CEST 2011

On 2011-05-19, at 09:49 , Nick Coghlan wrote:
> On Thu, May 19, 2011 at 5:10 AM, Eric Smith <eric at trueblade.com> wrote:
>> On 05/18/2011 12:16 PM, Stephen J. Turnbull wrote:
>>> Robert Collins writes:
>>> 
>>>  > It's probably too late to change, but please don't try to argue that
>>>  > it's correct: the continued confusion of folk running into this is
>>>  > evidence that confusion *is happening*. Treat that as evidence and
>>>  > think about how to fix it going forward.
>>> 
>>> Sorry, Rob, but you're just wrong here, and Nick is right.  It's
>>> possible to improve Python 3, but not to "fix" it in this respect.
>>> The Python 3 solution is correct, the Python 2 approach is not.
>>> There's no way to avoid discontinuity and confusion here.
>> 
>> I don't think there's any connection between the way 2.x confused text
>> strings and binary data (which certainly needed addressing) and the way
>> that 3.x returns a different type for byte_str[i] than it does for
>> byte_str[i:i+1]. I think it's the latter that's confusing to people.
>> There's no particular requirement for different types that's needed to
>> fix the byte/str problem.
> 
> It's a mental model problem. People try to think of bytes as
> equivalent to 2.x str and that's just wrong, wrong, wrong. It's far
> closer to array.array('c'). Strings are basically *unique* in
> returning a length 1 instance of themselves for indexing operations.
> For every other sequence type, including tuples, lists and arrays,
> slicing returns a new instance of the same type, while indexing will
> typically return something different.
> 
> Now, we definitely didn't *help* matters by keeping so many of the
> default behaviours of bytes() and bytearray() coupled to ASCII-encoded
> text, but that was a matter of practicality beating purity: there
> really *are* a lot of wire protocols out there that are ASCII based.
> In hindsight, perhaps we should have gone further in breaking things
> to try to make the point about the mental model shift more forcefully.
> (However, that idea carries with it its own problems).
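
To make the indexing/slicing distinction above concrete, here is a
minimal sketch of the Python 3 behaviour being discussed. The variable
names are purely illustrative, and array.array('B') stands in for the
2.x-only 'c' typecode mentioned above:

```python
import array

data = b"abc"

# Indexing bytes yields an int, while slicing yields a new bytes object.
item = data[0]
chunk = data[0:1]
assert isinstance(item, int) and item == 97
assert isinstance(chunk, bytes) and chunk == b"a"

# Lists and arrays follow the same pattern: an element vs. a same-type slice.
nums = [1, 2, 3]
assert nums[0] == 1 and nums[0:1] == [1]

arr = array.array("B", data)  # unsigned bytes, initialised from a bytes object
assert arr[0] == 97 and arr[0:1] == array.array("B", [97])

# str is the odd one out: indexing returns a length-1 str.
text = "abc"
assert text[0] == text[0:1] == "a"
```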

For what it's worth, Erlang's approach to the subject is, in my
opinion, excellent: binaries (whose literals are called "bit syntax"
there) are quite distinct from strings in both syntax and API, but you
can put chunks of strings within binaries (the bit syntax acts as a
container in which you can put a literal or non-literal string). This
simultaneously impresses upon the user that binaries are *not* strings
and that binaries can still easily be created from strings.
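
For comparison, here is a rough Python 3 sketch of the same idea (this
is Python, not Erlang, and is not meant as a proposal): text has to be
encoded explicitly before it can go into a bytes value. The frame()
helper and its length-prefixed layout are made up for illustration.

```python
import struct


def frame(label: str, payload: bytes) -> bytes:
    """Hypothetical helper: 1-byte label length, ASCII label, then payload."""
    encoded = label.encode("ascii")  # explicit text -> binary step
    return struct.pack("B", len(encoded)) + encoded + payload


packet = frame("GET", b"\x00\x01")
assert packet == b"\x03GET\x00\x01"

# Mixing the two types without encoding is an error rather than a silent
# coercion, which is the str/bytes separation discussed in this thread.
try:
    b"\x03" + "GET"
    mixed_ok = True
except TypeError:
    mixed_ok = False
assert mixed_ok is False
```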