[Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer

Thu Sep 27 04:36:08 CEST 2007

On 9/26/07, Guido van Rossum <guido at python.org> wrote:
>
> Constructors
> ------------
>
> There are four forms of constructors, applicable to both bytes and
> buffer:
>
>   - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
>     ``buffer(<buffer>)``: simple copying constructors, with the note
>     that ``bytes(<bytes>)`` might return its (immutable) argument.
>
>   - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
>     <encoding>[, <errors>])``: encode a text string.  Note that the
>     ``str.encode()`` method returns an *immutable* bytes object.
>     The <encoding> argument is mandatory; <errors> is optional.
>
>   - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
>     bytes or buffer object from anything that supports the PEP 3118
>     buffer API.
>
>   - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
>     construct an immutable bytes or mutable buffer object from a
>     stream of integers in range(256).
>
>   - ``buffer(<int>)``: construct a zero-initialized buffer of a given
>     length.
>

I think this section could be better organized. I had to read it a few
times to fully understand it. Maybe a table would better emphasize the
differences between the two constructors.
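For comparison, here is a minimal sketch of the constructor forms listed above, assuming released Python 3 semantics. In Python 3 the proposed mutable ``buffer`` type shipped under the name ``bytearray``, and the PEP 3118 view object is ``memoryview``. The variable names are illustrative only.

```python
# Sketch against released Python 3, where the PEP's mutable "buffer"
# type is called "bytearray" and the PEP 3118 view is "memoryview".

# Copying constructors (bytes(<bytes>) may hand back its argument).
copied = bytes(b"abc")
mutable_copy = bytearray(b"abc")
frozen_copy = bytes(bytearray(b"abc"))

# Encoding a text string: the encoding argument is required,
# the errors argument is optional.
encoded = bytes("h\u00e9llo", "utf-8")
mutable_encoded = bytearray("h\u00e9llo", "utf-8", "strict")

# Anything that supports the PEP 3118 buffer API, e.g. a memoryview.
from_view = bytes(memoryview(b"xyz"))

# An iterable of ints, each in range(256).
from_ints = bytes([104, 105])
mutable_from_ints = bytearray(range(3))

# A zero-initialized mutable object of a given length.
zeros = bytearray(4)

# Only the mutable variant can be modified in place.
mutable_copy[0] = ord("A")

# Omitting the encoding for a str, or passing an out-of-range int, fails.
for bad_call in (lambda: bytes("abc"), lambda: bytes([256])):
    try:
        bad_call()
    except (TypeError, ValueError) as exc:
        print(type(exc).__name__, exc)
```

One difference from this draft: released Python 3 also accepts ``bytes(<int>)``, which returns a zero-filled immutable object of that length.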

> Indexing
> --------
>
> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type).  The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.

I think indexing a bytes/buffer object should return an int. To me, this
behavior is more natural than using an ord()-like function to extract
values. In fact, I noticed that the use of ord() is a good indicator that
bytes should be used instead of str (see for yourself:
grep -R --include='*.py' 'ord(' python25/Lib).
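Python 3 eventually made indexing return ints, as argued here. A minimal sketch, assuming released Python 3 semantics, of what that looks like when parsing a binary header without any ord() calls. The ``read_gif_size`` helper and the sample header are hypothetical, for illustration only; the sketch assumes the GIF layout of a 6-byte signature followed by the little-endian 16-bit screen width and height.

```python
def read_gif_size(header):
    """Hypothetical helper: return (width, height) from a GIF header.

    Assumes a 6-byte signature followed by little-endian 16-bit
    width (offsets 6-7) and height (offsets 8-9).
    """
    if header[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF header")
    # Indexing yields ints directly, so the bytes can be combined
    # arithmetically without ord().
    width = header[6] | (header[7] << 8)
    height = header[8] | (header[9] << 8)
    return width, height


sample = b"GIF89a" + bytes([0x40, 0x01, 0xF0, 0x00])

first = sample[0]     # indexing gives an int (71, i.e. ord("G"))
prefix = sample[0:1]  # slicing gives a bytes object of length 1 (b"G")

print(read_gif_size(sample))  # (320, 240)
```

The Python 2.x equivalent of ``header[6]`` would have been ``ord(header[6])``, which is exactly the pattern the grep above turns up.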

> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects.  The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".

Does that mean calling str() on a bytes/buffer object -- e.g., str(b"abc")
-- wouldn't decode the content of the object (like array objects)?
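In released Python 3 the answer is no: ``str()`` on a bytes object does not decode it. A quick sketch, assuming Python 3 semantics (where ``buffer`` became ``bytearray``, so its repr reads ``bytearray(b'...')`` rather than ``buffer(b'...')``):

```python
data = b"abc"
mutable = bytearray(b"abc")

# str() and repr() produce the same literal-style text; neither decodes.
print(str(data), repr(data))        # b'abc' b'abc'
print(str(mutable), repr(mutable))  # bytearray(b'abc') bytearray(b'abc')

# Getting the text back requires an explicit encoding.
print(str(data, "ascii"))           # abc
print(data.decode("ascii"))         # abc
```

Running the interpreter with ``-b`` makes the plain ``str(data)`` call emit a BytesWarning, which helps catch accidental bytes-to-str conversions.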

> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding.  There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.decode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.encode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``.  This produces
> the same result as ``repr(b)``.  This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str.  There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).

Ah! That answers my last question. :)
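A minimal sketch of the mixing and conversion rules as they behave in released Python 3. One difference from this draft is worth noting: concatenating bytes and str raises TypeError as proposed, but an equality comparison between them simply returns False (running the interpreter with ``-b`` adds a BytesWarning). In Python 3, ``str(b, encoding)`` and ``b.decode(encoding)`` decode, while ``bytes(s, encoding)`` and ``s.encode(encoding)`` encode.

```python
data = b"caf\xc3\xa9"  # UTF-8 encoding of "café"
text = "caf\u00e9"

# Mixing without an encoding is refused for operations like concatenation.
try:
    data + text
except TypeError as exc:
    print("TypeError:", exc)

# Equality does not raise in released Python 3; it is simply False.
print(data == text)  # False

# Explicit conversions: decoding bytes to str ...
assert str(data, "utf-8") == data.decode("utf-8") == text
# ... and encoding str to bytes.
assert bytes(text, "utf-8") == text.encode("utf-8") == data

# The optional errors argument controls handling of undecodable input.
print(data.decode("ascii", "ignore"))  # caf
```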

-- Alexandre