[Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer
Alexandre Vassalotti
alexandre at peadrop.com
Thu Sep 27 04:36:08 CEST 2007
On 9/26/07, Guido van Rossum <guido at python.org> wrote:
>
> Constructors
> ------------
>
> There are four forms of constructors, applicable to both bytes and
> buffer:
>
> - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
> ``buffer(<buffer>)``: simple copying constructors, with the note
> that ``bytes(<bytes>)`` might return its (immutable) argument.
>
> - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
> <encoding>[, <errors>])``: encode a text string. Note that the
> ``str.encode()`` method returns an *immutable* bytes object.
> The <encoding> argument is mandatory; <errors> is optional.
>
> - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
> bytes or buffer object from anything that supports the PEP 3118
> buffer API.
>
> - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
> construct an immutable bytes or mutable buffer object from a
> stream of integers in range(256).
>
> - ``buffer(<int>)``: construct a zero-initialized buffer of a given
> length.
>
I think this section could be better organized. I had to read it a few times
to fully understand it. Maybe a table would better emphasize the differences
between the two constructors.
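As an illustration of the constructor forms listed above, here is a minimal sketch. It assumes the behavior of released Python 3, where the mutable type proposed here as "buffer" is named bytearray; the variable names are only for illustration.

```python
# Sketch of the constructor forms, using bytearray as a stand-in for the
# PEP's "buffer" type (assumption: behavior of released Python 3).
data = bytes([104, 105])                 # from an iterable of ints in range(256)
copy = bytes(data)                       # copying constructor
mutable = bytearray(data)                # mutable copy of an immutable bytes object
encoded = bytes("h\u00e9", "utf-8")      # from str; the encoding argument is mandatory
from_view = bytes(memoryview(mutable))   # from an object exposing the buffer API
zeros = bytearray(4)                     # zero-initialized, length 4

mutable[0] = 72                          # bytearray supports item assignment; bytes does not
```

Only the mutable object changes after the assignment; the copies taken earlier keep their original contents.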
> Indexing
> --------
>
> **Open Issue:** I'm undecided on whether indexing bytes and buffer
> objects should return small ints (like the bytes type in 3.0a1, and
> like lists or array.array('B')), or bytes/buffer objects of length 1
> (like the str type). The latter (str-like) approach will ease porting
> code from Python 2.x; but it makes it harder to extract values from a
> bytes array.
I think indexing a bytes/buffer object should return an int. I find this
behavior more natural than using an ord()-like function to extract values.
In fact, I have noticed that the use of ord() is a good indicator that bytes
should be used instead of str (see for yourself:
grep -R --include='*.py' 'ord(' python25/Lib).
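To make the int-returning option concrete, here is a small sketch. It shows how released Python 3 behaves, which is an assumption relative to this thread since the question was still open when the PEP draft was posted.

```python
# Indexing a bytes object yields a small int, so no ord() call is needed.
data = bytes([72, 105])        # b"Hi"
first = data[0]                # an int
one_byte = data[0:1]           # slicing yields a bytes object of length 1
total = sum(data)              # iteration also yields ints

# The str equivalent needs ord() to get at the numeric value.
text_first = ord("Hi"[0])
```

A length-1 slice is still available when a bytes object, rather than a number, is what the code needs.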
> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects. The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".
Does that mean calling str() on a bytes/buffer object -- e.g., str(b"abc")
-- wouldn't decode the content of the object (like array objects)?
> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception. This is the case even for simply
> comparing a bytes or buffer object to a str object (even violating the
> general rule that comparing objects of different types for equality
> should just return False).
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding. There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.decode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.encode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``. This produces
> the same result as ``repr(b)``. This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str. There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).
Ah! That answers my last question. :)
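A short sketch of the conversion rules discussed above, again assuming the behavior of released Python 3 (with bytearray in place of "buffer") and run without the -b/-bb interpreter flags, which turn str() on bytes into a warning or an error:

```python
raw = b"abc"

# str() without an encoding does not decode; it gives the same text as repr().
shown = str(raw)
shown_mutable = str(bytearray(raw))

# Explicit conversions always name an encoding.
decoded = str(raw, "ascii")          # same result as raw.decode("ascii")
encoded = bytes("abc", "ascii")      # same result as "abc".encode("ascii")

# Mixing bytes and str without an encoding raises TypeError.
try:
    raw + "abc"
    mixing_failed = False
except TypeError:
    mixing_failed = True
```

So printing a bytes object always works, but it shows the literal form rather than interpreting the bytes as characters.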
-- Alexandre