[Python-3000] Last call for PEP 3137: Immutable Bytes and Mutable Buffer

Brett Cannon brett at python.org
Mon Oct 1 02:11:38 CEST 2007


+1 from me.

-Brett

On 9/30/07, Guido van Rossum <guido at python.org> wrote:
> Thanks all for the focused and helpful discussion on this PEP. Here's
> a new posting of the full text of the PEP as it now stands. Most of
> the changes since the first posting are fleshing out of some details;
> the decision to make the individual elements of bytes and buffer be
> ints; and the decision to change bytes/str and buffer/str comparisons
> again to just return False instead of raising TypeError.
>
> (I'm not favorable towards the proposal of c'x' style literals or
> changes to the I/O APIs to use different names for calls involving
> bytes instead of text. If you still disagree, please start a new
> thread with new subject line.)
>
> I plan to accept the PEP within a day or two barring major objections,
> and expect to start implementing soon after.
>
> --Guido
>
> PEP: 3137
> Title: Immutable Bytes and Mutable Buffer
> Version: $Revision: 58290 $
> Last-Modified: $Date: 2007-09-30 16:19:14 -0700 (Sun, 30 Sep 2007) $
> Author: Guido van Rossum <guido at python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 26-Sep-2007
> Python-Version: 3.0
> Post-History: 26-Sep-2007, 30-Sep-2007
>
> Introduction
> ============
>
> After releasing Python 3.0a1 with a mutable bytes type, pressure
> mounted to add a way to represent immutable bytes.  Gregory P. Smith
> proposed a patch that would allow making a bytes object temporarily
> immutable by requesting that the data be locked using the new buffer
> API from PEP 3118.  This did not seem the right approach to me.
>
> Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to
> make the bytes type immutable (by crudely removing all mutating APIs)
> and fix the fall-out in the test suite.  This showed that there aren't
> all that many places that depend on the mutability of bytes, with the
> exception of code that builds up a return value from small pieces.
>
> Thinking through the consequences, and noticing that using the array
> module as an ersatz mutable bytes type is far from ideal, and
> recalling a proposal put forward earlier by Talin, I floated the
> suggestion to have both a mutable and an immutable bytes type.  (This
> had been brought up before, but until seeing the evidence of Jeffrey's
> patch I wasn't open to the suggestion.)
>
> Moreover, a possible implementation strategy became clear: use the old
> PyString implementation, stripped down to remove locale support and
> implicit conversions to/from Unicode, for the immutable bytes type,
> and keep the new PyBytes implementation as the mutable bytes type.
>
> The ensuing discussion made it clear that the idea is welcome but
> needs to be specified more precisely.  Hence this PEP.
>
> Advantages
> ==========
>
> One advantage of having an immutable bytes type is that code objects
> can use these.  It also makes it possible to efficiently create hash
> tables using bytes for keys; this may be useful when parsing protocols
> like HTTP or SMTP which are based on bytes representing text.
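
A rough illustration of the hash-table point, not taken from the PEP:
the sketch below keys a dict on immutable bytes while splitting an
HTTP-style header block.  The sample header data and all variable
names are invented for the example.

```python
# Hypothetical header block; names and values are invented for illustration.
raw = b'Host: example.org\r\nContent-Length: 42\r\n'

headers = {}
for line in raw.splitlines():
    name, _, value = line.partition(b':')
    # Immutable bytes are hashable, so they can serve as dict keys.
    headers[name.strip().lower()] = value.strip()

assert headers[b'host'] == b'example.org'
assert headers[b'content-length'] == b'42'
```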
>
> Porting code that manipulates binary data (or encoded text) in Python
> 2.x will be easier using the new design than using the original 3.0
> design with mutable bytes; simply replace ``str`` with ``bytes`` and
> change '...' literals into b'...' literals.
>
> Naming
> ======
>
> I propose the following type names at the Python level:
>
>   - ``bytes`` is an immutable array of bytes (PyString)
>
>   - ``buffer`` is a mutable array of bytes (PyBytes)
>
>   - ``memoryview`` is a bytes view on another object (PyMemory)
>
> The old type named ``buffer`` is so similar to the new type
> ``memoryview``, introduced by PEP 3118, that it is redundant.  The rest
> of this PEP doesn't discuss the functionality of ``memoryview``; it is
> just mentioned here to justify getting rid of the old ``buffer`` type
> so we can reuse its name for the mutable bytes type.
>
> While eventually it makes sense to change the C API names, this PEP
> maintains the old C API names, which should be familiar to all.
>
> Literal Notations
> =================
>
> The b'...' notation introduced in Python 3.0a1 returns an immutable
> bytes object, whatever variation is used.  To create a mutable
> buffer object, use buffer(b'...') or buffer([...]).  The latter may
> use a list of integers in range(256).
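
A minimal sketch of the literal rules above.  It assumes a released
Python 3, where the mutable type proposed here as ``buffer`` is
spelled ``bytearray``; the variable names are illustrative only.

```python
# Assumption: bytearray stands in for the PEP's proposed ``buffer`` type.
immutable = b'abc'                            # a literal always yields bytes
mutable_from_bytes = bytearray(b'abc')        # copy of a bytes literal
mutable_from_ints = bytearray([97, 98, 99])   # ints must be in range(256)

assert type(immutable) is bytes
assert mutable_from_bytes == mutable_from_ints == immutable

try:
    bytearray([256])                          # out of range
except ValueError:
    out_of_range_rejected = True
else:
    out_of_range_rejected = False
assert out_of_range_rejected
```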
>
> Functionality
> =============
>
> PEP 3118 Buffer API
> -------------------
>
> Both bytes and buffer implement the PEP 3118 buffer API.  The bytes
> type only implements read-only requests; the buffer type allows
> writable and data-locked requests as well.  The element data type is
> always 'B' (i.e. unsigned byte).
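
The read-only versus writable distinction can be observed through
``memoryview``, sketched below under the assumption that ``bytearray``
plays the role of the PEP's ``buffer``.  Data-locked requests are not
shown.

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
ro_view = memoryview(b'abc')
backing = bytearray(b'abc')
rw_view = memoryview(backing)

# Both export unsigned bytes.
assert ro_view.format == 'B' and rw_view.format == 'B'
assert ro_view.readonly and not rw_view.readonly

rw_view[0] = ord('x')          # writes through to the bytearray
assert backing == b'xbc'

try:
    ro_view[0] = ord('x')      # bytes only grants read-only access
except TypeError:
    write_rejected = True
else:
    write_rejected = False
assert write_rejected
```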
>
> Constructors
> ------------
>
> There are four forms of constructors applicable to both bytes and
> buffer, plus a fifth that applies to buffer only:
>
>   - ``bytes(<bytes>)``, ``bytes(<buffer>)``, ``buffer(<bytes>)``,
>     ``buffer(<buffer>)``: simple copying constructors, with the note
>     that ``bytes(<bytes>)`` might return its (immutable) argument.
>
>   - ``bytes(<str>, <encoding>[, <errors>])``, ``buffer(<str>,
>     <encoding>[, <errors>])``: encode a text string.  Note that the
>     ``str.encode()`` method returns an *immutable* bytes object.
>     The <encoding> argument is mandatory; <errors> is optional.
>
>   - ``bytes(<memory view>)``, ``buffer(<memory view>)``: construct a
>     bytes or buffer object from anything implementing the PEP 3118
>     buffer API.
>
>   - ``bytes(<iterable of ints>)``, ``buffer(<iterable of ints>)``:
>     construct an immutable bytes or mutable buffer object from a
>     stream of integers in range(256).
>
>   - ``buffer(<int>)``: construct a zero-initialized buffer of a given
>     length.
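
The constructor forms listed above, sketched with ``bytearray`` as an
assumed stand-in for ``buffer``.  The sample values are arbitrary.

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
text = 'caf\u00e9'

# 1. Copying constructors.
copy_b = bytes(bytearray(b'abc'))
copy_buf = bytearray(b'abc')

# 2. Encoding a text string; the encoding argument is required.
enc_b = bytes(text, 'utf-8')
enc_buf = bytearray(text, 'utf-8', 'strict')
assert enc_b == text.encode('utf-8') == enc_buf

# 3. From anything implementing the buffer API.
from_view = bytes(memoryview(b'abc'))

# 4. From an iterable of ints in range(256).
from_ints = bytes(range(3))
assert from_ints == b'\x00\x01\x02'

# 5. Zero-initialized mutable object of a given length.
zeros = bytearray(3)
assert zeros == b'\x00\x00\x00'

try:
    bytes(text)                # no encoding given
except TypeError:
    encoding_required = True
else:
    encoding_required = False
assert encoding_required
```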
>
> Comparisons
> -----------
>
> The bytes and buffer types are comparable with each other and
> orderable, so that e.g. b'abc' == buffer(b'abc') < b'abd'.
>
> Comparing either type to a str object for equality returns False
> regardless of the contents of either operand.  Ordering comparisons
> with str raise TypeError.  This is all conformant to the standard
> rules for comparison and ordering between objects of incompatible
> types.
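
A short check of these comparison rules, assuming ``bytearray`` in
place of ``buffer`` and an interpreter run without the ``-b`` warning
flag:

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
assert b'abc' == bytearray(b'abc') < b'abd'

# Equality against str is simply False, whatever the contents.
eq_result = (b'abc' == 'abc')
assert eq_result is False

# Ordering against str is an error.
try:
    b'abc' < 'abc'
except TypeError:
    ordering_rejected = True
else:
    ordering_rejected = False
assert ordering_rejected
```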
>
> (**Note:** in Python 3.0a1, comparing a bytes instance with a str
> instance would raise TypeError, on the premise that this would catch
> the occasional mistake quicker, especially in code ported from Python
> 2.x.  However, a long discussion on the python-3000 list pointed out
> so many problems with this that it is clearly a bad idea, to be rolled
> back in 3.0a2 regardless of the fate of the rest of this PEP.)
>
> Slicing
> -------
>
> Slicing a bytes object returns a bytes object.  Slicing a buffer
> object returns a buffer object.
>
> Slice assignment to a mutable buffer object accepts anything that
> implements the PEP 3118 buffer API, or an iterable of integers in
> range(256).
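
Slicing and slice assignment, sketched with ``bytearray`` as the
assumed equivalent of ``buffer``:

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
buf = bytearray(b'hello world')

assert type(b'hello'[1:3]) is bytes
assert type(buf[1:3]) is bytearray

buf[0:5] = b'HELLO'                  # from a bytes object
buf[6:] = [87, 79, 82, 76, 68]       # from an iterable of ints ("WORLD")
assert buf == b'HELLO WORLD'

buf[0:5] = memoryview(b'bye')        # any buffer-API object; length may change
assert buf == b'bye WORLD'
```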
>
> Indexing
> --------
>
> Indexing bytes and buffer returns small ints (like the bytes type in
> 3.0a1, and like lists or array.array('B')).
>
> Assignment to an item of a mutable buffer object accepts an int in
> range(256).  (To assign from a bytes sequence, use a slice
> assignment.)
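
For illustration, with ``bytearray`` assumed in place of ``buffer``:

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
data = b'abc'
first = data[0]
assert first == 97 and type(first) is int

buf = bytearray(data)
buf[0] = 65                    # item assignment takes an int in range(256)
assert buf == b'Abc'

try:
    buf[0] = b'A'              # a bytes object is not an int
except TypeError:
    bytes_item_rejected = True
else:
    bytes_item_rejected = False
assert bytes_item_rejected

buf[0:1] = b'A'                # use a slice to assign from bytes
assert buf == b'Abc'
```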
>
> Str() and Repr()
> ----------------
>
> The str() and repr() functions return the same thing for these
> objects.  The repr() of a bytes object returns a b'...' style literal.
> The repr() of a buffer returns a string of the form "buffer(b'...')".
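
A quick check of the rule above.  In released Python 3 the mutable
type is named ``bytearray``, so its repr reads "bytearray(b'...')"
rather than the "buffer(b'...')" form given here.

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
data = b'abc'
buf = bytearray(data)

assert repr(data) == "b'abc'"
assert str(data) == repr(data)

assert repr(buf) == "bytearray(b'abc')"
assert str(buf) == repr(buf)
```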
>
> Operators
> ---------
>
> The following operators are implemented by the bytes and buffer types,
> except where mentioned:
>
>   - ``b1 + b2``: concatenation.  With mixed bytes/buffer operands,
>     the return type is that of the first argument (this seems arbitrary
>     until you consider how ``+=`` works).
>
>   - ``b1 += b2``: mutates b1 if it is a buffer object.
>
>   - ``b * n``, ``n * b``: repetition; n must be an integer.
>
>   - ``b *= n``: mutates b if it is a buffer object.
>
>   - ``b1 in b2``, ``b1 not in b2``: substring test; b1 can be any
>     object implementing the PEP 3118 buffer API.
>
>   - ``i in b``, ``i not in b``: single-byte membership test; i must
>     be an integer (if it is a length-1 bytes array, it is considered
>     to be a substring test, with the same outcome).
>
>   - ``len(b)``: the number of bytes.
>
>   - ``hash(b)``: the hash value; only implemented by the bytes type.
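
The operators listed above can be exercised as follows.  This is a
sketch that assumes ``bytearray`` in place of ``buffer``; variable
names are illustrative.

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
b1 = b'ab'
b2 = bytearray(b'cd')

# Concatenation: the result has the type of the first operand.
assert type(b1 + b2) is bytes and b1 + b2 == b'abcd'
assert type(b2 + b1) is bytearray and b2 + b1 == b'cdab'

# In-place operators mutate the mutable type...
alias = b2
b2 += b'!'
b2 *= 2
assert alias is b2 and alias == b'cd!cd!'

# ...but rebind the name for the immutable type.
frozen = b1
b1 += b'!'
assert frozen == b'ab' and b1 == b'ab!'

assert b'ab' * 2 == 2 * b'ab' == b'abab'

# Substring and single-byte membership tests.
assert b'cd' in alias and bytearray(b'zz') not in alias
assert 99 in alias                   # ord('c') == 99
assert len(alias) == 6

# Only the immutable type is hashable.
hash(frozen)
try:
    hash(alias)
except TypeError:
    mutable_unhashable = True
else:
    mutable_unhashable = False
assert mutable_unhashable
```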
>
> Note that the % operator is *not* implemented.  It does not appear
> worth the complexity.
>
> Methods
> -------
>
> The following methods are implemented by bytes as well as buffer, with
> similar semantics.  They accept anything that implements the PEP 3118
> buffer API for bytes arguments, and return the same type as the object
> whose method is called ("self")::
>
>   .capitalize(), .center(), .count(), .decode(), .endswith(),
>   .expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
>   .islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
>   .lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
>   .rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
>   .splitlines(), .startswith(), .strip(), .swapcase(), .title(),
>   .translate(), .upper(), .zfill()
>
> This is exactly the set of methods present on the str type in Python
> 2.x, with the exclusion of .encode().  The signatures and semantics
> are the same too.  However, whenever character classes like letter,
> whitespace, lower case are used, the ASCII definitions of these
> classes are used.  (The Python 2.x str type uses the definitions from
> the current locale, settable through the locale module.)  The
> .encode() method is left out because of the more strict definitions of
> encoding and decoding in Python 3000: encoding always takes a Unicode
> string and returns a bytes sequence, and decoding always takes a bytes
> sequence and returns a Unicode string.
>
> In addition, both types implement the class method ``.fromhex()``,
> which constructs an object from a string containing hexadecimal values
> (with or without spaces between the bytes).
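
A few of these methods in action, again assuming ``bytearray`` for
``buffer``.  The inputs are arbitrary sample values.

```python
# Assumption: bytearray stands in for the PEP's ``buffer`` type.
titled = b'hello world'.title()
assert titled == b'Hello World'

# The result type follows "self".
upper_buf = bytearray(b'abc').upper()
assert type(upper_buf) is bytearray

parts = bytearray(b'a,b').split(b',')
assert parts == [bytearray(b'a'), bytearray(b'b')]

# Bytes arguments may be any buffer-API object.
assert b'abc'.startswith(bytearray(b'ab'))

# Character classes are ASCII-only: 0xE9 is neither a letter nor cased.
assert not b'\xe9'.isalpha()
assert b'\xe9'.upper() == b'\xe9'

# fromhex() is a class method on both types; spaces are optional.
from_hex = bytes.fromhex('de ad be ef')
assert from_hex == bytes.fromhex('deadbeef') == b'\xde\xad\xbe\xef'
assert bytearray.fromhex('00ff') == bytearray(b'\x00\xff')
```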
>
> The buffer type implements these additional methods from the
> MutableSequence ABC (see PEP 3119):
>
>   .extend(), .insert(), .append(), .reverse(), .pop(), .remove().
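
The mutating methods, sketched on ``bytearray`` (assumed here to be
the released counterpart of ``buffer``):

```python
import collections.abc

# Assumption: bytearray stands in for the PEP's ``buffer`` type.
buf = bytearray(b'abc')
assert isinstance(buf, collections.abc.MutableSequence)

buf.append(100)            # b'abcd'
buf.extend(b'ef')          # b'abcdef'
buf.insert(0, 95)          # b'_abcdef'
buf.reverse()              # b'fedcba_'
popped = buf.pop()         # removes and returns the last byte, 95
buf.remove(100)            # removes the first occurrence of ord('d')

assert popped == 95
assert buf == b'fecba'
```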
>
> Bytes and the Str Type
> ----------------------
>
> Like the bytes type in Python 3.0a1, and unlike the relationship
> between str and unicode in Python 2.x, any attempt to mix bytes (or
> buffer) objects and str objects without specifying an encoding will
> raise a TypeError exception.  (However, comparing a bytes or buffer
> object to a str object for equality simply returns False; see the
> section on Comparisons above.)
>
> Conversions between bytes or buffer objects and str objects must
> always be explicit, using an encoding.  There are two equivalent APIs:
> ``str(b, <encoding>[, <errors>])`` is equivalent to
> ``b.decode(<encoding>[, <errors>])``, and
> ``bytes(s, <encoding>[, <errors>])`` is equivalent to
> ``s.encode(<encoding>[, <errors>])``.
>
> There is one exception: we can convert from bytes (or buffer) to str
> without specifying an encoding by writing ``str(b)``.  This produces
> the same result as ``repr(b)``.  This exception is necessary because
> of the general promise that *any* object can be printed, and printing
> is just a special case of conversion to str.  There is however no
> promise that printing a bytes object interprets the individual bytes
> as characters (unlike in Python 2.x).
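
The conversion rules above, sketched with an arbitrary sample string:

```python
text = 'caf\u00e9'
encoded = text.encode('utf-8')

# Mixing bytes and str without an encoding is an error.
try:
    encoded + text
except TypeError:
    mixing_rejected = True
else:
    mixing_rejected = False
assert mixing_rejected

# The two explicit, equivalent spellings in each direction.
assert str(encoded, 'utf-8') == encoded.decode('utf-8') == text
assert bytes(text, 'utf-8') == text.encode('utf-8') == encoded

# str(b) without an encoding gives the repr, not the decoded text.
as_str = str(encoded)
assert as_str == repr(encoded)
assert as_str != text
```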
>
> The str type currently implements the PEP 3118 buffer API.  While this
> is perhaps occasionally convenient, it is also potentially confusing,
> because the bytes accessed via the buffer API represent a
> platform-dependent encoding: depending on the platform byte order and
> a compile-time configuration option, the encoding could be UTF-16-BE,
> UTF-16-LE, UTF-32-BE, or UTF-32-LE.  Worse, a different implementation
> of the str type might completely change the bytes representation,
> e.g. to UTF-8, or even make it impossible to access the data as a
> contiguous array of bytes at all.  Therefore, the PEP 3118 buffer API
> will be removed from the str type.
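
In released Python 3 the str type indeed does not export a buffer, so
the byte layout has to be chosen explicitly through an encoding.  A
small sketch showing how much the layout varies for one character:

```python
# str does not implement the buffer API.
try:
    memoryview('A')
except TypeError:
    str_exports_buffer = False
else:
    str_exports_buffer = True
assert not str_exports_buffer

# The same one-character string under different explicit encodings.
assert 'A'.encode('utf-16-le') == b'A\x00'
assert 'A'.encode('utf-16-be') == b'\x00A'
assert 'A'.encode('utf-32-le') == b'A\x00\x00\x00'
assert 'A'.encode('utf-8') == b'A'
```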
>
> Pickling
> --------
>
> Left as an exercise for the reader.
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)