[Python-checkins] r53860 - peps/trunk/pep-0000.txt peps/trunk/pep-0358.txt

Fri Feb 23 00:57:47 CET 2007

Author: guido.van.rossum
Date: Fri Feb 23 00:57:46 2007
New Revision: 53860

Modified:
   peps/trunk/pep-0000.txt
   peps/trunk/pep-0358.txt
Log:
Update the bytes object to better resemble my intentions.


Modified: peps/trunk/pep-0000.txt
==============================================================================

--- peps/trunk/pep-0000.txt	(original)
+++ peps/trunk/pep-0000.txt	Fri Feb 23 00:57:46 2007
@@ -107,7 +107,7 @@
  I   350  Codetags                                     Elliott
  S   354  Enumerations in Python                       Finney
  S   355  Path - Object oriented filesystem paths      Lindqvist
- S   358  The "bytes" Object                           Schemenauer
+ S   358  The "bytes" Object                           Schemenauer, GvR
  S   362  Function Signature Object                    Cannon, Seo
  S   754  IEEE 754 Floating Point Special Values       Warnes
  S  3101  Advanced String Formatting                   Talin
@@ -431,7 +431,7 @@
  S   355  Path - Object oriented filesystem paths      Lindqvist
  IF  356  Python 2.5 Release Schedule                  Norwitz, et al
  SF  357  Allowing Any Object to be Used for Slicing   Oliphant
- S   358  The "bytes" Object                           Schemenauer
+ S   358  The "bytes" Object                           Schemenauer, GvR
  SW  359  The "make" Statement                         Bethard
  I   360  Externally Maintained Packages               Cannon
  I   361  Python 2.6 Release Schedule                  Norwitz, et al

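The constructor pseudo-code and the hex-notation repr in the pep-0358 change below can be made concrete with a short, runnable Python 3 sketch. It illustrates the draft under stated assumptions and is not the PEP's reference implementation. `pep358_bytes` and `pep358_repr` are hypothetical names. The sketch assumes the 3.0 rules noted in the pseudo-code's comments, so a str always needs an explicit encoding. Python 3's mutable `bytearray` stands in for the proposed bytes object.

```python
# Hypothetical helpers sketching the PEP 358 draft semantics; these names
# are illustrative, not a real API. Python 3's mutable bytearray stands in
# for the proposed bytes object.


def pep358_bytes(initialiser=(), encoding=None):
    """Build a mutable byte sequence following the draft's 3.0 rules."""
    if isinstance(initialiser, int):
        # A single integer n yields n zero bytes; rejecting negative counts
        # is an extra assumption not spelled out in the draft.
        if initialiser < 0:
            raise ValueError("negative count")
        initialiser = [0] * initialiser
    elif isinstance(initialiser, str):
        # In 3.0 every str is Unicode, so the encoding must be explicit.
        if encoding is None:
            raise TypeError("explicit encoding required")
        initialiser = initialiser.encode(encoding)
    elif encoding is not None:
        raise TypeError("explicit encoding invalid for non-string initialiser")
    result = bytearray()
    # Any iterable of integers is accepted; each must lie in range(256).
    for value in initialiser:
        if not isinstance(value, int):
            raise TypeError("bytes elements must be integers")
        if not 0 <= value < 256:
            raise ValueError("byte must be in range(256)")
        result.append(value)
    return result


def pep358_repr(b):
    """Render the hex-notation repr the draft proposes."""
    return "bytes([%s])" % ", ".join("0x%02x" % value for value in b)


if __name__ == "__main__":
    b = pep358_bytes([92, 83, 80, 255])
    print(pep358_repr(b))                          # bytes([0x5c, 0x53, 0x50, 0xff])
    # fromhex() and hex() are real bytearray methods (hex() needs Python 3.5+).
    print(list(bytearray.fromhex("5c 53 50 ff")))  # [92, 83, 80, 255]
    print(b.hex())                                 # 5c5350ff
    # __contains__ takes an int (like list) or a byte string (like str).
    print(83 in b, b"SP" in b)                     # True True
```

Building on `bytearray` keeps the sketch small. That type already enforces the same rules on element assignment: a non-integer raises TypeError, and a value outside range(256) raises ValueError. The explicit loop only mirrors the pseudo-code's range check during construction.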
Modified: peps/trunk/pep-0358.txt
==============================================================================
--- peps/trunk/pep-0358.txt	(original)
+++ peps/trunk/pep-0358.txt	Fri Feb 23 00:57:46 2007
@@ -2,12 +2,12 @@
 Title: The "bytes" Object
 Version: $Revision$
 Last-Modified: $Date$
-Author: Neil Schemenauer <nas at arctrix.com>
+Author: Neil Schemenauer <nas at arctrix.com>, Guido van Rossum <guido at google.com>
 Status: Draft
 Type: Standards Track
 Content-Type: text/plain
 Created: 15-Feb-2006
-Python-Version: 2.5
+Python-Version: 2.6, 3.0
 Post-History:
 
 
@@ -20,74 +20,86 @@
 
 Motivation
 
-    Python's current string objects are overloaded. They serve to hold
-    both sequences of characters and sequences of bytes. This
-    overloading of purpose leads to confusion and bugs. In future
+    Python's current string objects are overloaded.  They serve to hold
+    both sequences of characters and sequences of bytes.  This
+    overloading of purpose leads to confusion and bugs.  In future
     versions of Python, string objects will be used for holding
-    character data. The bytes object will fulfil the role of a byte
-    container. Eventually the unicode built-in will be renamed to str
+    character data.  The bytes object will fulfil the role of a byte
+    container.  Eventually the unicode built-in will be renamed to str
     and the str object will be removed.
 
 
 Specification
 
-    A bytes object stores a mutable sequence of integers that are in the
-    range 0 to 255.  Unlike string objects, indexing a bytes object
-    returns an integer.  Assigning an element using a object that is not
-    an integer causes a TypeError exception.  Assigning an element to a
-    value outside the range 0 to 255 causes a ValueError exception.  The
-    __len__ method of bytes returns the number of integers stored in the
-    sequence (i.e. the number of bytes).
+    A bytes object stores a mutable sequence of integers that are in
+    the range 0 to 255.  Unlike string objects, indexing a bytes
+    object returns an integer.  Assigning an element using an object
+    that is not an integer causes a TypeError exception.  Assigning an
+    element to a value outside the range 0 to 255 causes a ValueError
+    exception.  The .__len__() method of bytes returns the number of
+    integers stored in the sequence (i.e. the number of bytes).
 
     The constructor of the bytes object has the following signature:
 
         bytes([initialiser[, encoding]])
 
     If no arguments are provided then an object containing zero elements
-    is created and returned.  The initialiser argument can be a string
-    or a sequence of integers.  The pseudo-code for the constructor is:
+    is created and returned.  The initialiser argument can be a string,
+    a sequence of integers, or a single integer.  The pseudo-code for the
+    constructor is:
 
         def bytes(initialiser=[], encoding=None):
-            if isinstance(initialiser, basestring):
-                if isinstance(initialiser, unicode):
+            if isinstance(initialiser, int): # In 2.6, (int, long)
+                initialiser = [0]*initialiser
+            elif isinstance(initialiser, basestring):
+                if isinstance(initialiser, unicode): # In 3.0, always
                     if encoding is None:
+                        # In 3.0, raise TypeError("explicit encoding required")
                         encoding = sys.getdefaultencoding()
                     initialiser = initialiser.encode(encoding)
                 initialiser = [ord(c) for c in initialiser]
-            elif encoding is not None:
-                raise TypeError("explicit encoding invalid for non-string "
-                                "initialiser")
-            create bytes object and fill with integers from initialiser
+            else:
+                if encoding is not None:
+                    raise TypeError("explicit encoding invalid for non-string "
+                                    "initialiser")
+            # Create bytes object and fill with integers from initialiser
+            # while ensuring each integer is in range(256); initialiser
+            # can be any iterable
             return bytes object
 
-    The __repr__ method returns a string that can be evaluated to
+    The .__repr__() method returns a string that can be evaluated to
     generate a new bytes object containing the same sequence of
-    integers.  The sequence is represented by a list of ints.  For
-    example:
+    integers.  The sequence is represented by a list of ints using
+    hexadecimal notation.  For example:
 
         >>> repr(bytes([10, 20, 30]))
-        'bytes([10, 20, 30])'
+        'bytes([0x0a, 0x14, 0x1e])'
 
-    The object has a decode method equivalent to the decode method of
-    the str object.  The object has a classmethod fromhex that takes a
-    string of characters from the set [0-9a-zA-Z ] and returns a bytes
-    object (similar to binascii.unhexlify).  For example:
+    The object has a .decode() method equivalent to the .decode()
+    method of the str object.  (This is redundant since it can also be
+    decoded by calling unicode(b, <encoding>) (in 2.6) or str(b,
+    <encoding>) (in 3.0); do we need encode/decode methods?  In a
+    sense the spelling using a constructor is cleaner.)  The object
+    has a classmethod .fromhex() that takes a string of characters
+    from the set [0-9a-fA-F ] and returns a bytes object (similar to
+    binascii.unhexlify).  For example:
 
         >>> bytes.fromhex('5c5350ff')
         bytes([92, 83, 80, 255])
         >>> bytes.fromhex('5c 53 50 ff')
         bytes([92, 83, 80, 255])
 
-    The object has a hex method that does the reverse conversion
+    The object has a .hex() method that does the reverse conversion
     (similar to binascii.hexlify):
 
         >>> bytes([92, 83, 80, 255]).hex()
         '5c5350ff'
 
-    The bytes object has methods similar to the list object:
+    The bytes object has some methods similar to list methods, and
+    others similar to str methods:
 
         __add__
-        __contains__
+        __contains__ (with int arg, like list; with bytes arg, like str)
         __delitem__
         __delslice__
         __eq__
@@ -95,7 +107,6 @@
         __getitem__
         __getslice__
         __gt__
-        __hash__
         __iadd__
         __imul__
         __iter__
@@ -107,16 +118,39 @@
         __reduce__
         __reduce_ex__
         __repr__
+        __reversed__
         __rmul__
         __setitem__
         __setslice__
         append
         count
+        decode
+        endswith
         extend
+        find
         index
         insert
+        join
+        partition
         pop
         remove
+        replace
+        rindex
+        rpartition
+        split
+        startswith
+        reverse
+        rfind
+        rindex
+        rsplit
+        translate
+
+    Note the conspicuous absence of .isupper(), .upper(), and friends.
+    There is no __hash__ because the object is mutable.  There is no
+    use case for a .sort() method.
+
+    The bytes object also supports the buffer interface, allowing
+    reading and writing binary (but not character) data.
 
 
 Out of scope issues
@@ -127,7 +161,9 @@
       (which requires lexer and parser support in addition to everything
       else).  Since there appears to be no immediate need for a literal
       representation, designing and implementing one is out of the scope
-      of this PEP.
+      of this PEP.  (Hmm...  A b"..." literal accepting only ASCII
+      values is likely to be added to 3.0; not clear about 2.6.  This
+      needs a PEP.)
 
     * Python 3k will have a much different I/O subsystem.  Deciding how
       that I/O subsystem will work and interact with the bytes object is
@@ -140,19 +176,19 @@
 
 Unresolved issues
 
-    * Perhaps the bytes object should be implemented as a extension
-      module until we are more sure of the design (similar to how the
-      set object was prototyped).
-
-    * Should the bytes object implement the buffer interface?  Probably,
-      but we need to look into the implications of that (e.g. regex
-      operations on byte arrays).
+    * Need to specify the methods more carefully.
+
+    * Should all those list methods really be implemented?
+
+    * A case could be made for supporting .ljust(), .rjust(),
+      .center() with a mandatory second argument.
+
+    * A case could be made for supporting .split() with a mandatory
+      argument.
 
-    * Should the object implement __reversed__ and reverse?  Should it
-      implement sort?
+    * How should pickling and marshalling work?
 
-    * Need to clarify what some of the methods do.  How are comparisons
-      done?  Hashing?  Pickling and marshalling?
+    * I probably forgot a few things.
 
 
 Questions and answers
@@ -174,7 +210,7 @@
 
 
     Q: Why does bytes ignore the encoding argument if the initialiser is
-       a str?
+       a str?  (This only applies to 2.6.)
 
     A: There is no sane meaning that the encoding can have in that case.
        str objects *are* byte arrays and they know nothing about the