[Python-checkins] r53860 - peps/trunk/pep-0000.txt peps/trunk/pep-0358.txt
guido.van.rossum
python-checkins at python.org
Fri Feb 23 00:57:47 CET 2007
Author: guido.van.rossum
Date: Fri Feb 23 00:57:46 2007
New Revision: 53860
Modified:
peps/trunk/pep-0000.txt
peps/trunk/pep-0358.txt
Log:
Update the bytes object to better resemble my intentions.
Modified: peps/trunk/pep-0000.txt
==============================================================================
--- peps/trunk/pep-0000.txt (original)
+++ peps/trunk/pep-0000.txt Fri Feb 23 00:57:46 2007
@@ -107,7 +107,7 @@
I 350 Codetags Elliott
S 354 Enumerations in Python Finney
S 355 Path - Object oriented filesystem paths Lindqvist
- S 358 The "bytes" Object Schemenauer
+ S 358 The "bytes" Object Schemenauer, GvR
S 362 Function Signature Object Cannon, Seo
S 754 IEEE 754 Floating Point Special Values Warnes
S 3101 Advanced String Formatting Talin
@@ -431,7 +431,7 @@
S 355 Path - Object oriented filesystem paths Lindqvist
IF 356 Python 2.5 Release Schedule Norwitz, et al
SF 357 Allowing Any Object to be Used for Slicing Oliphant
- S 358 The "bytes" Object Schemenauer
+ S 358 The "bytes" Object Schemenauer, GvR
SW 359 The "make" Statement Bethard
I 360 Externally Maintained Packages Cannon
I 361 Python 2.6 Release Schedule Norwitz, et al
Modified: peps/trunk/pep-0358.txt
==============================================================================
--- peps/trunk/pep-0358.txt (original)
+++ peps/trunk/pep-0358.txt Fri Feb 23 00:57:46 2007
@@ -2,12 +2,12 @@
Title: The "bytes" Object
Version: $Revision$
Last-Modified: $Date$
-Author: Neil Schemenauer <nas at arctrix.com>
+Author: Neil Schemenauer <nas at arctrix.com>, Guido van Rossum <guido at google.com>
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 15-Feb-2006
-Python-Version: 2.5
+Python-Version: 2.6, 3.0
Post-History:
@@ -20,74 +20,86 @@
Motivation
- Python's current string objects are overloaded. They serve to hold
- both sequences of characters and sequences of bytes. This
- overloading of purpose leads to confusion and bugs. In future
+ Python's current string objects are overloaded. They serve to hold
+ both sequences of characters and sequences of bytes. This
+ overloading of purpose leads to confusion and bugs. In future
versions of Python, string objects will be used for holding
- character data. The bytes object will fulfil the role of a byte
- container. Eventually the unicode built-in will be renamed to str
+ character data. The bytes object will fulfil the role of a byte
+ container. Eventually the unicode built-in will be renamed to str
and the str object will be removed.
Specification
- A bytes object stores a mutable sequence of integers that are in the
- range 0 to 255. Unlike string objects, indexing a bytes object
- returns an integer. Assigning an element using a object that is not
- an integer causes a TypeError exception. Assigning an element to a
- value outside the range 0 to 255 causes a ValueError exception. The
- __len__ method of bytes returns the number of integers stored in the
- sequence (i.e. the number of bytes).
+ A bytes object stores a mutable sequence of integers that are in
+ the range 0 to 255. Unlike string objects, indexing a bytes
+ object returns an integer. Assigning an element using an object
+ that is not an integer causes a TypeError exception. Assigning an
+ element to a value outside the range 0 to 255 causes a ValueError
+ exception. The .__len__() method of bytes returns the number of
+ integers stored in the sequence (i.e. the number of bytes).
The constructor of the bytes object has the following signature:
bytes([initialiser[, encoding]])
If no arguments are provided then an object containing zero elements
- is created and returned. The initialiser argument can be a string
- or a sequence of integers. The pseudo-code for the constructor is:
+ is created and returned. The initialiser argument can be a string,
+ a sequence of integers, or a single integer. The pseudo-code for the
+ constructor is:
def bytes(initialiser=[], encoding=None):
- if isinstance(initialiser, basestring):
- if isinstance(initialiser, unicode):
+ if isinstance(initialiser, int): # In 2.6, (int, long)
+ initialiser = [0]*initialiser
+ elif isinstance(initialiser, basestring):
+ if isinstance(initialiser, unicode): # In 3.0, always
if encoding is None:
+ # In 3.0, raise TypeError("explicit encoding required")
encoding = sys.getdefaultencoding()
initialiser = initialiser.encode(encoding)
initialiser = [ord(c) for c in initialiser]
- elif encoding is not None:
- raise TypeError("explicit encoding invalid for non-string "
- "initialiser")
- create bytes object and fill with integers from initialiser
+ else:
+ if encoding is not None:
+ raise TypeError("explicit encoding invalid for non-string "
+ "initialiser")
+ # Create bytes object and fill with integers from initialiser
+ # while ensuring each integer is in range(256); initialiser
+ # can be any iterable
return bytes object
- The __repr__ method returns a string that can be evaluated to
+ The .__repr__() method returns a string that can be evaluated to
generate a new bytes object containing the same sequence of
- integers. The sequence is represented by a list of ints. For
- example:
+ integers. The sequence is represented by a list of ints using
+ hexadecimal notation. For example:
>>> repr(bytes([10, 20, 30]))
- 'bytes([10, 20, 30])'
+ 'bytes([0x0a, 0x14, 0x1e])'
- The object has a decode method equivalent to the decode method of
- the str object. The object has a classmethod fromhex that takes a
- string of characters from the set [0-9a-zA-Z ] and returns a bytes
- object (similar to binascii.unhexlify). For example:
+ The object has a .decode() method equivalent to the .decode()
+ method of the str object. (This is redundant since it can also be
+ decoded by calling unicode(b, <encoding>) (in 2.6) or str(b,
+ <encoding>) (in 3.0); do we need encode/decode methods? In a
+ sense the spelling using a constructor is cleaner.) The object
+ has a classmethod .fromhex() that takes a string of characters
+ from the set [0-9a-fA-F ] and returns a bytes object (similar to
+ binascii.unhexlify). For example:
>>> bytes.fromhex('5c5350ff')
bytes([92, 83, 80, 255])
>>> bytes.fromhex('5c 53 50 ff')
bytes([92, 83, 80, 255])
- The object has a hex method that does the reverse conversion
+ The object has a .hex() method that does the reverse conversion
(similar to binascii.hexlify):
>>> bytes([92, 83, 80, 255]).hex()
'5c5350ff'
- The bytes object has methods similar to the list object:
+ The bytes object has some methods similar to list methods, and
+ others similar to str methods:
__add__
- __contains__
+ __contains__ (with int arg, like list; with bytes arg, like str)
__delitem__
__delslice__
__eq__
@@ -95,7 +107,6 @@
__getitem__
__getslice__
__gt__
- __hash__
__iadd__
__imul__
__iter__
@@ -107,16 +118,39 @@
__reduce__
__reduce_ex__
__repr__
+ __reversed__
__rmul__
__setitem__
__setslice__
append
count
+ decode
+ endswith
extend
+ find
index
insert
+ join
+ partition
pop
remove
+ replace
+ rindex
+ rpartition
+ split
+ startswith
+ reverse
+ rfind
+ rsplit
+ translate
+
+ Note the conspicuous absence of .isupper(), .upper(), and friends.
+ There is no __hash__ because the object is mutable. There is no
+ use case for a .sort() method.
+
+ The bytes object also supports the buffer interface, supporting reading
+ and writing binary (but not character) data.
Out of scope issues
@@ -127,7 +161,9 @@
(which requires lexer and parser support in addition to everything
else). Since there appears to be no immediate need for a literal
representation, designing and implementing one is out of the scope
- of this PEP.
+ of this PEP. (Hmm... A b"..." literal accepting only ASCII
+ values is likely to be added to 3.0; not clear about 2.6. This
+ needs a PEP.)
* Python 3k will have a much different I/O subsystem. Deciding how
that I/O subsystem will work and interact with the bytes object is
@@ -140,19 +176,19 @@
Unresolved issues
- * Perhaps the bytes object should be implemented as a extension
- module until we are more sure of the design (similar to how the
- set object was prototyped).
-
- * Should the bytes object implement the buffer interface? Probably,
- but we need to look into the implications of that (e.g. regex
- operations on byte arrays).
+ * Need to specify the methods more carefully.
+
+ * Should all those list methods really be implemented?
+
+ * A case could be made for supporting .ljust(), .rjust(),
+ .center() with a mandatory second argument.
+
+ * A case could be made for supporting .split() with a mandatory
+ argument.
- * Should the object implement __reversed__ and reverse? Should it
- implement sort?
+ * How should pickling and marshalling work?
- * Need to clarify what some of the methods do. How are comparisons
- done? Hashing? Pickling and marshalling?
+ * I probably forgot a few things.
Questions and answers
@@ -174,7 +210,7 @@
Q: Why does bytes ignore the encoding argument if the initialiser is
- a str?
+ a str? (This only applies to 2.6.)
A: There is no sane meaning that the encoding can have in that case.
str objects *are* byte arrays and they know nothing about the
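
Note on the constructor pseudo-code in the pep-0358.txt hunk: the
following is a minimal runnable sketch of its 3.0 branch, written for a
current Python 3. The helper name make_bytes is hypothetical, and
bytearray (the mutable byte sequence in today's Python 3) stands in for
the proposed object. It illustrates the rules stated in the pseudo-code
under those assumptions and is not the PEP's implementation.

```python
def make_bytes(initialiser=(), encoding=None):
    """Hypothetical helper mirroring the 3.0 branch of the pseudo-code."""
    if isinstance(initialiser, int):
        # A single integer n yields n zero-valued elements.
        values = [0] * initialiser
    elif isinstance(initialiser, str):
        # Text must be encoded, and the encoding must be given explicitly.
        if encoding is None:
            raise TypeError("explicit encoding required")
        values = list(initialiser.encode(encoding))
    else:
        if encoding is not None:
            raise TypeError("explicit encoding invalid for non-string "
                            "initialiser")
        # Any iterable is accepted.
        values = list(initialiser)

    # Ensure every element is an integer in range(256).
    for value in values:
        if not isinstance(value, int):
            raise TypeError("elements must be integers")
        if not 0 <= value < 256:
            raise ValueError("elements must be in range(256)")

    # Assumption: bytearray stands in for the proposed mutable type.
    return bytearray(values)


zeros = make_bytes(3)               # three zero-valued elements
text = make_bytes("abc", "ascii")   # encoded text
nums = make_bytes([10, 20, 30])     # any iterable of small integers
```

Calling make_bytes("abc") without an encoding raises TypeError, and
make_bytes([256]) raises ValueError, matching the range check described
in the Specification section.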