[Python-checkins] r53882 - peps/trunk/pep-0000.txt peps/trunk/pep-0358.txt peps/trunk/pep-3112.txt
guido.van.rossum
python-checkins at python.org
Sat Feb 24 06:42:55 CET 2007
Author: guido.van.rossum
Date: Sat Feb 24 06:42:52 2007
New Revision: 53882
Added:
peps/trunk/pep-3112.txt (contents, props changed)
Modified:
peps/trunk/pep-0000.txt
peps/trunk/pep-0358.txt
Log:
Add the bytes literal PEP, by Jason Orendorff.
And accept it!
Modified: peps/trunk/pep-0000.txt
==============================================================================
--- peps/trunk/pep-0000.txt (original)
+++ peps/trunk/pep-0000.txt Sat Feb 24 06:42:52 2007
@@ -83,6 +83,7 @@
SA 3109 Raising Exceptions in Python 3000 Winter
SA 3110 Catching Exceptions in Python 3000 Winter
SA 3111 Simple input built-in in Python 3000 Roberge
+ SA 3112 Bytes literals in Python 3000 Wouters
Open PEPs (under consideration)
@@ -455,6 +456,7 @@
SA 3109 Raising Exceptions in Python 3000 Winter
SA 3110 Catching Exceptions in Python 3000 Winter
SA 3111 Simple input built-in in Python 3000 Roberge
+ SA 3112 Bytes literals in Python 3000 Wouters
Key
Modified: peps/trunk/pep-0358.txt
==============================================================================
--- peps/trunk/pep-0358.txt (original)
+++ peps/trunk/pep-0358.txt Sat Feb 24 06:42:52 2007
@@ -181,6 +181,9 @@
be added to language to allow objects to be converted into byte
arrays. This decision is out of scope.
+ * A bytes literal of the form b"..." is also proposed. This is
+ the subject of PEP 3112.
+
Open Issues
@@ -195,14 +198,7 @@
* Should all those list methods really be implemented?
- * There is growing support for a b"..." literal. Here's a brief
- spec. Each invocation of b"..." produces a new bytes object
- (this is unlike "..." but similar to [...] and {...}). Inside
- the literal, only ASCII characters and non-Unicode backslash
- escapes are allowed; non-ASCII characters not specified as
- escapes are rejected by the compiler regardless of the source
- encoding. The resulting object's value is the same as if
- bytes(map(ord, "...")) were called.
+ * Now that a b"..." literal exists, shouldn't repr() return one?
* A case could be made for supporting .ljust(), .rjust(),
.center() with a mandatory second argument.
Added: peps/trunk/pep-3112.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-3112.txt Sat Feb 24 06:42:52 2007
@@ -0,0 +1,159 @@
+PEP:
+Title: Bytes literals in Python 3000
+Version: $Revision$
+Last-Modified: $Date$
+Author: Jason Orendorff <jason.orendorff at gmail.com>
+Status: Accepted
+Type:
+Content-Type: text/x-rst
+Requires: 358
+Created: 23-Feb-2007
+Python-Version: 3.0
+Post-History: 23-Feb-2007
+
+
+Abstract
+========
+
+This PEP proposes a literal syntax for the ``bytes`` objects
+introduced in PEP 358. The purpose is to provide a convenient way to
+spell ASCII strings and arbitrary binary data.
+
+
+Motivation
+==========
+
+Existing spellings of an ASCII string in Python 3000 include:
+
+ bytes('Hello world', 'ascii')
+ 'Hello world'.encode('ascii')
+
+The proposed syntax is:
+
+ b'Hello world'
+
+Existing spellings of an 8-bit binary sequence in Python 3000 include:
+
+ bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
+ bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
+ '7f454c4601010100'.decode('hex')
+
+The proposed syntax is:
+
+ b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
+ b'\x7fELF\x01\x01\x01\0'
+
+In both cases, the advantages of the new syntax are brevity, some
+small efficiency gain, and the detection of encoding errors at compile
+time rather than at runtime. The brevity benefit is especially felt
+when using the string-like methods of bytes objects:
+
+ lines = bdata.split(bytes('\n', 'ascii')) # existing syntax
+ lines = bdata.split(b'\n') # proposed syntax
+
+And when converting code from Python 2.x to Python 3000:
+
+ sok.send('EXIT\r\n') # Python 2.x
+ sok.send('EXIT\r\n'.encode('ascii')) # Python 3000 existing
+ sok.send(b'EXIT\r\n') # proposed
+
+
+Grammar Changes
+===============
+
+The proposed syntax is an extension of the existing string
+syntax. [#stringliterals]_
+
+The new syntax for strings, including the new bytes literal, is:
+
+ stringliteral: [stringprefix] (shortstring | longstring)
+ stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
+ shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
+ longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
+ shortstringitem: shortstringchar | escapeseq
+ longstringitem: longstringchar | escapeseq
+ shortstringchar:
+ <any source character except "\" or newline or the quote>
+ longstringchar: <any source character except "\">
+ escapeseq: "\" NL
+ | "\\" | "\'" | '\"'
+ | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
+ | "\ooo" | "\xhh"
+ | "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
+
+The following additional restrictions apply only to bytes literals
+(``stringliteral`` tokens with ``b`` or ``B`` in the
+``stringprefix``):
+
+- Each ``shortstringchar`` or ``longstringchar`` must be a character
+ between 1 and 127 inclusive, regardless of any encoding
+ declaration [#encodings]_ in the source file.
+
+- The Unicode-specific escape sequences ``\u``*xxxx*,
+ ``\U``*xxxxxxxx*, and ``\N{``*name*``}`` are unrecognized in
+ Python 2.x and forbidden in Python 3000.
+
+Adjacent bytes literals are subject to the same concatenation rules as
+adjacent string literals. [#concat]_ A bytes literal adjacent to a
+string literal is an error.
+
+
+Semantics
+=========
+
+Each evaluation of a bytes literal produces a new ``bytes`` object.
+The bytes in the new object are the bytes represented by the
+``shortstringitem``s or ``longstringitem``s in the literal, in the
+same order.
+
+
+Rationale
+=========
+
+The proposed syntax provides a cleaner migration path from Python 2.x
+to Python 3000 for most code involving 8-bit strings. Preserving the
+old 8-bit meaning of a string literal is usually as simple as adding a
+``b`` prefix. The one exception is Python 2.x strings containing
+bytes >127, which must be rewritten using escape sequences.
+Transcoding a source file from one encoding to another, and fixing up
+the encoding declaration, should preserve the meaning of the program.
+Python 2.x non-Unicode strings violate this principle; Python 3000
+bytes literals shouldn't.
+
+A string literal with a ``b`` in the prefix is always a syntax error
+in Python 2.5, so this syntax can be introduced in Python 2.6, along
+with the ``bytes`` type.
+
+A bytes literal produces a new object each time it is evaluated, like
+list displays and unlike string literals. This is necessary because
+bytes literals, like lists and unlike strings, are
+mutable. [#eachnew]_
+
+
+Reference Implementation
+========================
+
+Thomas Wouters has checked an implementation into the Py3K branch,
+r53872.
+
+
+References
+==========
+
+.. [#stringliterals]
+ http://www.python.org/doc/current/ref/strings.html
+
+.. [#encodings]
+ http://www.python.org/doc/current/ref/encodings.html
+
+.. [#concat]
+ http://www.python.org/doc/current/ref/string-catenation.html
+
+.. [#eachnew]
+ http://mail.python.org/pipermail/python-3000/2007-February/005779.html
+
+
+Copyright
+=========
+
+This document has been placed in the public domain.
More information about the Python-checkins
mailing list