[Python-checkins] peps: Pickle 4 changes:
antoine.pitrou
python-checkins at python.org
Fri Apr 26 22:57:15 CEST 2013
http://hg.python.org/peps/rev/6086bd69599e
changeset: 4858:6086bd69599e
user: Antoine Pitrou <solipsis at pitrou.net>
date: Fri Apr 26 22:57:06 2013 +0200
summary:
Pickle 4 changes:
- add framing
- change BINGLOBAL to Alexandre Vassalotti's GLOBAL_STACK
files:
pep-3154.txt | 91 ++++++++++++++++++++++++++-------------
1 files changed, 60 insertions(+), 31 deletions(-)
diff --git a/pep-3154.txt b/pep-3154.txt
--- a/pep-3154.txt
+++ b/pep-3154.txt
@@ -42,11 +42,67 @@
introduction of a new protocol version should be a rare occurrence.
-Improvements in discussion
-==========================
+Proposed changes
+================
-64-bit compatibility for large objects
---------------------------------------
+Framing
+-------
+
+Traditionally, when unpickling an object from a stream (by calling
+``load()`` rather than ``loads()``), many small ``read()``
+calls can be issued on the file-like object, with a potentially huge
+performance impact.
+
+Protocol 4, by contrast, features binary framing. The general structure
+of a pickle is thus the following::
+
+    +------+------+
+    | 0x80 | 0x04 |           protocol header (2 bytes)
+    +------+------+-----------+
+    | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
+    +------+------------------+
+    | .... |                    first frame contents (N bytes)
+    +------+------+-----------+
+    | AA BB CC DD EE FF GG HH | frame size (8 bytes, little-endian)
+    +------+------------------+
+    | .... |                    second frame contents (N bytes)
+    +------+
+      etc.
+
+To keep the implementation simple, it is forbidden for a pickle opcode
+to overlap frame boundaries. The pickler takes care not to produce such
+pickles, and the unpickler refuses them.
+
+How the pickler decides frame sizes is an implementation detail.
+A simple heuristic committing the current frame as soon as it reaches
+64 KiB seems sufficient.
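
The framing layout described in this section can be sketched in a few lines of Python. This is only an illustration of the size-prefixed structure shown in the diagram, not the pickle module's implementation: the names ``FRAME_SIZE``, ``write_frames`` and ``read_frames`` are hypothetical, the header bytes are passed in by the caller, and the reader assumes a buffered stream whose ``read(n)`` returns ``n`` bytes unless the end of the stream is reached.

```python
import io
import struct

# 8-byte little-endian unsigned integer, as in the frame size field.
FRAME_SIZE = struct.Struct("<Q")


def write_frames(out, header, chunks):
    """Write a 2-byte header, then each chunk as one size-prefixed frame."""
    out.write(header)
    for chunk in chunks:
        out.write(FRAME_SIZE.pack(len(chunk)))
        out.write(chunk)


def read_frames(stream):
    """Yield the contents of each frame, using two read() calls per frame."""
    header = stream.read(2)
    if len(header) != 2:
        raise EOFError("truncated header")
    while True:
        prefix = stream.read(FRAME_SIZE.size)
        if not prefix:
            return  # clean end of stream
        if len(prefix) != FRAME_SIZE.size:
            raise EOFError("truncated frame size")
        (size,) = FRAME_SIZE.unpack(prefix)
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("truncated frame contents")
        yield data
```

The point of the design is visible in ``read_frames``: the number of ``read()`` calls depends on the number of frames, not on the number of opcodes inside them.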
+
+Binary encoding for all opcodes
+-------------------------------
+
+The GLOBAL opcode, which is still used in protocol 3, uses the
+so-called "text" mode of the pickle protocol, which involves looking
+for newlines in the pickle stream. It also complicates the implementation
+of binary framing.
+
+Protocol 4 forbids use of the GLOBAL opcode and replaces it with
+GLOBAL_STACK, a new opcode which takes its operand from the stack.
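
The newline-delimited operand of GLOBAL can be observed with the standard ``pickletools`` module. The snippet below is a small inspection sketch, with ``opcode_names`` as a hypothetical helper. It assumes a Python release that ships protocol 4; in those releases the stack-based opcode appears under the name ``STACK_GLOBAL`` rather than the ``GLOBAL_STACK`` name used in this changeset.

```python
import collections
import pickle
import pickletools


def opcode_names(data):
    """Return the names of the opcodes found in a pickle, in order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(data)]


# Protocol 3 pickles a class by name using the text-mode GLOBAL opcode,
# whose operand is two newline-terminated strings in the stream.
proto3 = pickle.dumps(collections.OrderedDict, protocol=3)

# Protocol 4 pushes the module and attribute names as binary strings
# and then uses an opcode that takes them from the stack.
proto4 = pickle.dumps(collections.OrderedDict, protocol=4)

print(opcode_names(proto3))
print(opcode_names(proto4))
```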
+
+Serializing more "lookupable" objects
+-------------------------------------
+
+By default, pickle is only able to serialize module-global functions and
+classes. Supporting other kinds of objects, such as unbound methods [4]_,
+is a common request. Actually, third-party support for some of them, such
+as bound methods, is implemented in the multiprocessing module [5]_.
+
+The ``__qualname__`` attribute from :pep:`3155` makes it possible to
+look up many more objects by name. Making the GLOBAL_STACK opcode accept
+dot-separated names, or adding a special GETATTR opcode, would allow the
+standard pickle implementation to support all those kinds of objects.
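
What resolving a dot-separated name involves can be sketched as follows. This is a minimal illustration under the assumption that the name is split on dots and resolved with successive ``getattr`` calls; ``lookup_qualified`` is a hypothetical helper, not an API of the pickle module.

```python
import importlib


def lookup_qualified(module_name, qualname):
    """Resolve a dotted qualified name, starting from a module."""
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


# A method is reachable through its __qualname__, here "str.join".
join = lookup_qualified("builtins", str.join.__qualname__)
print(join("-", ["a", "b"]))
```

Names containing ``<locals>``, as produced for functions defined inside other functions, cannot be resolved this way, since ``<locals>`` is not an attribute.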
+
+64-bit opcodes for large objects
+--------------------------------
Current protocol versions export object sizes for various built-in
types (str, bytes) as 32-bit ints. This forbids serialization of
@@ -71,33 +127,6 @@
special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
are needed.
-Serializing more "lookupable" objects
--------------------------------------
-
-For some kinds of objects, it only makes sense to serialize them by name
-(for example classes and functions). By default, pickle is only able to
-serialize module-global functions and classes by name. Supporting other
-kinds of objects, such as unbound methods [4]_, is a common request.
-Actually, third-party support for some of them, such as bound methods,
-is implemented in the multiprocessing module [5]_.
-
-:pep:`3155` now makes it possible to look up many more objects by name.
-Generalizing the GLOBAL opcode to accept dot-separated names, or adding
-a special GETATTR opcode, would allow the standard pickle implementation
-to support, in an efficient way, all those kinds of objects.
-
-Binary encoding for all opcodes
--------------------------------
-
-The GLOBAL opcode, which is still used in protocol 3, uses the
-so-called "text" mode of the pickle protocol, which involves looking
-for newlines in the pickle stream. Looking for newlines is difficult
-to optimize on a non-seekable stream, and therefore a new version of
-GLOBAL (BINGLOBAL?) could use a binary encoding instead.
-
-It seems that all other opcodes emitted when using protocol 3 already
-use binary encoding.
-
Better string encoding
----------------------
--
Repository URL: http://hg.python.org/peps