[Python-checkins] peps: Pickle 4 changes:

antoine.pitrou python-checkins at python.org
Fri Apr 26 22:57:15 CEST 2013


http://hg.python.org/peps/rev/6086bd69599e
changeset:   4858:6086bd69599e
user:        Antoine Pitrou <solipsis at pitrou.net>
date:        Fri Apr 26 22:57:06 2013 +0200
summary:
  Pickle 4 changes:
- add framing
- change BINGLOBAL to Alexandre Vassalotti's GLOBAL_STACK

files:
  pep-3154.txt |  91 ++++++++++++++++++++++++++-------------
  1 files changed, 60 insertions(+), 31 deletions(-)


diff --git a/pep-3154.txt b/pep-3154.txt
--- a/pep-3154.txt
+++ b/pep-3154.txt
@@ -42,11 +42,67 @@
 introduction of a new protocol version should be a rare occurrence.
 
 
-Improvements in discussion
-==========================
+Proposed changes
+================
 
-64-bit compatibility for large objects
---------------------------------------
+Framing
+-------
+
+Traditionally, when unpickling an object from a stream (by calling
+``load()`` rather than ``loads()``), many small ``read()``
+calls can be issued on the file-like object, with a potentially huge
+performance impact.
+
+Protocol 4, by contrast, features binary framing.  The general structure
+of a pickle is thus the following::
+
+    +------+------+
+    | 0x80 | 0x04 |  protocol header (2 bytes)
+    +------+------+-----------+
+    | AA BB CC DD EE FF GG HH |  frame size (8 bytes, little-endian)
+    +------+------------------+
+    | .... |  first frame contents (N bytes)
+    +------+------+-----------+
+    | AA BB CC DD EE FF GG HH |  frame size (8 bytes, little-endian)
+    +------+------------------+
+    | .... |  second frame contents (N bytes)
+    +------+
+      etc.
+
+To keep the implementation simple, it is forbidden for a pickle opcode
+to overlap frame boundaries.  The pickler takes care not to produce such
+pickles, and the unpickler refuses them.
+
+How the pickler decides frame sizes is an implementation detail.
+A simple heuristic committing the current frame as soon as it reaches
+64 KiB seems sufficient.
+
+Binary encoding for all opcodes
+-------------------------------
+
+The GLOBAL opcode, which is still used in protocol 3, uses the
+so-called "text" mode of the pickle protocol, which involves looking
+for newlines in the pickle stream.  It also complicates the implementation
+of binary framing.
+
+Protocol 4 forbids use of the GLOBAL opcode and replaces it with
+GLOBAL_STACK, a new opcode which takes its operand from the stack.
+
+Serializing more "lookupable" objects
+-------------------------------------
+
+By default, pickle is only able to serialize module-global functions and
+classes.  Supporting other kinds of objects, such as unbound methods [4]_,
+is a common request. Actually, third-party support for some of them, such
+as bound methods, is implemented in the multiprocessing module [5]_.
+
+The ``__qualname__`` attribute from :pep:`3155` makes it possible to
+look up many more objects by name.  Making the GLOBAL_STACK opcode accept
+dot-separated names, or adding a special GETATTR opcode, would allow the
+standard pickle implementation to support all those kinds of objects.
+
+64-bit opcodes for large objects
+--------------------------------
 
 Current protocol versions export object sizes for various built-in
 types (str, bytes) as 32-bit ints.  This forbids serialization of
@@ -71,33 +127,6 @@
 special method (``__getnewargs_ex__`` ?) and a new opcode (NEWOBJEX ?)
 are needed.
 
-Serializing more "lookupable" objects
--------------------------------------
-
-For some kinds of objects, it only makes sense to serialize them by name
-(for example classes and functions).  By default, pickle is only able to
-serialize module-global functions and classes by name.  Supporting other
-kinds of objects, such as unbound methods [4]_, is a common request.
-Actually, third-party support for some of them, such as bound methods,
-is implemented in the multiprocessing module [5]_.
-
-:pep:`3155` now makes it possible to lookup many more objects by name.
-Generalizing the GLOBAL opcode to accept dot-separated names, or adding
-a special GETATTR opcode, would allow the standard pickle implementation
-to support, in an efficient way, all those kinds of objects.
-
-Binary encoding for all opcodes
--------------------------------
-
-The GLOBAL opcode, which is still used in protocol 3, uses the
-so-called "text" mode of the pickle protocol, which involves looking
-for newlines in the pickle stream.  Looking for newlines is difficult
-to optimize on a non-seekable stream, and therefore a new version of
-GLOBAL (BINGLOBAL?)  could use a binary encoding instead.
-
-It seems that all other opcodes emitted when using protocol 3 already
-use binary encoding.
-
 Better string encoding
 ----------------------
 
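Editorial sketch (not part of the changeset): the layout drawn in the
"Framing" section of the diff above can be illustrated with a few lines of
Python.  This is only a minimal sketch of the structure as drafted here,
not the pickle module's implementation.  The names PROTO_HEADER,
write_framed, read_exact and iter_frames are hypothetical, the version
byte is assumed to be 4 because the layout describes protocol 4, and the
stream is assumed to be buffered so that read(n) returns n bytes unless
the data ends.

```python
import io
import struct

# Assumption: PROTO opcode (0x80) followed by the version byte for protocol 4.
PROTO_HEADER = bytes([0x80, 0x04])
# Frame size as drawn in the diagram: 8 bytes, little-endian, unsigned.
FRAME_SIZE = struct.Struct("<Q")


def write_framed(stream, frames):
    """Write the 2-byte header, then each frame prefixed by its size."""
    stream.write(PROTO_HEADER)
    for frame in frames:
        stream.write(FRAME_SIZE.pack(len(frame)))
        stream.write(frame)


def read_exact(stream, n):
    """Read exactly n bytes, raising EOFError if the stream is truncated."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated stream")
    return data


def iter_frames(stream):
    """Yield frame contents, issuing two read() calls per frame."""
    if read_exact(stream, len(PROTO_HEADER)) != PROTO_HEADER:
        raise ValueError("unexpected protocol header")
    while True:
        prefix = stream.read(FRAME_SIZE.size)
        if not prefix:
            return  # clean end of stream: no further frames
        if len(prefix) != FRAME_SIZE.size:
            raise EOFError("truncated frame size")
        (size,) = FRAME_SIZE.unpack(prefix)
        yield read_exact(stream, size)


buf = io.BytesIO()
write_framed(buf, [b"abc", b"defgh"])
buf.seek(0)
frames = list(iter_frames(buf))
```

The reader issues one read() for the size prefix and one for the frame
body, regardless of how many opcodes the frame contains, which is the
reduction in small read() calls that the section is after.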

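Editorial sketch (not part of the changeset): the dot-separated lookup
mentioned in the 'Serializing more "lookupable" objects' section amounts
to importing a module and walking attributes along a qualified name.  The
helper name resolve_qualname below is hypothetical, and the sketch makes
no claim about how GLOBAL_STACK or a GETATTR opcode would actually be
implemented.

```python
import importlib
import json.decoder


def resolve_qualname(module_name, qualname):
    """Import module_name, then follow qualname one attribute at a time."""
    obj = importlib.import_module(module_name)
    for part in qualname.split("."):
        obj = getattr(obj, part)
    return obj


# A method reached through its class, which a plain module-global
# lookup by a single name could not find.
decode = resolve_qualname("json.decoder", "JSONDecoder.decode")
```

Qualified names containing a "<locals>" component (functions defined
inside other functions) cannot be resolved this way, since that component
is not a real attribute.
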
-- 
Repository URL: http://hg.python.org/peps

