PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis at pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Resolution: TBD


Data serialized using the pickle module must be portable across Python
versions.  It should also support the latest language features as well
as implementation-specific features.  For this reason, the pickle
module knows about several protocols (currently numbered from 0 to 3),
each of which appeared in a different Python version.  Using a
low-numbered protocol version allows exchanging data with old Python
versions, while using a high-numbered protocol allows access to newer
features and sometimes more efficient resource use (both CPU time
required for (de)serializing, and disk size / network bandwidth
required for data transfer).
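
As an illustration of how a protocol is selected, the following minimal
sketch serializes one object with each of the protocols 0 to 3 and checks
that it round-trips.  The sample object and the variable names are
assumptions made for this example only.

```python
import pickle

# Sample object chosen for illustration only.
data = {"name": "example", "values": [1, 2, 3]}

# Serialize the same object with each protocol from 0 to 3 and keep the
# resulting byte strings and their sizes for comparison.
blobs = {}
sizes = {}
for proto in range(4):
    blob = pickle.dumps(data, protocol=proto)
    # Every protocol must reproduce the original object when loaded.
    assert pickle.loads(blob) == data
    blobs[proto] = blob
    sizes[proto] = len(blob)

for proto in sorted(sizes):
    print("protocol %d: %d bytes" % (proto, sizes[proto]))
```

Pickles written with protocol 2 or later begin with a PROTO opcode
(``\x80``) followed by the protocol number, which is how an unpickler can
tell that a stream requires a protocol it does not support.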


The most recent protocol, coincidentally numbered 3, appeared with
Python 3.0 and supports the new incompatible features in the language
(mainly, unicode strings by default and the new bytes object).  The
opportunity was not taken at the time to improve the protocol in other
ways.

This PEP is an attempt to foster a number of small incremental
improvements in a future new protocol version.  The PEP process is used
in order to gather as many improvements as possible, because the
introduction of a new protocol version should be a rare occurrence.

Improvements in discussion

64-bit compatibility for large objects

Current protocol versions encode object sizes for various built-in types
(str, bytes) as 32-bit ints.  This forbids serialization of large data
[1]_. New opcodes are required to support very large bytes and str
objects.
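
The size limit can be illustrated without allocating any large object.
The sketch below packs a length into a 4-byte little-endian field, as the
existing binary bytes and str opcodes do, and into an 8-byte field.  The
8-byte layout and both helper names are assumptions made for this
example; this PEP does not yet define the new opcodes.

```python
import struct


def encode_len32(n):
    """Pack a length as a 4-byte little-endian unsigned integer."""
    return struct.pack("<I", n)


def encode_len64(n):
    """Pack a length as an 8-byte little-endian unsigned integer.

    Hypothetical layout, used here only to show the extra headroom.
    """
    return struct.pack("<Q", n)


four_gib = 2 ** 32

# A 4 GiB length does not fit in 32 bits, so packing it fails.
try:
    encode_len32(four_gib)
    fits_in_32 = True
except struct.error:
    fits_in_32 = False

# The same length fits easily in 64 bits.
wide = encode_len64(four_gib)
```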

Native opcodes for sets and frozensets

Many common built-in types (such as str, bytes, dict, list, tuple) have
dedicated opcodes to improve resource consumption when serializing and
deserializing them; however, sets and frozensets don't.  Adding such
opcodes would be an obvious improvement.  Also, dedicated set support
could help remove the current impossibility of pickling
self-referential sets [2]_.
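
The absence of a dedicated opcode can be observed with the standard
``pickletools`` module.  In the sketch below, a set pickled with protocol
3 is disassembled; the variable names are assumptions made for this
example.

```python
import pickle
import pickletools

# Pickle a small set with protocol 3 and list the opcodes that were emitted.
blob = pickle.dumps({1, 2, 3}, protocol=3)
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(blob)]
names = [name for name, arg in ops]

for name, arg in ops:
    print(name, arg)
```

The set is written as a reference to ``builtins.set`` (a GLOBAL opcode), a
list holding its elements, and a REDUCE opcode that calls the constructor
at load time, rather than as a single container opcode of the kind used
for lists or dicts.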

Binary encoding for all opcodes

The GLOBAL opcode, which is still used in protocol 3, uses the so-called
"text" mode of the pickle protocol, which involves looking for newlines
in the pickle stream.  Looking for newlines is difficult to optimize on
a non-seekable stream, and therefore a new version of GLOBAL
(BINGLOBAL?) could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already
use binary encoding.
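
To make the difference concrete, the sketch below shows the
newline-terminated text form that GLOBAL produces under protocol 3, next
to one possible length-prefixed form.  The helper
``encode_global_binary`` and its one-byte length fields are hypothetical;
this PEP does not specify the layout of a binary GLOBAL opcode.

```python
import pickle
import struct

# Pickling a built-in function with protocol 3 emits a GLOBAL opcode:
# the byte "c", then the module name and the attribute name, each
# terminated by a newline.
blob = pickle.dumps(len, protocol=3)
text_global = b"c" + b"builtins\n" + b"len\n"


def encode_global_binary(module, name):
    """Hypothetical length-prefixed encoding of a module/name pair.

    Each string is preceded by a one-byte length, so a reader knows how
    many bytes to consume without scanning for a terminator.
    """
    m = module.encode("utf-8")
    n = name.encode("utf-8")
    return struct.pack("<B", len(m)) + m + struct.pack("<B", len(n)) + n


binary_form = encode_global_binary("builtins", "len")
```

With the text form, the unpickler must read until it finds each newline;
with a length prefix, it can request an exact number of bytes from the
stream.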




