PEP 3154 - pickle protocol 4

Hello,

This PEP is an attempt to foster a number of small incremental improvements in a future pickle protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.

Feel free to suggest any additions.

Regards

Antoine.

http://www.python.org/dev/peps/pep-3154/

PEP: 3154
Title: Pickle protocol version 4
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2011-08-11
Python-Version: 3.3
Post-History:
Resolution: TBD


Abstract
========

Data serialized using the pickle module must be portable across Python versions. It should also support the latest language features as well as implementation-specific features. For this reason, the pickle module knows about several protocols (currently numbered from 0 to 3), each of which appeared in a different Python version. Using a low-numbered protocol version allows exchanging data with old Python versions, while using a high-numbered protocol allows access to newer features and sometimes more efficient resource use (both CPU time required for (de)serializing, and disk size / network bandwidth required for data transfer).


Rationale
=========

The latest current protocol, coincidentally named protocol 3, appeared with Python 3.0 and supports the new incompatible features in the language (mainly, unicode strings by default and the new bytes object). The opportunity was not taken at the time to improve the protocol in other ways.

This PEP is an attempt to foster a number of small incremental improvements in a future new protocol version. The PEP process is used in order to gather as many improvements as possible, because the introduction of a new protocol version should be a rare occurrence.

Improvements in discussion
==========================

64-bit compatibility for large objects
--------------------------------------

Current protocol versions export object sizes for various built-in types (str, bytes) as 32-bit ints. This forbids serialization of large data [1]_. New opcodes are required to support very large bytes and str objects.

Native opcodes for sets and frozensets
--------------------------------------

Many common built-in types (such as str, bytes, dict, list, tuple) have dedicated opcodes to improve resource consumption when serializing and deserializing them; however, sets and frozensets don't. Adding such opcodes would be an obvious improvement. Also, dedicated set support could help remove the current impossibility of pickling self-referential sets [2]_.

Binary encoding for all opcodes
-------------------------------

The GLOBAL opcode, which is still used in protocol 3, uses the so-called "text" mode of the pickle protocol, which involves looking for newlines in the pickle stream. Looking for newlines is difficult to optimize on a non-seekable stream, and therefore a new version of GLOBAL (BINGLOBAL?) could use a binary encoding instead.

It seems that all other opcodes emitted when using protocol 3 already use binary encoding.


Acknowledgments
===============

(...)


References
==========

.. [1] "pickle not 64-bit ready":
   http://bugs.python.org/issue11564

.. [2] "Cannot pickle self-referencing sets":
   http://bugs.python.org/issue9269


Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
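As an illustration of the points about sets and the GLOBAL opcode (this is not part of the PEP text), the opcodes emitted under protocol 3 can be inspected with the standard ``pickletools`` module. The helper name ``opcode_names`` is a hypothetical one chosen for this sketch.

```python
import pickle
import pickletools


def opcode_names(obj, protocol=3):
    """Return the names of the opcodes pickle emits for obj (hypothetical helper)."""
    data = pickle.dumps(obj, protocol=protocol)
    # genops yields (opcode_info, argument, position) for each opcode in the stream
    return [op.name for op, arg, pos in pickletools.genops(data)]


# A list is written with its dedicated opcodes (EMPTY_LIST, APPENDS, ...).
print(opcode_names([1, 2]))

# Under protocol 3 a set has no dedicated opcode: it is written as a
# reference to the set type (GLOBAL, text mode) followed by REDUCE.
print(opcode_names({1, 2}))
```

Comparing the two printed lists shows the list using its own opcodes while the set falls back to the generic GLOBAL plus REDUCE path.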

On 2011-08-12, at 12:58 , Antoine Pitrou wrote:
Is there really no possibility of fixing recursive pickling once and for all? Dedicated opcodes for resource consumption purposes (and to match those of other built-in types) are still a good idea, but being able to pickle arbitrary recursive structures would be even better, would it not? And if specific (new) opcodes are required to handle recursive pickling correctly, this is the occasion.

Hello,

On Friday 12 August 2011 at 14:32 +0200, Xavier Morel wrote:
Yes, and also the old opcodes must still be supported, so there's no maintenance gain in not exploiting them.
The opcode space is not full enough to justify this kind of complication, IMO.
That's true. Actually, it seems pickling recursive sets could have worked from the start, if a different __reduce__ had been chosen and a __setstate__ had been defined:

>>> # m has a reference loop
>>> [x for x in m if getattr(x, 'm', None) is m]
[<__main__.X object at 0x7fe3635c6990>]
>>> # mm retains a similar reference loop
>>> [x for x in mm if getattr(x, 'm', None) is mm]
[<__main__.X object at 0x7fe3635c6c30>]
>>> # the representation is roughly as efficient as the original one

We can't change set.__reduce__ (or __reduce_ex__) without a protocol bump, though, since past Pythons would fail loading the pickles.

Regards

Antoine.
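The definitions of ``m`` and ``mm`` do not appear in the session above. A minimal sketch of one way to reproduce the idea, assuming a hypothetical ``set`` subclass named ``MySet`` and an empty helper class ``X`` (neither is necessarily what the original message used): ``__reduce__`` asks pickle to create an empty set first and to supply the elements afterwards through ``__setstate__``, so the set is already memoized when its elements are pickled.

```python
import pickle


class X:
    """Plain object that will hold a reference back to the set."""


class MySet(set):
    """Hypothetical set subclass used only for this illustration."""

    def __reduce__(self):
        # (callable, args, state): build an empty MySet, then pass the
        # elements as state so they are pickled after the set is memoized.
        return (MySet, (), list(self))

    def __setstate__(self, state):
        # Fill the (already created) set with the unpickled elements.
        self.update(state)


m = MySet([1, 2, 3])
x = X()
x.m = m      # x refers to the set ...
m.add(x)     # ... and the set contains x: a reference loop

mm = pickle.loads(pickle.dumps(m, protocol=3))

# The copy keeps a similar loop: its X element points back at the copy.
loop = [e for e in mm if getattr(e, "m", None) is mm]
print(len(loop))
```

The design choice here is to move the elements out of the constructor arguments and into the state, since constructor arguments are pickled before the object itself can be referenced.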

On Fri, Aug 12, 2011 at 3:58 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Thanks, this sounds like a good idea. That's not to say that I have already approved the PEP. :-) But from skimming it I have no objections except that it needs to be fleshed out.

--
--Guido van Rossum (python.org/~guido)

On Fri, Aug 12, 2011 at 3:58 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Your proposals all sound good to me. We will need to agree about the details, but I believe these improvements to the current protocol will be appreciated.

Also, one thing that keeps coming back is the need for pickling functions and methods which are not part of the global namespace (e.g. issue 9276 <http://bugs.python.org/issue9276>). Support for this would likely help us fix another related namespace issue (i.e., issue 3657 <http://bugs.python.org/issue3657>).

Finally, we are currently missing support for pickling classes with __new__ taking keyword-only arguments (i.e. issue 4727 <http://bugs.python.org/issue4727>).

-- Alexandre

On Tue, Aug 16, 2011 at 5:56 AM, Alexandre Vassalotti <alexandre@peadrop.com> wrote:
In the spirit of PEP 395 and Python 3's pickle._compat_pickle, perhaps it would be worth looking at a mechanism whereby a pickle could specify "alternate class names" for included class instances in the pickle itself?

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
