[Python-Dev] PEP 3154 - pickle protocol 4

Antoine Pitrou solipsis at pitrou.net
Fri Aug 12 15:30:09 CEST 2011


Hello,

On Friday 12 August 2011 at 14:32 +0200, Xavier Morel wrote:
> On 2011-08-12, at 12:58 , Antoine Pitrou wrote:
> > Current protocol versions export object sizes for various built-in types
> > (str, bytes) as 32-bit ints.  This forbids serialization of large data
> > [1]_. New opcodes are required to support very large bytes and str
> > objects.
> How about changing object sizes to be 64b always? Too much overhead for the
> common case (which might be smaller pickled objects)?

Yes, and also the old opcodes must still be supported, so there's no
maintenance gain in not exploiting them.

> Or a slightly more
> devious scheme (e.g. tag-bit, untagged is 31b size, tagged is 63), which
> would not require adding opcodes for that?

The opcode space is not full enough to justify this kind of
complication, IMO.
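
To illustrate the trade-off, here is a minimal sketch of keeping a short
length prefix for the common case and using a wider one only for large
objects. It assumes little-endian unsigned length fields; pack_size is a
hypothetical helper for illustration, not part of the pickle module.

```python
import struct

MAX_U32 = 2**32 - 1  # largest size a 4-byte unsigned length field can hold


def pack_size(n):
    """Hypothetical helper: encode a size as a 4-byte prefix when it fits,
    otherwise fall back to an 8-byte prefix (as a new opcode would carry)."""
    if n < 0:
        raise ValueError("size must be non-negative")
    if n <= MAX_U32:
        return struct.pack("<I", n)  # short form, small pickles stay small
    return struct.pack("<Q", n)      # long form, only paid for by huge objects


print(len(pack_size(1000)))     # small object: 4-byte prefix
print(len(pack_size(2**32)))    # too large for 32 bits: 8-byte prefix
```

Small objects keep the compact encoding, and only objects that cannot fit
in 32 bits pay for the extra four bytes.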

> > Also, dedicated set support
> > could help remove the current impossibility of pickling
> > self-referential sets [2]_.
> 
> Is there really no possibility of fixing recursive pickling once
> and for all? Dedicated opcodes for resource consumption
> purposes (and to match those of other built-in types) are
> still a good idea, but being able to pickle arbitrary
> recursive structures would be even better, would it not?

That's true. Actually, it seems pickling recursive sets could have
worked from the start, if a different __reduce__ had been chosen and a
__setstate__ had been defined:

>>> import pickle
>>> class X: pass
... 
>>> class myset(set):
...    def __reduce__(self):
...        return (self.__class__, (), list(self))
...    def __setstate__(self, state):
...        self.update(state)
>>> m = myset((1,2,3))
>>> x = X()
>>> x.m = m
>>> m.add(x)
>>> mm = pickle.loads(pickle.dumps(m))
>>> m
myset({1, 2, 3, <__main__.X object at 0x7fe3635c6990>})
>>> mm
myset({1, 2, 3, <__main__.X object at 0x7fe3635c6c30>})

  # m has a reference loop

>>> [x for x in m if getattr(x, 'm', None) is m]
[<__main__.X object at 0x7fe3635c6990>]

  # mm retains a similar reference loop

>>> [x for x in mm if getattr(x, 'm', None) is mm]
[<__main__.X object at 0x7fe3635c6c30>]

  # the representation is roughly as efficient as the original one

>>> len(pickle.dumps(set([1,2,3])))
36
>>> len(pickle.dumps(myset([1,2,3])))
37


We can't change set.__reduce__ (or __reduce_ex__) without a protocol
bump, though, since past Pythons would fail loading the pickles.
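
As a rough way to see what a protocol bump changes on the wire, the
sketch below lists the opcodes emitted for a plain set under two
protocols. It assumes Python 3.4 or later, where protocol 4 is available
and has dedicated set opcodes; opcode_names is a hypothetical helper
built on pickletools.genops.

```python
import pickle
import pickletools


def opcode_names(obj, protocol):
    """Hypothetical helper: names of the opcodes in obj's pickle."""
    data = pickle.dumps(obj, protocol=protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


old = opcode_names({1, 2, 3}, 3)  # set goes through the generic reduce path
new = opcode_names({1, 2, 3}, 4)  # set uses dedicated opcodes

print("protocol 3:", old)
print("protocol 4:", new)
```

Under protocol 3 the set is rebuilt by calling set() on a list (a REDUCE
opcode), while protocol 4 creates an empty set first and then adds the
items, which is what lets a set be referenced before its contents are
loaded.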

Regards

Antoine.



