[Python-Dev] Strange memo behavior from cPickle

Tim Peters tim.peters at gmail.com
Wed Aug 2 02:40:13 CEST 2006


[Bruce Christensen]
> We seem to have stumbled upon some strange behavior in cPickle's memo
> use when pickling instances.
>
> Here's the repro:
>
> [mymodule.py]
> class C:
>     def __getstate__(self): return ('s1', 's2', 's3')
>
> [interactive interpreter]
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cPickle
> >>> import mymodule
> >>> class C:
> ...     def __getstate__(self): return ('s1', 's2', 's3')
> ...
> >>> for x in mymodule.C(), C(): cPickle.dumps(x)
> ...
> "(imymodule\nC\np1\n(S's1'\nS's2'\np2\nS's3'\np3\ntp4\nb."
> "(i__main__\nC\np1\n(S's1'\nS's2'\nS's3'\ntp2\nb."
> >>>
>
> Note that the second and third strings in the instance's state are
> memoized in the first case, but not in the second. Any idea why this
> occurs (and why the first element is never memoized)?

Ideally, a pickle would never contain a `PUT i` unless i was
referenced by a `GET i` later.  So, ideally, there would be no PUT
opcodes in either of these pickles.
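
A rough way to approximate that ideal after the fact, sketched with Python 3's
standard `pickletools` module rather than the cPickle discussed in this thread:
`pickletools.optimize` drops PUT opcodes that no GET references.  The helper
name `memo_indices` and the sample data are made up for illustration.

```python
import pickle
import pickletools

def memo_indices(data):
    """Return (put_indices, get_indices) for the memo opcodes in a pickle."""
    puts, gets = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if "PUT" in opcode.name:      # PUT, BINPUT, LONG_BINPUT
            puts.append(arg)
        elif "GET" in opcode.name:    # GET, BINGET, LONG_BINGET
            gets.append(arg)
    return puts, gets

shared = ["s1", "s2"]
obj = (shared, shared)                # the list is referenced twice

raw = pickle.dumps(obj, protocol=0)   # text protocol, as in the thread
slim = pickletools.optimize(raw)      # strips PUTs that no GET uses

raw_puts, raw_gets = memo_indices(raw)
slim_puts, slim_gets = memo_indices(slim)
print("before:", len(raw_puts), "PUTs,", len(raw_gets), "GETs")
print("after: ", len(slim_puts), "PUTs,", len(slim_gets), "GETs")
```

The exact counts depend on the Python version and pickler in use; the point is
only that every PUT surviving the optimize step is matched by some GET.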

cPickle is a little bit smarter than pickle.py here, in that cPickle
suppresses a PUT if the reference count on the object is less than 2
(in which case the structure being pickled can't possibly reference
the sub-object a second time, so it's impossible that a later GET will
want to reference the same sub-object).  So all you're seeing here is
refcount accidents, complicated by accidents concerning exactly which
strings get interned.

Use pickle.py instead (which doesn't do this refcount
micro-optimization), and you'll see the same number of PUTs in both.

They're all correct.  What would be incorrect is seeing a `GET i`
without a preceding `PUT i` using the same `i`.
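
That correctness condition can be checked mechanically.  The following is a
minimal sketch using Python 3's `pickletools.genops`, not anything from cPickle
itself; the function name `gets_are_defined` and the hand-written `bad` pickle
are assumptions for illustration only.

```python
import pickle
import pickletools

def gets_are_defined(data):
    """True if every GET index was stored by an earlier PUT (or MEMOIZE)."""
    defined = set()
    for opcode, arg, _pos in pickletools.genops(data):
        if "PUT" in opcode.name:
            defined.add(arg)
        elif opcode.name == "MEMOIZE":
            # Protocol 4+ stores at the next free memo index implicitly.
            defined.add(len(defined))
        elif "GET" in opcode.name and arg not in defined:
            return False
    return True

shared = ["s1", "s2"]
good = pickle.dumps((shared, shared), protocol=0)
bad = b"g0\n."                        # hand-written: GET 0 with no PUT 0

print(gets_are_defined(good))
print(gets_are_defined(bad))
```

Extra PUTs never make this check fail, which is why pickles with differing
numbers of PUTs can all be correct.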

