[Python-Dev] Strange memo behavior from cPickle
tim.peters at gmail.com
Wed Aug 2 02:40:13 CEST 2006
> We seem to have stumbled upon some strange behavior in cPickle's memo
> use when pickling instances.
> Here's the repro:
> class C:
>     def __getstate__(self): return ('s1', 's2', 's3')
> [interactive interpreter]
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import cPickle
> >>> import mymodule
> >>> class C:
> ...     def __getstate__(self): return ('s1', 's2', 's3')
> >>> for x in mymodule.C(), C(): cPickle.dumps(x)
> Note that the second and third strings in the instance's state are
> memoized in the first case, but not in the second. Any idea why this
> occurs (and why the first element is never memoized)?
Ideally, a pickle would never contain a `PUT i` unless i was
referenced by a `GET i` later. So, ideally, there would be no PUT
opcodes in either of these pickles.
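To see what "no unneeded PUTs" looks like, here is a minimal sketch. It assumes a current Python 3 interpreter, where cPickle no longer exists as a separate module, so it uses `pickle` with protocol 0 together with the standard `pickletools` module. The helper name `count_ops` and the variables `state`, `raw` and `slim` are made up for illustration. `pickletools.optimize` drops every PUT that no GET refers to, which is the ideal described above.

```python
import pickle
import pickletools


def count_ops(data):
    """Return (number of PUT-like opcodes, number of GET-like opcodes)."""
    puts = gets = 0
    for opcode, arg, pos in pickletools.genops(data):
        # PUT, BINPUT, LONG_BINPUT and MEMOIZE all store into the memo.
        if "PUT" in opcode.name or opcode.name == "MEMOIZE":
            puts += 1
        # GET, BINGET and LONG_BINGET all read from the memo.
        elif "GET" in opcode.name:
            gets += 1
    return puts, gets


# Same state tuple as in the repro; nothing in it is shared.
state = ('s1', 's2', 's3')
raw = pickle.dumps(state, protocol=0)

# Strip every PUT that no later GET refers to.
slim = pickletools.optimize(raw)

print("before optimize:", count_ops(raw))
print("after optimize: ", count_ops(slim))
```

Both byte strings unpickle to the same tuple; the optimized one is just shorter.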
cPickle is a little bit smarter than pickle.py here, in that cPickle
suppresses a PUT if the reference count on the object is less than 2
(in which case the structure being pickled can't possibly reference
the sub-object a second time, so it's impossible that a later GET will
want to reference the same sub-object). So all you're seeing here is
refcount accidents, complicated by accidents concerning exactly which
strings get interned.
Use pickle.py instead (which doesn't do this refcount
micro-optimization), and you'll see the same number of PUTs in both.
All of these pickles are correct, however many PUTs they contain. What
would be incorrect is seeing a `GET i` without a preceding `PUT i`
using the same `i`.
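That correctness rule can be checked mechanically. The following is only a sketch, again assuming Python 3 and the standard `pickletools` module. The function name `gets_are_backed_by_puts` is hypothetical, and the "bad" byte string is hand-built for the example rather than produced by any pickler.

```python
import pickle
import pickletools


def gets_are_backed_by_puts(data):
    """Return True if every GET uses a memo index that was stored earlier."""
    seen = set()
    for opcode, arg, pos in pickletools.genops(data):
        name = opcode.name
        if name == "MEMOIZE":
            # MEMOIZE (protocol 4+) stores at the next free memo index.
            seen.add(len(seen))
        elif "PUT" in name:
            seen.add(arg)
        elif "GET" in name and arg not in seen:
            return False
    return True


# A structure that really does reference a sub-object twice,
# so the pickle needs a PUT followed by a matching GET.
shared = ['s1']
good = pickle.dumps([shared, shared], protocol=0)

# Hand-built invalid pickle: "GET 0" then STOP, with no "PUT 0" before it.
bad = b"g0\n."

print(gets_are_backed_by_puts(good))
print(gets_are_backed_by_puts(bad))
```

Extra PUTs never make this check fail; only a GET with no earlier PUT for the same index does.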