Strange memo behavior from cPickle

We seem to have stumbled upon some strange behavior in cPickle's memo use when pickling instances. Here's the repro:

[mymodule.py]
    class C:
        def __getstate__(self): return ('s1', 's2', 's3')

[interactive interpreter]
    Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cPickle
    >>> import mymodule
    >>> class C:
    ...     def __getstate__(self): return ('s1', 's2', 's3')
    ...
    >>> for x in mymodule.C(), C(): cPickle.dumps(x)
    ...
    "(imymodule\nC\np1\n(S's1'\nS's2'\np2\nS's3'\np3\ntp4\nb."
    "(i__main__\nC\np1\n(S's1'\nS's2'\nS's3'\ntp2\nb."

Note that the second and third strings in the instance's state are memoized in the first case, but not in the second. Any idea why this occurs (and why the first element is never memoized)?

--Bruce

[Bruce Christensen]
We seem to have stumbled upon some strange behavior in cPickle's memo use when pickling instances.
Here's the repro:
[mymodule.py]
    class C:
        def __getstate__(self): return ('s1', 's2', 's3')

[interactive interpreter]
    Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import cPickle
    >>> import mymodule
    >>> class C:
    ...     def __getstate__(self): return ('s1', 's2', 's3')
    ...
    >>> for x in mymodule.C(), C(): cPickle.dumps(x)
    ...
    "(imymodule\nC\np1\n(S's1'\nS's2'\np2\nS's3'\np3\ntp4\nb."
    "(i__main__\nC\np1\n(S's1'\nS's2'\nS's3'\ntp2\nb."
Note that the second and third strings in the instance's state are memoized in the first case, but not in the second. Any idea why this occurs (and why the first element is never memoized)?
Ideally, a pickle would never contain a `PUT i` unless i was referenced by a `GET i` later. So, ideally, there would be no PUT opcodes in either of these pickles.

cPickle is a little bit smarter than pickle.py here, in that cPickle suppresses a PUT if the reference count on the object is less than 2 (in which case the structure being pickled can't possibly reference the sub-object a second time, so it's impossible that a later GET will want to reference the same sub-object). So all you're seeing here is refcount accidents, complicated by accidents concerning exactly which strings get interned.

Use pickle.py instead (which doesn't do this refcount micro-optimization), and you'll see the same number of PUTs in both. They're all correct. What would be incorrect is seeing a `GET i` without a preceding `PUT i` using the same `i`.
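The correctness rule above (every `GET i` needs an earlier `PUT i`; unused PUTs are harmless) can be checked mechanically. Below is a minimal sketch, assuming a modern Python 3 interpreter with the standard `pickletools` module rather than the Python 2.4 cPickle from the report. The helper name `memo_usage` is hypothetical, and the two byte strings are simply the outputs from the report copied as bytes; `pickletools.genops` only parses opcodes, so `mymodule` does not need to exist.

```python
import pickle
import pickletools

# Opcode names that write to / read from the memo in protocols 0-3.
# (Protocol 4's argument-less MEMOIZE opcode is not handled in this sketch.)
PUT_OPS = ("PUT", "BINPUT", "LONG_BINPUT")
GET_OPS = ("GET", "BINGET", "LONG_BINGET")


def memo_usage(data):
    """Return (put_ids, get_ids, dangling_get_ids) for a pickle byte string.

    A "dangling" GET is one whose memo id was not PUT earlier in the stream,
    which is the only situation described above as incorrect.
    """
    puts, gets, dangling = [], [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in PUT_OPS:
            puts.append(arg)
        elif opcode.name in GET_OPS:
            gets.append(arg)
            if arg not in puts:
                dangling.append(arg)
    return puts, gets, dangling


# The two pickles from the report, copied verbatim as bytes.
first = b"(imymodule\nC\np1\n(S's1'\nS's2'\np2\nS's3'\np3\ntp4\nb."
second = b"(i__main__\nC\np1\n(S's1'\nS's2'\nS's3'\ntp2\nb."

print(memo_usage(first))   # ([1, 2, 3, 4], [], [])
print(memo_usage(second))  # ([1, 2], [], [])

# A structure that really does reference the same object twice, pickled
# with protocol 0, so a GET has to appear for the second reference.
shared = ["x"]
shared_pickle = pickle.dumps([shared, shared], protocol=0)
print(memo_usage(shared_pickle))
```

Both pickles from the report contain PUTs but no GETs, so the differing PUT counts have no effect on what gets unpickled. In later Python versions, `pickletools.optimize` removes unused PUT opcodes from an existing pickle.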
participants (2)
- Bruce Christensen
- Tim Peters