On Thu, Mar 5, 2009 at 12:07 PM, Collin Winter <collinw@gmail.com> wrote:
I'm working on some performance patches for cPickle, and one of the bigger wins so far has been replacing the Pickler's memo dict with a custom hashtable (and hence removing memo's getters and setters). In looking over this, Jeffrey Yasskin commented that this would break anyone who was accessing the memo attribute.
I've found a few examples of code using the memo attribute ([1], [2], [3]), and there are probably more out there, but the memo attribute doesn't look like part of the API to me. It's only documented in http://docs.python.org/library/pickle.html as "you used to need this before Python 2.3, but don't anymore". However: I don't believe you should ever need this attribute.
The usages of memo I've seen break down into two camps: clearing the memo, and wanting to explicitly populate the memo with predefined values. Clearing the memo is recommended as part of reusing Pickler objects, but I can't fathom when you would want to reuse a Pickler *without* clearing the memo. Reusing the Pickler without clearing the memo will produce pickles that are, as best I can see, invalid -- at least, pickletools.dis() rejects this, which is the closest thing we have to a validator.
I can explain this, as I invented this behavior. The use case was to have a long-lived session between a client and a server which were communicating repeatedly using pickles. The idea was that values that had been transferred once wouldn't have to be sent across the wire again -- they could just be referenced. This was a bad idea (*), and I'd be happy to ban it -- but we'd probably have to bump the pickle protocol version in order to maintain backwards compatibility.
Explicitly setting memo values has the same problem: an easy, very brittle way to produce invalid data.
Agreed, this is just preposterous. It was never part of the plan.
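For anyone who wants to see the invalid-pickle behavior described above, here is a minimal sketch. It assumes a current Python 3 `pickle` module rather than the 2.x cPickle under discussion, and the helper names `dump_twice` and `can_load_standalone` are made up for illustration. It dumps the same list twice through one Pickler and checks whether the second pickle can be loaded on its own.

```python
import io
import pickle


def dump_twice(clear_between):
    """Dump one list twice with a single Pickler; return both pickles as bytes."""
    payload = ["spam", "eggs"]
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, protocol=2)

    pickler.dump(payload)
    split = buf.tell()  # byte offset where the first pickle ends

    if clear_between:
        pickler.clear_memo()  # forget everything seen during the first dump

    pickler.dump(payload)
    data = buf.getvalue()
    return data[:split], data[split:]


def can_load_standalone(blob):
    """Return True if a fresh unpickler can load blob without prior context."""
    try:
        pickle.loads(blob)
    except (pickle.UnpicklingError, KeyError):
        # Which exception is raised for a dangling memo reference
        # depends on the Python version and implementation.
        return False
    return True


# Without clearing, the second pickle only refers back to the memo entry
# created by the first dump, so it is not usable by itself.
first, second = dump_twice(clear_between=False)
print(len(first), len(second), can_load_standalone(second))

# With clear_memo() between dumps, each pickle is self-contained.
first, second = dump_twice(clear_between=True)
print(len(first), len(second), can_load_standalone(second))
```

Without the `clear_memo()` call, the second pickle is just a memo reference with nothing behind it, which is presumably what pickletools.dis() is objecting to.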
So my questions are these: 1) Should Pickler/Unpickler objects automatically clear their memos when dumping/loading?
Alas, there could be backwards compatibility issues. Bumping the protocol could mitigate this.
2) Is memo an intentionally exposed, supported part of the Pickler/Unpickler API, despite the lack of documentation and tests?
The exposure is unintentional, but for historic reasons we can't just remove it. :-(
Thanks,
Collin
[1] - http://google.com/codesearch/p?hl=en#Qx8E-7HUBTk/trunk/google/appengine/api/memcache/__init__.py&q=lang:py%20%5C.memo
[2] - http://google.com/codesearch/p?hl=en#M-DDI-lCOgE/lib/python2.4/site-packages/cvs2svn_lib/primed_pickle.py&q=lang:py%20%5C.memo
[3] - http://google.com/codesearch/p?hl=en#l_w_cA4dKMY/AtlasAnalysis/2.0.3-LST-1/PhysicsAnalysis/PyAnalysis/PyAnalysisUtils/python/root_pickle.py&q=lang:py%20pick.*%5C.memo%5Cb
__________
(*) http://code.google.com/p/googleappengine/issues/detail?id=417

--
--Guido van Rossum (home page: http://www.python.org/~guido/)