Collin Winter wrote:
[...] I've found a few examples of code using the memo attribute ([1], [2], [3]) [...]
As author of [2] (current version here [4]) I can tell you my reason. cvs2svn has to store a vast number of small objects in a database, then read them back in random order. I spent a lot of time optimizing this part of the code because it is crucial for the overall performance when converting large CVS repositories.

The objects are not all of the same class and sometimes contain other objects, so it is convenient to use pickling instead of, say, marshaling. It is easy to optimize the pickling of instances by giving them __getstate__() and __setstate__() methods. But the pickler still records the type of each object (essentially, the name of its class) in each record. The space for these strings constituted a large fraction of the database size.

So I "prime" the picklers/unpicklers by pickling and then unpickling a "primer" that contains the classes that I know will appear, and storing the resulting memos once in the database. Then for each record I create a new pickler/unpickler and initialize its memo to the "primer"'s memo before using it to write or read the actual object. This removes a lot of redundancy across database records.

I only prime my picklers/unpicklers with the classes. But note that the same technique could be used for any repeated subcomponents. This would have the added advantage that all of the unpickled instances would share the same copies of the objects that appear in the primer, which could be a semantic advantage and a significant savings in RAM in addition to the space and processing time advantages described above. It might even be a useful feature for the "shelve" module.
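A minimal sketch of the priming idea described above, for illustration only. It is not the cvs2svn code. It is written for Python 3 and assumes CPython's C-accelerated pickle, where assigning one pickler's (or unpickler's) memo to another copies it. Fraction and Decimal are stand-ins for the real record classes, and dumps_primed() and loads_primed() are hypothetical helper names. Unlike the scheme above, it keeps the primer pickler and unpickler alive in memory instead of storing their memos in the database.

```python
import io
import pickle
from decimal import Decimal
from fractions import Fraction

# Protocol 2 writes explicit memo indices (BINPUT/BINGET), which this sketch
# relies on. Newer protocols are not covered here.
PROTOCOL = 2

# The "primer": classes expected to appear in many records (stand-ins only).
PRIMER = (Fraction, Decimal)

# Pickle the primer once and keep the pickler so its memo can be reused.
_primer_buf = io.BytesIO()
_primer_pickler = pickle.Pickler(_primer_buf, PROTOCOL)
_primer_pickler.dump(PRIMER)

# Unpickle the primer once to build the matching unpickler memo.
_primer_unpickler = pickle.Unpickler(io.BytesIO(_primer_buf.getvalue()))
_primer_unpickler.load()


def dumps_primed(obj):
    """Pickle obj with a fresh pickler whose memo starts as the primer's."""
    buf = io.BytesIO()
    pickler = pickle.Pickler(buf, PROTOCOL)
    # Assumption: with the C pickler this assignment copies the memo, so the
    # primer's memo is not modified by later dumps.
    pickler.memo = _primer_pickler.memo
    pickler.dump(obj)
    return buf.getvalue()


def loads_primed(data):
    """Unpickle a record written by dumps_primed()."""
    unpickler = pickle.Unpickler(io.BytesIO(data))
    # Same assumption as above, for the unpickler side.
    unpickler.memo = _primer_unpickler.memo
    return unpickler.load()


record = dumps_primed(Fraction(3, 4))
plain = pickle.dumps(Fraction(3, 4), PROTOCOL)
```

Here record should come out shorter than plain, because the class is written as a memo reference instead of by its module and class name. The price is that a primed record is not self-contained: it can only be read back by an unpickler that has been primed the same way.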
So my questions are these:

1) Should Pickler/Unpickler objects automatically clear their memos when dumping/loading?
2) Is memo an intentionally exposed, supported part of the Pickler/Unpickler API, despite the lack of documentation and tests?
For my application, either of the following alternatives would also suffice:

- The ability to pickle picklers and unpicklers themselves (including their memos). This is, of course, awkward because they are hard-wired to files.
- Picklers and unpicklers could have get_memo() and set_memo() methods that return and accept an opaque (but pickleable) memo object. In other words, I don't need to muck around inside the memo object; I just need to be able to save and restore it.

Please note that the memo for a pickler is not equal to the memo of the corresponding unpickler.

A similar effect could *almost* be obtained without accessing the memos by saving the pickled primer itself in the database. The unpickler would be primed by using it to load the primer before loading the record of interest. But AFAIK there is no way to prime new picklers this way, because there is no guarantee that pickling the same primer twice will result in the same id->object mapping in the pickler's memo.

Michael
[2] http://google.com/codesearch/p?hl=en#M-DDI-lCOgE/lib/python2.4/site-packages/cvs2svn_lib/primed_pickle.py&q=lang:py%20%5C.memo
[4] http://cvs2svn.tigris.org/source/browse/cvs2svn/trunk/cvs2svn_lib/serializer...