[Python-Dev] Pickler/Unpickler API clarification

Michael Haggerty mhagger at alum.mit.edu
Fri Mar 6 13:44:57 CET 2009


Antoine Pitrou wrote:
> Michael Haggerty <mhagger <at> alum.mit.edu> writes:
>> It is easy to optimize the pickling of instances by giving them
>> __getstate__() and __setstate__() methods.  But the pickler still
>> records the type of each object (essentially, the name of its class) in
>> each record.  The space for these strings constituted a large fraction
>> of the database size.
> 
> If these strings are not interned, then perhaps they should be.
> There is a similar optimization proposal (w/ patch) for attribute names:
> http://bugs.python.org/issue5084

If I understand correctly, this would not help:

- on writing, the strings are identical anyway, because they are read
out of the class's __name__ and __module__ fields.  Therefore the
Pickler's usual memoizing behavior will prevent the strings from being
written more than once.

- on reading, the strings are only used to look up the class.  Therefore
they are garbage collected almost immediately.
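
A rough sketch of the writing side, assuming a current Python 3 and using
fractions.Fraction purely as a stand-in for the record class (neither is
from the original discussion). It counts how often the class name appears
when many instances go through one Pickler, versus when every record is
pickled on its own:

```python
import pickle
from fractions import Fraction

# Stand-in records; any picklable class would do for this illustration.
records = [Fraction(n, 7) for n in range(1, 101)]

# One pickle stream, so one memo is shared by all 100 instances.
together = pickle.dumps(records, protocol=2)
name_count_together = together.count(b"Fraction")

# One pickle stream per record, so each stream starts with an empty memo.
separate = [pickle.dumps(r, protocol=2) for r in records]
name_count_separate = sum(p.count(b"Fraction") for p in separate)

print(name_count_together, name_count_separate)
```

Within the shared stream the class reference should be written once and
then referred to through the memo, whereas each separately pickled record
has to carry the name again.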

This is a different situation than that of attribute names, which are
stored persistently as the keys in the instance's __dict__.
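
To make the contrast concrete, here is a small sketch, again assuming a
current Python 3. RecordingUnpickler is a hypothetical helper, and
types.SimpleNamespace is only a convenient stand-in for a class whose
state lives in __dict__. It logs the (module, name) pairs handed to
find_class() and then inspects what is left in the loaded instances:

```python
import io
import pickle
from types import SimpleNamespace


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical helper: logs every class lookup made while loading."""

    def __init__(self, file):
        super().__init__(file)
        self.lookups = []

    def find_class(self, module, name):
        # The module and class-name strings are only needed for this lookup.
        self.lookups.append((module, name))
        return super().find_class(module, name)


records = [SimpleNamespace(x=i, y=-i) for i in range(3)]
data = pickle.dumps(records, protocol=2)

unpickler = RecordingUnpickler(io.BytesIO(data))
loaded = unpickler.load()

lookups = unpickler.lookups
# The attribute names, by contrast, stay alive as keys of each __dict__.
attribute_names = [sorted(vars(r)) for r in loaded]

print(lookups)
print(attribute_names)
```

The class-name strings only pass through find_class(), while the
attribute names remain reachable from every loaded instance for as long
as that instance lives.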

Michael

