[Python-Dev] Pickler/Unpickler API clarification
Michael Haggerty
mhagger at alum.mit.edu
Fri Mar 6 13:44:57 CET 2009
Antoine Pitrou wrote:
> Michael Haggerty <mhagger <at> alum.mit.edu> writes:
>> It is easy to optimize the pickling of instances by giving them
>> __getstate__() and __setstate__() methods. But the pickler still
>> records the type of each object (essentially, the name of its class) in
>> each record. The space for these strings constituted a large fraction
>> of the database size.
>
> If these strings are not interned, then perhaps they should be.
> There is a similar optimization proposal (w/ patch) for attribute names:
> http://bugs.python.org/issue5084
If I understand correctly, this would not help:
- on writing, the strings are identical anyway, because they are read
out of the class's __name__ and __module__ fields. Therefore the
Pickler's usual memoizing behavior will prevent the strings from being
written more than once.
- on reading, the strings are only used to look up the class. Therefore
they are garbage collected almost immediately.
This is a different situation than that of attribute names, which are
stored persistently as the keys in the instance's __dict__.
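
To make the two points above concrete, here is a rough sketch. It is an illustration, not code from the application being discussed: it uses the standard library's fractions.Fraction as a stand-in class, and the names values, RecordingUnpickler and lookups are made up for the example. It counts how often the class name appears in the output when the memo is shared versus when every record is pickled on its own, and records what the unpickler asks find_class() to look up.

```python
import io
import pickle
from fractions import Fraction

# Stand-in data: 100 instances of the same class.
values = [Fraction(n, 7) for n in range(1, 101)]

# One dump: the class reference is written once and then memoized.
single = pickle.dumps(values, protocol=2)
names_in_single = single.count(b"Fraction")

# One independent pickle per record: each starts with an empty memo,
# so each one spells out the class name again.
separate = [pickle.dumps(v, protocol=2) for v in values]
names_in_separate = sum(blob.count(b"Fraction") for blob in separate)

# One Pickler reused for every record: the memo persists across
# dump() calls, so later records refer back to the first class reference.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer, protocol=2)
for v in values:
    pickler.dump(v)
names_in_shared = buffer.getvalue().count(b"Fraction")

# On reading, the module and class name are only handed to find_class().
lookups = []

class RecordingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        lookups.append((module, name))
        return super().find_class(module, name)

restored = RecordingUnpickler(io.BytesIO(single)).load()

print(names_in_single, names_in_separate, names_in_shared, lookups)
```

Note that the records written through the shared Pickler can only be read back, in order, by an Unpickler that has accumulated the same memo, which is the trade-off behind the space saving.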
Michael