On Friday 06 March 2009 at 13:44 +0100, Michael Haggerty wrote:
Antoine Pitrou wrote:
Michael Haggerty <mhagger <at> alum.mit.edu> writes:
It is easy to optimize the pickling of instances by giving them __getstate__() and __setstate__() methods. But the pickler still records the type of each object (essentially, the name of its class) in each record. The space for these strings constituted a large fraction of the database size.
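A minimal sketch of the kind of optimization described above, not the code from the project under discussion. The class `Revision` and its fields are hypothetical. It shows that `__getstate__()` and `__setstate__()` keep the attribute names out of the pickle, while the class name is still written:

```python
import pickle


class Revision:  # hypothetical class, for illustration only
    def __init__(self, number, author):
        self.number = number
        self.author = author

    def __getstate__(self):
        # Store a compact tuple instead of the instance __dict__,
        # so the attribute names are not written to the pickle.
        return (self.number, self.author)

    def __setstate__(self, state):
        self.number, self.author = state


data = pickle.dumps(Revision(42, "mhagger"), protocol=2)
restored = pickle.loads(data)

has_class_name = b"Revision" in data   # the class name is still recorded
has_attr_name = b"number" in data      # the attribute name is not
print(len(data), has_class_name, has_attr_name)
```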
If these strings are not interned, then perhaps they should be. There is a similar optimization proposal (with a patch) for attribute names: http://bugs.python.org/issue5084
If I understand correctly, this would not help:
- on writing, the strings are identical anyway, because they are read out of the class's __name__ and __module__ fields. Therefore the Pickler's usual memoizing behavior will prevent the strings from being written more than once.
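The memoizing behavior mentioned above can be observed with a small experiment. This is only a sketch: the class `Record` is hypothetical, and whether the database in question pickles its objects together or one per record is an assumption, not something stated here. The memo belongs to a single `Pickler`, so it only helps within one dump:

```python
import pickle


class Record:  # hypothetical class, for illustration only
    def __init__(self, value):
        self.value = value

    def __getstate__(self):
        return (self.value,)

    def __setstate__(self, state):
        (self.value,) = state


records = [Record(i) for i in range(3)]

# One dump: the class reference is memoized after its first use.
together = pickle.dumps(records, protocol=2)

# One dump per object: every pickle starts with an empty memo.
separately = [pickle.dumps(r, protocol=2) for r in records]

name = b"Record"
count_together = together.count(name)
count_separately = sum(p.count(name) for p in separately)
print(count_together, count_separately)
```

In the single dump the class name is written once and later instances refer back to it through the memo. Each separate dump has to write it again.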
Then why did you say that "the space for these strings constituted a large fraction of the database size", if they are already shared? Are your objects so tiny that even the space taken by the pointer to the type name grows the size of the database significantly?