[Python-Dev] undesireable unpickle behavior, proposed fix
"Martin v. Löwis"
martin at v.loewis.de
Tue Jan 27 19:43:30 CET 2009
> Interning the strings on unpickling makes the pickles smaller, and at
> least for cPickle actually makes unpickling sequences of many objects
> slightly faster. I have included proposed patches to cPickle.c and
> pickle.py, and would appreciate any feedback.
Please always submit patches to the bug tracker.
On the proposed change: While it is fairly unintrusive, I would like to
propose a different approach: pickle interned strings specially. The
marshal module already uses this approach, and it should extend to
pickle (although it would probably require a new protocol).
On pickling, inspect each string and check whether it is interned. If
so, emit a different opcode and record the string in the object id dictionary.
On a second occurrence of the string, only pickle a backward reference.
(Alternatively, check whether pickling the same string a second time
would be more compact.)
On unpickling, handle the new opcode by interning the resulting string;
subsequent references to it go through the standard backreferencing
algorithm.
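To make the two steps above concrete, here is a minimal toy sketch in
Python. It is not a patch to pickle.py or cPickle.c and does not use real
pickle opcodes: the names STR, INTERNED and REF, the functions encode and
decode, and the is_interned predicate are all hypothetical, chosen only
for illustration. The predicate is supplied by the caller because this
sketch does not assume any particular way of testing the interned flag
from pure Python.

```python
import sys

# Hypothetical opcodes for this sketch only; these are not pickle opcodes.
STR, INTERNED, REF = "STR", "INTERNED", "REF"

def encode(strings, is_interned):
    """Encode a sequence of strings into (opcode, argument) pairs.

    is_interned is a caller-supplied predicate (an assumption of this
    sketch) that reports whether a string object is interned.
    """
    memo = {}  # id(string) -> memo index, like the object id dictionary
    out = []
    for s in strings:
        if id(s) in memo:
            # Second occurrence: emit only a backward reference.
            out.append((REF, memo[id(s)]))
        elif is_interned(s):
            # First occurrence of an interned string: special opcode,
            # and remember it for later backreferences.
            memo[id(s)] = len(memo)
            out.append((INTERNED, s))
        else:
            out.append((STR, s))
    return out

def decode(ops):
    """Rebuild the list of strings, interning where the opcode says so."""
    memo = []  # memo index -> string
    result = []
    for code, arg in ops:
        if code == INTERNED:
            s = sys.intern(arg)   # intern the resulting string
            memo.append(s)
            result.append(s)
        elif code == REF:
            result.append(memo[arg])  # standard backreference lookup
        else:
            result.append(arg)
    return result

# Usage: one interned key that appears twice, plus an ordinary string.
key = sys.intern("some_key_name")
known_interned = {id(key)}
ops = encode([key, "other", key], lambda s: id(s) in known_interned)
decoded = decode(ops)
```

In this toy run the second occurrence of the key is written as a
backreference only, and after decoding both occurrences are the same
interned object.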
Regards,
Martin