[Python-Dev] Unpickling memory usage problem, and a proposed solution
Dan Gindikin
dgindikin at gmail.com
Fri Apr 23 20:11:53 CEST 2010
We were having performance problems unpickling a large pickle file: the running
time was fine (170 s), but memory usage peaked at about 1100 MB when roughly
300 MB should have sufficed. The excess turned out to be memory fragmentation,
caused by the many unnecessary "put" opcodes in the pickle stream.
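To see the imbalance for yourself, here is a quick way to count puts and gets
in a stream (the sample data is made up for illustration):

    import pickle
    import pickletools

    # 1000 dicts with distinct keys, so no object is referenced twice.
    blob = pickle.dumps([{"k%d" % i: i} for i in range(1000)], protocol=2)

    puts = gets = 0
    for opcode, arg, pos in pickletools.genops(blob):
        if "PUT" in opcode.name:      # PUT, BINPUT, LONG_BINPUT
            puts += 1
        elif "GET" in opcode.name:    # GET, BINGET, LONG_BINGET
            gets += 1

    # pickle memoizes every list, dict, and string it writes, but only
    # emits a GET for objects referenced more than once; with no shared
    # objects here, every one of the ~2000 puts is dead weight.
    print("puts:", puts, "gets:", gets)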
We made a tool, inspired by pickletools.optimize, that runs directly on a
pickle file and strips the unneeded puts using pickletools.genops. That solved
the unpickling problem (84 s, 382 MB).
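The tool itself isn't attached here, but a minimal sketch of that genops-based
approach could look like the following. It assumes protocol <= 2 pickles (no
protocol-4 framing, whose FRAME lengths byte-edits would invalidate), and it
keeps any put whose memo id a get actually fetches, so the surviving gets
still resolve; optimize_pickle_file is a made-up name for illustration.

    import pickletools

    def optimize_pickle_file(src, dst, bufsize=1 << 20):
        """Copy the pickle in src to dst, dropping every PUT opcode
        whose memo id no GET ever references.  Works on the raw byte
        stream; the pickled objects are never reconstructed."""
        # Pass 1: collect the memo ids that GET/BINGET/LONG_BINGET read.
        gets = set()
        with open(src, "rb") as f:
            for opcode, arg, pos in pickletools.genops(f):
                if "GET" in opcode.name:
                    gets.add(arg)

        # Pass 2: record the byte span of each PUT/BINPUT/LONG_BINPUT;
        # a put's span ends where the next opcode begins.
        puts = []  # (memo_id, start, stop)
        with open(src, "rb") as f:
            pending = None
            for opcode, arg, pos in pickletools.genops(f):
                if pending is not None:
                    puts.append((pending[0], pending[1], pos))
                    pending = None
                if "PUT" in opcode.name:
                    pending = (arg, pos)

        # Pass 3: copy everything except the spans of unused puts,
        # streaming in bounded chunks to keep memory flat.
        def copy_span(fin, fout, start, stop):
            fin.seek(start)
            remaining = stop - start
            while remaining > 0:
                chunk = fin.read(min(bufsize, remaining))
                fout.write(chunk)
                remaining -= len(chunk)

        with open(src, "rb") as fin, open(dst, "wb") as fout:
            i = 0
            for memo_id, start, stop in puts:
                if memo_id not in gets:   # nobody fetches it: drop the put
                    copy_span(fin, fout, i, start)
                    i = stop
            fin.seek(i)                   # copy the tail after the last put
            while True:
                chunk = fin.read(bufsize)
                if not chunk:
                    break
                fout.write(chunk)

This is still two genops passes over the file, which is exactly the time and
memory cost described next.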
However, the tool itself was using too much time and memory (1100 s, 470 MB),
so I recoded it to scan through the pickle stream directly, without going
through pickletools.genops, which brought it down to 240 s and 130 MB.
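A rough sketch of what such a direct scan can look like (my reconstruction,
not the actual tool): only the one-byte opcode and the PUT/GET memo indices
are decoded, and every other argument is seeked over using
pickletools.code2op, the module's opcode table. The negative argument-width
codes below mirror pickletools' internal ArgumentDescriptor.n conventions, so
treat them as an assumption about undocumented internals.

    import pickletools

    # ArgumentDescriptor.n is positive for fixed-width arguments; the
    # negative codes mean "variable width":
    #   -1  read up to newline
    #   -2  1-byte length prefix    -3/-4  4-byte length prefix
    #   -5  8-byte length prefix
    PREFIX_WIDTH = {-2: 1, -3: 4, -4: 4, -5: 8}
    MEMO_OPS = {"BINPUT", "BINGET", "LONG_BINPUT", "LONG_BINGET"}

    def scan_puts_and_gets(path):
        """One pass over a pickle file, decoding only opcodes and
        PUT/GET memo ids; large string/bytes arguments are seeked
        over rather than read into memory."""
        gets, puts = set(), []        # puts: (memo_id, start, stop)
        with open(path, "rb") as f:
            while True:
                start = f.tell()
                code = f.read(1)
                if not code:
                    break             # end of file
                op = pickletools.code2op[code.decode("latin-1")]
                arg = None
                if op.arg is not None:
                    n = op.arg.n
                    if n > 0:                     # fixed-width argument
                        raw = f.read(n)
                        if op.name in MEMO_OPS:
                            arg = int.from_bytes(raw, "little")
                    elif n == -1:                 # text line argument
                        line = f.readline()
                        if op.name in ("PUT", "GET"):
                            arg = int(line)
                    else:                         # length-prefixed blob
                        width = PREFIX_WIDTH[n]
                        size = int.from_bytes(f.read(width), "little")
                        f.seek(size, 1)           # skip the payload
                if "PUT" in op.name:
                    puts.append((arg, start, f.tell()))
                elif "GET" in op.name:
                    gets.add(arg)
                if op.name == "STOP":
                    break
        return puts, gets

With the put spans and the get set in hand, the same chunked copy from the
sketch above rewrites the file.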
Other people who deal with large pickle files are probably hitting similar
problems, and since this comes up precisely when the data is large, it is
exactly the situation in which you probably can't afford pickletools.optimize
or pickletools.genops. This feels like functionality that ought to be added to
pickletools; is there some way I can contribute it?