pickle.load() extremely slow performance

Jim Garrison jhg at acm.org
Sat Mar 21 01:26:22 CET 2009

John Machin wrote:
> On Mar 21, 9:25 am, Jim Garrison <j... at acm.org> wrote:
>> I'm converting a Perl system to Python, and have run into a severe
>> performance problem with pickle.
>> One facet of the system involves scanning and loading into memory a
>> couple of parallel directory trees containing on the order of 10^4 files.  The
>> trees don't change during development/testing and the scan takes 30-40
>> seconds, so to save time I cache the loaded tree structure to disk, in
>> Perl with module Storable, and in Python with pickle.
>> In Perl, the save operation produces a file of about 3MB, and both
>> save and restore take a second or two.  In Python, pickle.dump()
>> produces a similar-size file but takes 20 seconds, and pickle.load()
>> takes 45 seconds, which is actually LONGER than the time required to
>> scan the directory trees.
>> Is there anything I can do to speed up pickle.load() to get
>> performance comparable to Perl's Storable?
> Have you read this:
>     http://www.python.org/doc/2.6/library/pickle.html
> ?
> Have you considered using cPickle instead of pickle?
> Have you considered using *ickle.dump(..., protocol=-1) ?
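
For reference, the protocol suggestion above can be sketched roughly as
follows. This is only an illustration: the sample data and the helper
name dumped_size are made up here, not taken from the system being
discussed. Passing protocol=-1 selects pickle.HIGHEST_PROTOCOL, a binary
format that is normally more compact than the text-based protocol 0.

```python
import io
import pickle

# Hypothetical stand-in for a cached directory-tree structure.
data = {"tree": [("dir%d" % i, list(range(10))) for i in range(1000)]}

def dumped_size(obj, protocol):
    """Pickle obj into memory and return the number of bytes written."""
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    return buf.tell()

size_text = dumped_size(data, 0)     # ASCII protocol 0
size_binary = dumped_size(data, -1)  # -1 means pickle.HIGHEST_PROTOCOL

print("protocol 0: %d bytes, highest protocol: %d bytes"
      % (size_text, size_binary))
```

Whether this helps with the timings reported in this thread would have
to be measured on the actual data.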

I'm using Python 3 on Windows (Server 2003).  According to the docs

   "The pickle module has a transparent optimizer (_pickle) written
   in C. It is used whenever available. Otherwise the pure Python
   implementation is used."

How can I tell if _pickle is being used?
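
One possible way to check, sketched under the assumption that the
interpreter is a CPython 3.x whose pickle.py rebinds the public
Pickler/Unpickler names to the C versions when _pickle imports
successfully (that is how current CPython releases are laid out; whether
the build discussed here behaves identically is not confirmed). The
function name c_accelerator_in_use is made up for this example.

```python
import pickle

def c_accelerator_in_use():
    """Return True if pickle.Pickler comes from the C module _pickle.

    Assumption: the class's __module__ attribute is "_pickle" for the
    C implementation and "pickle" for the pure-Python fallback.
    """
    return pickle.Pickler.__module__ == "_pickle"

print("pickle.Pickler is defined in:", pickle.Pickler.__module__)
print("C accelerator in use:", c_accelerator_in_use())
```

If this reports the pure-Python implementation, trying "import _pickle"
directly at the interpreter prompt should show whether the extension
module is missing from the installation.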
