pickle.load() extremely slow performance

Carl Banks pavlovevidence at gmail.com
Sat Mar 21 03:30:07 CET 2009

On Mar 20, 5:26 pm, Jim Garrison <j... at acm.org> wrote:
> John Machin wrote:
> > On Mar 21, 9:25 am, Jim Garrison <j... at acm.org> wrote:
> >> I'm converting a Perl system to Python, and have run into a severe
> >> performance problem with pickle.
> >> One facet of the system involves scanning and loading into memory a
> >> couple of parallel directory trees containing OTO 10^4 files.  The
> >> trees don't change during development/testing and the scan takes 30-40
> >> seconds, so to save time I cache the loaded tree structure to disk, in
> >> Perl with module Storable, and in Python with pickle.
> >> In Perl, the save operation produces a file of about 3MB, and both
> >> save and restore take a second or two.  In Python, pickle.dump()
> >> produces a similar-size file but takes 20 seconds, and pickle.load()
> >> takes 45 seconds, which is actually LONGER than the time required to
> >> scan the directory trees.
> >> Is there anything I can do to speed up pickle.load() to get
> >> performance comparable to Perl's Storable?
> > Have you read this:
> >    http://www.python.org/doc/2.6/library/pickle.html
> > ?
> > Have you considered using cPickle instead of pickle?
> > Have you considered using *ickle.dump(..., protocol=-1) ?
> I'm using Python 3 on Windows (Server 2003).  According to the docs
>    "The pickle module has an transparent optimizer (_pickle) written
>    in C. It is used whenever available. Otherwise the pure Python
>    implementation is used."
> How can I tell if _pickle is being used?

The slow performance is most likely due to the poor performance of
Python 3's IO, which is caused by (among other things) a bad buffering
strategy.  It's a Python 3 growing pain, and the IO library is being
rewritten.  Python 3.1 should be much faster, but it hasn't been
released yet.
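
To check both points (whether the C accelerator is in use, and whether
file IO is the bottleneck), something like the following rough sketch
may help.  It assumes a CPython build where the C-backed classes report
`_pickle` as their module.  `using_c_pickle` and `load_via_bytes` are
hypothetical helper names, not part of the pickle module.

```python
import pickle

def using_c_pickle():
    """Return True if pickle.Unpickler comes from the C module _pickle.

    Assumption: on CPython the accelerated class reports '_pickle' as
    its __module__; the pure-Python fallback reports 'pickle'.
    """
    return pickle.Unpickler.__module__ == '_pickle'

def load_via_bytes(path):
    """Read the whole file in one call, then unpickle from memory.

    This bypasses the many small reads that pickle.load() issues
    against the file object.
    """
    with open(path, 'rb') as f:
        data = f.read()
    return pickle.loads(data)
```

If `load_via_bytes('dirlisting.dat')` is much faster than calling
pickle.load() on the open file, the time is going into IO rather than
into unpickling itself.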

As a workaround, mmap the file instead.  For example (untested):

    import mmap, pickle

    f = open('dirlisting.dat', 'rb')
    # Length 0 maps the whole file (f.tell() would be 0 on a fresh open).
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    dir_listing = pickle.loads(m)

Pickling the output left as an exercise.
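
For what it's worth, one way both directions might look is sketched
below.  It assumes the cache fits comfortably in memory and uses the
highest binary protocol, as John suggested with protocol=-1.
`save_listing` and `load_listing` are made-up names for illustration.

```python
import mmap
import pickle

def save_listing(obj, path):
    """Pickle obj in memory, then write it to path with a single write."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, 'wb') as f:
        f.write(data)

def load_listing(path):
    """Unpickle from a read-only memory map covering the whole file."""
    with open(path, 'rb') as f:
        # Length 0 means "map the entire file".
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return pickle.loads(m)
        finally:
            m.close()
```

Building the pickle with dumps() and writing it in one call keeps the
slow file object out of the inner loop on the save side as well.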

Carl Banks
