pickle performance on larger objects

Sam Penrose spenrose at intersight.com
Wed Jul 17 17:09:07 EDT 2002


On a recent project we decided to use pickle for some quick-and-dirty
object persistence. The object in question is a list of 3,000 
dictionaries
whose keys and values are short (< 100 character) strings--about 1.5
megs worth of character data in total. Loading this object from a pickle
using cPickle took so long we assumed something was broken.

In fact, loading is just slow. A list of 10,000 identical dictionaries
whose keys and values are short strings takes many seconds to load on
modern hardware. Some details:
     i.  A python process which is loading a pickle will use a lot of RAM
         relative to the pickle's size on disk, roughly an order of
         magnitude more on Mac OS X.
     ii. Performance appears to scale linearly with changes in the size of
         the list or its dicts until you run out of RAM.
     iii. The pure-Python pickle module is only about 5x slower than
          cPickle as the list gets long, except that it uses more RAM and
          therefore hits the big RAM-to-disk-swap performance falloff
          sooner.
     iv.  You *can* tell a Mac's performance by its MHz. An 800 MHz PIII
          running Windows is almost exactly twice as fast as a 400 MHz G4
          running Mac OS X, both executing the following code from the
          command line. With 25 items per dictionary and 10,000 dicts,
          the former took just under a minute using cPickle, the latter
          two minutes.
     v.  Generating a list of 3K heterogeneous dicts of 25 items (our real
         data) by reading in a 750k text file and splitting it up takes on
         the order of a second.
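For comparison, the approach in point v -- keeping the records as a flat
text file and splitting it back into dicts on load -- can be sketched
roughly as below. The tab/equals record format here is an assumption for
illustration, not our real file format:

```python
def dumpDicts(dicts, filename):
    # One record per line; fields as key=value pairs separated by tabs.
    f = open(filename, 'w')
    for d in dicts:
        f.write('\t'.join(['%s=%s' % (k, v) for k, v in d.items()]) + '\n')
    f.close()

def loadDicts(filename):
    # Rebuild each dict by splitting the line -- no pickle machinery involved.
    dicts = []
    f = open(filename, 'r')
    for line in f:
        pairs = [field.split('=', 1)
                 for field in line.rstrip('\n').split('\t')]
        dicts.append(dict(pairs))
    f.close()
    return dicts
```

This only works because our keys and values are plain short strings, but
for data that simple it sidesteps pickle's per-object overhead entirely.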

Sample run on 400 MHz G4, 448 megs of RAM:

 >>> time_cPickle_Load()
dumping list of 10 dicts:
dumped: 0.00518298149109
loading list of 10 dicts:
loaded: 0.1170129776
dumping list of 100 dicts:
dumped: 0.0329120159149
loading list of 100 dicts:
loaded: 0.849031090736
dumping list of 1000 dicts:
dumped: 0.397919893265
loading list of 1000 dicts:
loaded: 8.18722295761
dumping list of 10000 dicts:
dumped: 4.42434895039
loading list of 10000 dicts:
loaded: 133.906162977

#---code follows----------------//
def makeDict(numItems=25):
     d = {}
     for i in range(numItems):
         k = 'key%s' % i
         v = 'value%s' % i
         d[k] = v
     return d

def time_cPickle_Load():
     import time
     now = time.time
     from cPickle import dump, load
     filename = 'deleteme.pkl'

     for i in (10, 100, 1000, 10000):
         data = [makeDict() for j in range(i)]
         # Open in binary mode so the pickle file is portable across
         # platforms.
         output = open(filename, 'wb')
         startDump = now()
         print "dumping list of %s dicts:" % i
         dump(data, output)
         print "dumped:", now() - startDump
         output.close()
         inputFile = open(filename, 'rb')  # don't shadow the builtin 'input'
         startLoad = now()
         print "loading list of %s dicts:" % i
         x = load(inputFile)
         print "loaded:", now() - startLoad
         inputFile.close()






More information about the Python-list mailing list