[portland] Python dictionary performance as memory usage increases

Sam Thompson georgedorn at gmail.com
Fri Aug 5 20:48:32 CEST 2011


This has everything to do with the garbage collector and the overhead
of allocating more memory to the Python process.
I ran some more tests, including running the 'dict w lists' code multiple times:

python tester.py
runtimes with memory empty
dict w/o lists: 0.0260519981384
dict w lists:   0.0418920516968
dict w lists:   0.0497941970825
loading data
runtimes with memory occupied
dict w/o lists: 0.0242760181427
dict w lists:   1.14558005333
dict w lists:   0.52930688858

It would appear that the first run of the list-creating code with
memory already occupied incurs some extra overhead, probably because
the OS has to allocate fresh memory to the Python process.  Subsequent
runs don't show this penalty, since they can reuse the memory freed
when the prior run's objects were collected.
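A quick way to poke at the GC hypothesis (my speculation about the cause, not something verified in this thread) is to time the dict-of-lists build with the cyclic collector enabled vs. disabled.  If collector passes are the cost, gc.disable() should flatten the difference when memory is occupied.  Something like:

```python
# Time the dict-of-lists pattern from tester.py with and without the
# cyclic garbage collector.  build_dict_of_lists/timed_build are my own
# helper names, not part of the original script.
import gc
import time

def build_dict_of_lists(n):
    # Same pattern as tester.py: every value is a freshly created list.
    d = {}
    for i in range(n):
        tmp = []
        tmp.append('something')
        d[i] = tmp
    return d

def timed_build(n, collector_on):
    if not collector_on:
        gc.disable()
    try:
        start = time.time()
        build_dict_of_lists(n)
        return time.time() - start
    finally:
        gc.enable()

if __name__ == '__main__':
    ballast = [float(i) for i in range(1000000)]  # occupy some memory first
    print('collector on:  %.4f' % timed_build(100000, True))
    print('collector off: %.4f' % timed_build(100000, False))
```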

Interestingly, PyPy doesn't appear to exhibit this behavior to the
same degree, probably because its GC algorithm differs from CPython's:
pypy tester.py
runtimes with memory empty
dict w/o lists: 0.217184782028
dict w lists:   0.0508198738098
dict w lists:   0.0368840694427
loading data
runtimes with memory occupied
dict w/o lists: 0.0227448940277
dict w lists:   0.0329740047455
dict w lists:   0.0272088050842


In investigating this, I found yet another strange behavior.
Allocate a bunch of memory, create a huge number of lists, delete
them, and the next run involving lists is somehow even faster than
the baseline runs with empty memory.
Code is here: http://pastebin.com/GdMNkM2f

And the results:

runtimes with memory empty
dict w/o lists: 0.0253469944
dict w lists:   0.0402500629425
dict w lists:   0.0478730201721
dict w lists:   0.0460600852966
loading data
freeing some memory
runtimes with memory occupied
dict w/o lists: 0.0804419517517
dict w lists:   0.0254349708557      <---- What!?
dict w lists:   0.523415803909
dict w lists:   0.418542861938
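As for Ryan's original question about a workaround: assuming the cyclic collector really is the cause, one option is to pause it just around the bulk build.  Reference counting still frees the lists as usual; only the cycle detector is suspended.  A minimal sketch (gc_paused is my own helper name, not a stdlib function):

```python
# Context manager that suspends CPython's cyclic collector for the
# duration of a bulk build, then restores its previous state.
import gc
from contextlib import contextmanager

@contextmanager
def gc_paused():
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

def build(n):
    # The slow pattern from the test script, wrapped in the pause.
    with gc_paused():
        d = {}
        for i in range(n):
            d[i] = ['something']
    return d
```

An alternative is gc.set_threshold() with a much larger first argument, which makes collection passes rarer rather than suspending them entirely.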




On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser <ryanroser at gmail.com> wrote:
> Hi,
>
> I'm trying to improve the performance of some of my code.  I've noticed that
> one of the bottlenecks involves making a large dictionary where the values
> are lists.  Making a large dictionary is fast, repeatedly creating lists is
> fast, but things slow down if I set the lists as values for the dictionary.
>  Interestingly, this slowdown only occurs if there is already data in memory
> in Python, and things get increasingly slow as the amount of memory used
> increases.
>
> I have a toy example demonstrating the behavior below. Do you know why this
> is happening?  Is there a problem with my test?  Does Python do something
> special when storing lists as values in dictionaries?  Is there a workaround
> or an alternative data structure that doesn't exhibit slowdown as Python's
> memory usage increases?
>
> Thanks for the help,
>
> Ryan
>
>
>
> ####################################
> ##  A test script
> ####################################
> import time
> import random
>
> x = range(100000)
> def test():
>    # Creating a dictionary with an entry for each element in x
>    # is fast, and so is repeatedly creating a list
>    start = time.time()
>    d = dict()
>    for i in x:
>        tmp = []
>        tmp.append('something')
>        d[i] = 1
>    print 'dict w/o lists:', time.time() - start
>
>    # but assigning the list to the dictionary gets very slow
>    # if memory is not empty
>    start = time.time()
>    d = dict()
>    for i in x:
>        tmp = []
>        tmp.append('something')
>        d[i] = tmp
>    print 'dict w lists:  ', time.time() - start
>
> print 'runtimes with memory empty'
> test()
> print 'loading data'
> data = [random.random() for i in xrange(30000000)] # ~1gb of mem
> print 'runtimes with memory occupied'
> test()
> ####################################
>
>
> Results:
>
> $ python2.4 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0506901741028
> dict w lists:   0.0766770839691
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0391671657562
> dict w lists:   2.18966984749
>
> $ python2.6 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0479600429535
> dict w lists:   0.0784649848938
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0361380577087
> dict w lists:   2.49754095078
>
> $ python2.7 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0464890003204
> dict w lists:   0.0735650062561
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0356121063232
> dict w lists:   2.49307012558
>
>
> ######## Python versions and machine info #########
> Machine has 32 gb of ram, 8 cores
>
> Python 2.4.3 (#1, Sep  3 2009, 15:37:37)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
>
> ActivePython 2.6.5.14 (ActiveState Software Inc.) based on
> Python 2.6.5 (r265:79063, Jul  5 2010, 10:31:13)
> [GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2
>
> Python 2.7.1 (r271:86832, May 25 2011, 13:34:05)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
>
> $ uname -a
> Linux research-team10 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009
> x86_64 x86_64 x86_64 GNU/Linux
> _______________________________________________
> Portland mailing list
> Portland at python.org
> http://mail.python.org/mailman/listinfo/portland
>

