[portland] Python dictionary performance as memory usage increases

Jonathan Karon monk at netjunky.com
Fri Aug 5 20:34:37 CEST 2011


Hi Ryan,

a few thoughts:

On the surface your code seems reasonable. 

I'm not a Python internals expert by any means, but it's quite possible that one or more of CPython's runtime optimizations are messing with your measurements.

Since the first loop never stores tmp anywhere (the dict gets the value 1, not the list), each list becomes garbage as soon as the next iteration rebinds tmp. The garbage collector could be caching and re-using that freed list object on each iteration, which saves a massive amount of allocation work. (It could also be optimizing out the list operations entirely...)
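
If you want to sanity-check that guess, something like the snippet below should show it -- the id() trick is my addition, not part of Ryan's script. If CPython really is handing the same freed list object back, id(tmp) should cycle through only a couple of addresses in the loop that drops each list, but be different on every iteration once the dict keeps the lists alive:

# Rough test of the list-reuse hypothesis (an assumption, not verified):
ids_dropped = set()
ids_kept = set()
kept = {}

for i in xrange(1000):
    tmp = []                  # previous tmp is freed as soon as it is rebound
    tmp.append('something')
    ids_dropped.add(id(tmp))

for i in xrange(1000):
    tmp = []                  # previous tmp stays alive inside the dict
    tmp.append('something')
    kept[i] = tmp
    ids_kept.add(id(tmp))

print 'distinct list ids, lists dropped:', len(ids_dropped)
print 'distinct list ids, lists kept:   ', len(ids_kept)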

The way you allocate a chunk of memory to test under load creates 30,000,000 distinct blocks of memory (one Python float object per element, plus the list that holds them). That is going to slow down every subsequent allocation because of the way memory management works. It's not a Python-specific thing; it applies to any general-purpose memory allocation strategy -- the more blocks of memory you already have allocated, the more time it takes to allocate new ones.
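
One way to test that theory -- my suggestion, not something from the original post -- is to hold the "occupied" memory in a single contiguous buffer instead of thirty million separate float objects, and then run test() again. If the number of live blocks is what matters, the second run should stay fast:

# Load ~30M doubles into one contiguous buffer instead of 30M small objects.
import array
import random

data = array.array('d', (random.random() for i in xrange(30000000)))
# data is one big allocation (~240 MB of raw doubles) rather than millions
# of separate blocks, so the allocator has far less bookkeeping to do.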

~jonathan


On Aug 5, 2011, at 10:42 AM, Ryan Roser wrote:

> Hi,
> 
> I'm trying to improve the performance of some of my code.  I've noticed that
> one of the bottlenecks involves making a large dictionary where the values
> are lists.  Making a large dictionary is fast, repeatedly creating lists is
> fast, but things slow down if I set the lists as values for the dictionary.
> Interestingly, this slowdown only occurs if there is already data in memory
> in Python, and things get increasingly slow as the amount of memory used
> increases.
> 
> I have a toy example demonstrating the behavior below. Do you know why this
> is happening?  Is there a problem with my test?  Does Python do something
> special when storing lists as values in dictionaries?  Is there a workaround
> or an alternative data structure that doesn't exhibit slowdown as Python's
> memory usage increases?
> 
> Thanks for the help,
> 
> Ryan
> 
> 
> 
> ####################################
> ##  A test script
> ####################################
> import time
> import random
> 
> x = range(100000)
> def test():
>    # Creating a dictionary with an entry for each element in x
>    # is fast, and so is repeatedly creating a list
>    start = time.time()
>    d = dict()
>    for i in x:
>        tmp = []
>        tmp.append('something')
>        d[i] = 1
>    print 'dict w/o lists:', time.time() - start
> 
>    # but assigning the list to the dictionary gets very slow
>    # if memory is not empty
>    start = time.time()
>    d = dict()
>    for i in x:
>        tmp = []
>        tmp.append('something')
>        d[i] = tmp
>    print 'dict w lists:  ', time.time() - start
> 
> print 'runtimes with memory empty'
> test()
> print 'loading data'
> data = [random.random() for i in xrange(30000000)] # ~1gb of mem
> print 'runtimes with memory occupied'
> test()
> ####################################
> 
> 
> Results:
> 
> $ python2.4 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0506901741028
> dict w lists:   0.0766770839691
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0391671657562
> dict w lists:   2.18966984749
> 
> $ python2.6 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0479600429535
> dict w lists:   0.0784649848938
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0361380577087
> dict w lists:   2.49754095078
> 
> $ python2.7 tester.py
> runtimes with memory empty
> dict w/o lists: 0.0464890003204
> dict w lists:   0.0735650062561
> loading data
> runtimes with memory occupied
> dict w/o lists: 0.0356121063232
> dict w lists:   2.49307012558
> 
> 
> ######## Python versions and machine info #########
> Machine has 32 gb of ram, 8 cores
> 
> Python 2.4.3 (#1, Sep  3 2009, 15:37:37)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
> 
> ActivePython 2.6.5.14 (ActiveState Software Inc.) based on
> Python 2.6.5 (r265:79063, Jul  5 2010, 10:31:13)
> [GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2
> 
> Python 2.7.1 (r271:86832, May 25 2011, 13:34:05)
> [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
> 
> $ uname -a
> Linux research-team10 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009
> x86_64 x86_64 x86_64 GNU/Linux


