lists as an efficient implementation of large two-dimensional arrays(!)

Mitchell L Model MLMDev at
Tue Feb 2 21:14:27 CET 2010

An instructive lesson in YAGNI ("you aren't going to need it"),  
premature optimization, and not making assumptions about Python data  
structure implementations.

I need a 1000 x 1000 two-dimensional array of objects. (Since they are  
instances of application classes it appears that the array module is  
useless; likewise, since I am using Python 3.1, so among other things,  
I can't use numpy or its relatives.) The usage pattern is that the  
array is first completely filled with objects. Later, objects are  
sometimes accessed individually by row and column and often the entire  
array is iterated over.

Worried (unnecessarily, as it turns out) by the prospect of 1,000,000  
element list I started by constructing a dictionary with the keys 1  
through 1000, each of which had as its value another dictionary with  
the keys 1 through 1000. Actual values were the values of the second  
level dictionary.

Using numbers to fill the array to minimize the effect of creating my  
more complex objects, and running Python 3.1.1 on an 8-core Mac Pro  
with 8Gb memory, I tried the following

#create and fill the array:
t1 = time.time()
d2 = {}
for j in range(1000):
     d2[j] = dict()
     for k in range(1000):
         d2[j][k] = k
print( round(time.time() - t1, 2))


# access each element of the array:
t1 = time.time()
for j in range(1000):
     for k in range(1000):
         elt = d2[j][k]
print( round(time.time() - t1, 2))


  My program was too slow, so I started investigating whether I could  
improve on the two-level dictionary, which got used a lot. To get  
another baseline I tried a pure 1,000,000-element list, expecting the  
times to be be horrendous, but look!

# fill a list using append
t1 = time.time()
lst = []
for n in range(1000000):
print( round(time.time() - t1, 2))


# access every element of a list
t1 = time.time()
for n in range(1000000):
     elt = lst[n]
print( round(time.time() - t1, 2))


What a shock! I could save half the execution time and all my clever  
work and awkward double-layer dictionary expressions by just using a  

Even better, look what happens using a comprehension to create the  
list instead of a loop with list.append:

t1 = time.time()
lst = [n for n in range(1000000)]
print( round(time.time() - t1, 2))


Half again to create the list.

Iterating over the whole list is easier and faster than iterating over  
the double-level dictionary, in particular because it doesn't involve  
a two-level loop. But what about individual access given a row and a  

t1 = time.time()
for j in range(1000):
     for k in range(1000):
         elt = lst[j * 1000 + k]
print( round(time.time() - t1, 2))


This is the same as for the dictionary.

I tried a two-level list and a few other things but still haven't  
found anything that works better than a single long list -- just like  
2-D arrays are coded in old-style languages, with indices computed as  
offsets from the beginning of the linear sequence of all the values.  
What's amazing is that creating and accessing 1,000,000-element list  
in Python is so efficient. The usual moral obtains: start simple,  
analyze problems (functional or performance) as they arise, decide  
whether they are worth the cost of change, then change in very limited  
ways. And of course abstract and modularize so that, in my case, for  
example, none of the program's code would be affected by the change  
from a two-level dictionary representation to using a single long list.

I realize there are many directions an analysis like this can follow,  
and many factors affecting it, including patterns of use. I just  
wanted to demonstrate the basics for a situation that I just  
encountered. In particular, if the array was sparse, rather than  
completely full, the two-level dictionary implementation would be the  
natural representation.

More information about the Python-list mailing list