[Tutor] fast list traversal

Fri Oct 31 03:19:52 CET 2008

On Thu, Oct 30, 2008 at 4:55 PM, Shawn Milochik <Shawn at milochik.com> wrote:

> I just ran a very crude test.
>
> Results: Lists load more quickly but iterate more slowly. Dictionaries
> take longer to load but iteration takes about half the time.

Here are my results using timeit and Python 2.6:

Initializing a list is faster than a dict:
kent $ py -m timeit -n 1 -r 3 -s "l=[]" "for i in range(7654321):
l.append(i)"
1 loops, best of 3: 1.77 sec per loop

kent $ py -m timeit -n 1 -r 3 -s "d={}" "for i in range(7654321): d[i] = 0"
1 loops, best of 3: 2.27 sec per loop

The list version can be sped up considerably by hoisting the lookup of the
append method out of the loop:
kent $ py -m timeit -n 1 -r 3 -s "l=[];append=l.append" "for i in
range(7654321): append(i)"
1 loops, best of 3: 1.02 sec per loop

However this is not the fastest way to create either the list or the dict
from a range:
kent $ py -m timeit -n 1 -r 3  "l = range(7654321)"
1 loops, best of 3: 167 msec per loop

kent $ py -m timeit -n 1 -r 3  "d=dict.fromkeys(range(7654321), 0)"
1 loops, best of 3: 1.63 sec per loop

xrange() is a little faster for the dict:
kent $ py -m timeit -n 1 -r 3  "d=dict.fromkeys(xrange(7654321), 0)"
1 loops, best of 3: 1.43 sec per loop

That is why I asked the OP for some real code; you have to optimize for a
specific case, not a generality.

For iteration, I get nearly identical results for the list and dict:
kent $ py -m timeit -n 1 -r 3  -s "l = range(7654321)" "for i in l: i*i"
1 loops, best of 3: 2.74 sec per loop

kent $ py -m timeit -n 1 -r 3  -s "d = dict.fromkeys(range(7654321), 0)"
"for i in d: i*i"
1 loops, best of 3: 2.81 sec per loop

Using a list comprehension is slower, which surprised me; probably because
it has to actually create the list:

kent $ py -m timeit -n 1 -r 3  -s "l = range(7654321)" "[i*i for i in l]"
1 loops, best of 3: 3.65 sec per loop

kent $ py -m timeit -n 1 -r 3  -s "d = dict.fromkeys(range(7654321), 0)"
"[i*i for i in d]"
1 loops, best of 3: 3.66 sec per loop

The moral of the story, as usual with optimization, is that you have to time
and you have to use real code to get meaningful results.

fill_dict_time = time.time()
>
> for x in range(iter_num):
>    the_list.append(x)

Not sure why you have the list append in this loop?

>    the_dict[x] = 0
>
> fill_dict_time = time.time() - fill_dict_time

the_list now has twice as many entries as the_dict, which would account for
it taking twice as long to iterate. Which is another pitfall of timing
tests; it is remarkably easy to make a mistake and time something different
than what you think you are timing.

Kent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20081030/5c6de155/attachment.htm>