[Tutor] fast list traversal

Dinesh B Vadhia dineshbvadhia at hotmail.com
Fri Oct 31 13:34:40 CET 2008


Hi Kent

The code is very simple:

dict_long_lists = defaultdict(list)
for long_list in dict_long_lists.itervalues()
        for element in long_list:
                array_a[element] = m + n + p        # m,n,p are numbers

The long_list's are read from a defaultdict(list) dictionary and so don't need initializing.  The elements of long_list are integers and ordered (sorted before placing in dictionary).  There are > 20,000 long_list's each with a variable number of elements (>5,000).  The elements of long_list are immutable (ie. don't change).

I've tried set() using defaultdict(set) but the elements are not ordered.

The problem is what is the fastest way to traverse long_list sequentially from the beginning to the end?  Maybe there is another data structure that can be used instead of a list.

Hth

Dinesh


--------------------------------------------------------------------------------

Message: 1
Date: Thu, 30 Oct 2008 22:19:52 -0400
From: "Kent Johnson" <kent37 at tds.net>
Subject: Re: [Tutor] fast list traversal
To: "Shawn Milochik" <Shawn at milochik.com>
Cc: tutor at python.org
Message-ID:
<1c2a2c590810301919u5078fc89od2dc182534b28215 at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Thu, Oct 30, 2008 at 4:55 PM, Shawn Milochik <Shawn at milochik.com> wrote:

> I just ran a very crude test.
>
> Results: Lists load more quickly but iterate more slowly. Dictionaries
> take longer to load but iteration takes about half the time.


Here are my results using timeit and Python 2.6:

Initializing a list is faster than a dict:
kent $ py -m timeit -n 1 -r 3 -s "l=[]" "for i in range(7654321):
l.append(i)"
1 loops, best of 3: 1.77 sec per loop

kent $ py -m timeit -n 1 -r 3 -s "d={}" "for i in range(7654321): d[i] = 0"
1 loops, best of 3: 2.27 sec per loop

The list version can be sped up considerably by hoisting the lookup of the
append method out of the loop:
kent $ py -m timeit -n 1 -r 3 -s "l=[];append=l.append" "for i in
range(7654321): append(i)"
1 loops, best of 3: 1.02 sec per loop

However this is not the fastest way to create either the list or the dict
from a range:
kent $ py -m timeit -n 1 -r 3  "l = range(7654321)"
1 loops, best of 3: 167 msec per loop

kent $ py -m timeit -n 1 -r 3  "d=dict.fromkeys(range(7654321), 0)"
1 loops, best of 3: 1.63 sec per loop

xrange() is a little faster for the dict:
kent $ py -m timeit -n 1 -r 3  "d=dict.fromkeys(xrange(7654321), 0)"
1 loops, best of 3: 1.43 sec per loop

That is why I asked the OP for some real code; you have to optimize for a
specific case, not a generality.

For iteration, I get nearly identical results for the list and dict:
kent $ py -m timeit -n 1 -r 3  -s "l = range(7654321)" "for i in l: i*i"
1 loops, best of 3: 2.74 sec per loop

kent $ py -m timeit -n 1 -r 3  -s "d = dict.fromkeys(range(7654321), 0)"
"for i in d: i*i"
1 loops, best of 3: 2.81 sec per loop

Using a list comprehension is slower, which surprised me; probably because
it has to actually create the list:

kent $ py -m timeit -n 1 -r 3  -s "l = range(7654321)" "[i*i for i in l]"
1 loops, best of 3: 3.65 sec per loop

kent $ py -m timeit -n 1 -r 3  -s "d = dict.fromkeys(range(7654321), 0)"
"[i*i for i in d]"
1 loops, best of 3: 3.66 sec per loop

The moral of the story, as usual with optimization, is that you have to time
and you have to use real code to get meaningful results.

fill_dict_time = time.time()
>
> for x in range(iter_num):
>    the_list.append(x)


Not sure why you have the list append in this loop?


>    the_dict[x] = 0
>
> fill_dict_time = time.time() - fill_dict_time


the_list now has twice as many entries as the_dict, which would account for
it taking twice as long to iterate. Which is another pitfall of timing
tests; it is remarkably easy to make a mistake and time something different
than what you think you are timing.

Kent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20081030/5c6de155/attachment-0001.htm>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20081031/c1605e7f/attachment.htm>


More information about the Tutor mailing list