List comprehension timing difference.

Thu Sep 1 21:12:47 EDT 2011

On 02/09/2011 01:35, Bart Kastermans wrote:
>
> In the following code I create the graph with vertices
> sgb-words.txt (the file of 5 letter words from the
> stanford graphbase), and an edge if two words differ
> by one letter.  The two methods I wrote seem to me to
> likely perform the same computations, the list comprehension
> is faster though (281 seconds VS 305 seconds on my dell mini).
>
> Is the right interpretation of this timing difference
> that the comprehension is performed in the lower level
> C code?
>
> As this time I have no other conjecture about the cause.
>
> ---------------------------------------------------------
> import time
> import copy
>
> data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())
>
> def d (w1, w2):
>      count = 0
>      for idx in range(0,5):
>          if w1[idx] != w2[idx]:
>              count += 1
>      return count
>
> print "creating graph"
> t0 = time.clock ()
> graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a<  b]
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."
>
> t0 = time.clock ()
> graph2 = []
> for i in range (0, len(data)):
>      for j in range(0,len(data)):
>          if d(data[i],data[j]) == 1 and i<  j:
>              graph2.append ([i,j])
> t1 = time.clock ()
> print "took " + str (t1 - t0) + " seconds."

Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

     [a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

     [i, j]

The second version has subscripting ("data[i]" and "data[j]"), which
will slow it down.