Performance of lists vs. list comprehensions

Tue Jan 19 10:30:21 EST 2010

Yesterday I stumbled across some old code in a project I was working
on.  It does something like this:

mystring = '\n'.join( [ line for line in lines if <some conditions
depending on line> ] )

where "lines" is a simple list of strings.  I realized that the code
had been written before you could put a list comprehension as an
argument to join().  I figured that I could drop the '[' and ']' and
leave the rest as a list comprehension since that works with current
Python (and the project's base is 2.5).  So I rewrote the original
statement like this:

mystring = '\n'.join( line for line in lines if <some conditions
depending on line> )

It works as expected.  Then I got curious as to how it performs.  I
was surprised to learn that the rewritten expression runs more than
twice as _slow_.  e.g.:

>>> l
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
2.9967339038848877

>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
7.2045478820800781

Notice that I dropped the condition testing that was in my original
code.  I just wanted to see the effect of two different expressions.

I thought that maybe there was some lower bound on the number of the
items in the list or list comprehension beyond which the comprehension
would prove more efficient.  There doesn't appear to be one.  I scaled
the length of the input list up to 1 million items and got more or
less the same relative performance.

Now I'm really curious and I'd like to know:

1. Can anyone else confirm this observation?

2. Why should the "pure" list comprehension be slower than the same
comprehension enclosed in '[...]' ?

-- 
Gerald Britton