Performance of lists vs. list comprehensions

Tue Jan 19 10:57:39 EST 2010

* Gerald Britton:
> Yesterday I stumbled across some old code in a project I was working
> on.  It does something like this:
> 
> mystring = '\n'.join( [ line for line in lines if <some conditions
> depending on line> ] )
> 
> where "lines" is a simple list of strings.  I realized that the code
> had been written before you could put a list comprehension as an
> argument to join().  I figured that I could drop the '[' and ']' and
> leave the rest as a list comprehension since that works with current
> Python (and the project's base is 2.5).  So I rewrote the original
> statement like this:
> 
> mystring = '\n'.join( line for line in lines if <some conditions
> depending on line> )
> 
> It works as expected.  Then I got curious as to how it performs.  I
> was surprised to learn that the rewritten expression runs more than
> twice as _slow_.  e.g.:
> 
>>>> l
> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
> 
>>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
> 2.9967339038848877
> 
>>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
> 7.2045478820800781
> 
> Notice that I dropped the condition testing that was in my original
> code.  I just wanted to see the effect of two different expressions.
> 
> I thought that maybe there was some lower bound on the number of the
> items in the list or list comprehension beyond which the comprehension
> would prove more efficient.  There doesn't appear to be one.  I scaled
> the length of the input list up to 1 million items and got more or
> less the same relative performance.
> 
> Now I'm really curious and I'd like to know:
> 
> 1. Can anyone else confirm this observation?

 >>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
5.8625191190500345
 >>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
12.093135300715574
 >>> _

> 2. Why should the "pure" list comprehension be slower than the same
> comprehension enclosed in '[...]' ?

Regarding (2) the unparenthesized expression in join is *not* a list 
comprehension but a generator expression.

And as such it involves join calling next() on the generator object repeatedly, 
with each next() call involving a light-weight context shift.

In addition the docs mumble something about "lazy" evaluation, and that may also 
contribute to the overhead.

I think that in contrast, the interpreter can evaluate a list comprehension, [x 
for x in blah], directly without any context shifting, just by transforming it 
to equivalent code and putting the target expressions innermost there.

And so the main factor causing a slowdown for a list comprehension would, I 
think, be paging and such if the list it produced was Really Huge, while for the 
generator there's no memory issue but rather much calling & context shifting.

Cheers & hth.,

- Alf