Performance of lists vs. list comprehensions
gerald.britton at gmail.com
Tue Jan 19 17:10:43 CET 2010
Thanks! Good explanation.
On Tue, Jan 19, 2010 at 10:57 AM, Alf P. Steinbach <alfps at start.no> wrote:
> * Gerald Britton:
>> Yesterday I stumbled across some old code in a project I was working
>> on. It does something like this:
>> mystring = '\n'.join( [ line for line in lines if <some conditions
>> depending on line> ] )
>> where "lines" is a simple list of strings. I realized that the code
>> had been written before you could put a list comprehension as an
>> argument to join(). I figured that I could drop the '[' and ']' and
>> leave the rest as a list comprehension since that works with current
>> Python (and the project's base is 2.5). So I rewrote the original
>> statement like this:
>> mystring = '\n'.join( line for line in lines if <some conditions
>> depending on line> )
>> It works as expected. Then I got curious as to how it performs. I
>> was surprised to learn that the rewritten expression runs more than
>> twice as _slow_. e.g.:
>> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>>>> Timer("' '.join([x for x in l])", 'l = map(str,range(10))').timeit()
>>>>> Timer("' '.join(x for x in l)", 'l = map(str,range(10))').timeit()
>> Notice that I dropped the condition testing that was in my original
>> code. I just wanted to see the effect of two different expressions.
>> I thought that maybe there was some lower bound on the number of the
>> items in the list or list comprehension beyond which the comprehension
>> would prove more efficient. There doesn't appear to be one. I scaled
>> the length of the input list up to 1 million items and got more or
>> less the same relative performance.
>> Now I'm really curious and I'd like to know:
>> 1. Can anyone else confirm this observation?
>> 2. Why should the "pure" list comprehension be slower than the same
>> comprehension enclosed in '[...]' ?
> Regarding (2), the unparenthesized expression in join is *not* a list
> comprehension but a generator expression.
> And as such it involves join calling next() on the generator object
> repeatedly, with each next() call involving a light-weight context shift.
> In addition the docs mumble something about "lazy" evaluation, and that may
> also contribute to the overhead.
> I think that in contrast, the interpreter can evaluate a list comprehension,
> [x for x in blah], directly without any context shifting, just by
> transforming it to equivalent code and putting the target expressions
> innermost there.
> And so the main factor causing a slowdown for a list comprehension would, I
> think, be paging and such if the list it produced was Really Huge, while for
> the generator there's no memory issue but rather much calling & context
> shifting.
> Cheers & hth.,
> - Alf
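[A small illustration of the per-item next() calls described above; this is my own sketch, not code from the thread. Wrapping the generator expression in a counting iterator shows that join() pulls each value out one call at a time, plus one final call that raises StopIteration.]

```python
class CountingIter:
    """Wrap an iterator and count how many times __next__ is called."""
    def __init__(self, it):
        self.it = iter(it)
        self.calls = 0
    def __iter__(self):
        return self
    def __next__(self):
        self.calls += 1          # counted even for the final StopIteration
        return next(self.it)

lines = ['a', 'b', 'c']
counted = CountingIter(x for x in lines)
result = '\n'.join(counted)

assert result == 'a\nb\nc'
print(counted.calls)  # 4: one call per item, plus the StopIteration call
```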