Python 2.0b1 List comprehensions are slow

Kevin O'Connor koconnor at cse.Buffalo.EDU
Sat Sep 9 23:05:57 EDT 2000


On Fri, Sep 08, 2000 at 08:46:18AM -0500, Skip Montanaro wrote:
> If you look at the bytecode generated by a list comprehension it should be
> about the same as that generated for an equivalent set of nested for and if
> statements.  One of the reasons to use map instead of a for loop is that it
> is often more efficient (essentially pushing the loop indexing machinery
> from the Python VM into C).  This isn't going to change with list
> comprehensions, certainly not in the short term.

>From the paragraph above, you seem to be saying that map is faster than
list comprehensions because the "loop indexing machinery" is done in C.
Although this may be accurate, all of my tests indicate that the loop
control is not the problem - the append() method is.

It seems like map()'s ability to preallocate the appropriate number of
items in advance makes it the faster algorithm.  I've extended my original
test program to include several new tests; the pure-python for loops that
use preallocation seem most telling.

I've also extended my test program to test some filter cases.  (Using
filter, and list comprehensions with an 'if' clause.)  It is interesting to
note that the list comprehensions win hands down in all cases!  It's a
tighter race, however, when the resulting lists are almost as big as the
original list (again indicating the overhead is in the append() call).

> Seriously, inputs on how to improve the generated code are welcome, though
> I'm skeptical there's much that can be done (the list's append method is
> already cached as I recall, which is where the speedup over for loops comes
> from), certainly before 2.0final.  If a significant speedup was possible
> without a major reworking of the Python VM, I think we'd have seen it by now
> in the bytecode generated for for loops.

Well, a simple optimization would be to use lambda and map() in the byte
code when there is no filter clause in the list comprehension.  This seems
to always be a win.

A better optimization would include some way of hinting to the VM that a
list should be preallocated to a certain size.  (Whether or not to
preallocate on list comprehensions that use an 'if' clause is dubious, but
on simple map operations, I speculate that preallocation would be a big
win.)

Another option would be to use bytecode similar to what I use below in
"bar6()", even though it doesn't work perfectly with some corner cases
(like if someone issued a "[ a.pop for x in a ]").

Finally, one could always look at optimizing the append method.  :-)

I know this seems pretty nit-picky.  I'm not complaining at all - to
reiterate what was said in my first message - I think the list
comprehension feature is a GREAT addition.
My zeal to use python in as many applications as possible has led me to
utilizing it in situations where it really needed to be fast.  I'd love to
be able to get rid of all those ugly map calls without any fear that
performance will be impacted.  :-)

-Kevin


------------------------ My test results:

Many repeated map calls on small lists:
bar1(map + operator): 15.77 cpu seconds
bar2(map + lambda): 13.02 cpu seconds
bar3(list comprehension): 18.84 cpu seconds
bar4(for loop): 19.36 cpu seconds
bar5(non-returning for loop [BOGUS]): 5.85 cpu seconds
bar6(preallocating for loop): 15.17 cpu seconds

Many repeated filter calls on small lists:
baz2(map + filter + lambda): 28.58 cpu seconds
baz2_1(filter + lambda [BOGUS]): 19.15 cpu seconds
baz3(list comprehension): 18.49 cpu seconds
baz4(for loop): 18.91 cpu seconds
baz5(non-returning for loop [BOGUS]): 8.98 cpu seconds
baz6(preallocating for loop): 22.47 cpu seconds

Many repeated map calls on small lists:
foo1(Optimized map+operator): 11.71 cpu seconds
foo2(Optimized map+lambda): 12.21 cpu seconds
foo3(Optimized list comprehension): 17.57 cpu seconds
foo4(Optimized for loop): 18.06 cpu seconds
foo5(Optimized non-returning for loop [BOGUS]): 5.39 cpu seconds
foo6(Optimized preallocating for loop): 10.63 cpu seconds

One map call on a large list:
bar1(map + operator): 6.84 cpu seconds
bar2(map + lambda): 7.05 cpu seconds
bar3(list comprehension): 11.82 cpu seconds
bar4(for loop): 12.02 cpu seconds
bar5(non-returning for loop [BOGUS]): 3.52 cpu seconds
bar6(preallocating for loop): 7.38 cpu seconds

One filter call on a large list:
baz2(map + filter + lambda): 13.32 cpu seconds
baz2_1(filter + lambda [BOGUS]): 9.68 cpu seconds
baz3(list comprehension): 9.57 cpu seconds
baz4(for loop): 9.7 cpu seconds
baz5(non-returning for loop [BOGUS]): 5.17 cpu seconds
baz6(preallocating for loop): 10.09 cpu seconds


----------------------  Source code:

#!/usr/bin/env python

import operator
import time

Counter = range(200000)

def foo1(arry):
    """Optimized map+operator"""
    dummyarry = (12,) * len(arry)
    operator__add = operator.add
    __map = map
    for i in Counter:
	__map(operator__add, dummyarry, arry)

def foo2(arry):
    """Optimized map+lambda"""
    __map = map
    func = (lambda x: x+12)
    for i in Counter:
	__map(func, arry)

def foo3(arry):
    """Optimized list comprehension"""
    for i in Counter:
	[ x+12 for x in arry]
## def foo3(arry): pass

def foo4(arry):
    """Optimized for loop"""
    for i in Counter:
	tmparry = []
	tmparry__append = tmparry.append
	for j in arry:
	    tmparry__append(j+12)

def foo5(arry):
    """Optimized non-returning for loop [BOGUS]"""
    for i in Counter:
	for j in arry:
	    j+12

def foo6(arry):
    """Optimized preallocating for loop"""
    l = len(arry)
    dummyarry = [None,] * l
    dummycnt = range(l)
    for i in Counter:
	tmparry = list(dummyarry)
	for j in dummycnt:
	    tmparry[j] = arry[j]+12

def bar1(arry):
    """map + operator"""
    return map(operator.add, (12,)*len(arry), arry)

def bar2(arry):
    """map + lambda"""
    return map(lambda x: x+12, arry)

def bar3(arry):
    """list comprehension"""
    return [ x+12 for x in arry]
## def bar3(arry): pass

def bar4(arry):
    """for loop"""
    tmparry = []
    tmparry__append = tmparry.append
    for i in arry:
	tmparry__append(i+12)
    return tmparry

def bar5(arry):
    """non-returning for loop [BOGUS]"""
    for i in arry:
	i+12

def bar6(arry):
    """preallocating for loop"""
    l = len(arry)
    tmparry = [None,]*l
    for i in range(l):
	tmparry[i] = arry[i]+12
    return tmparry

def baz2(arry):
    """map + filter + lambda"""
    return map((lambda x: x+12), filter(lambda x: x & 1, arry))

def baz2_1(arry):
    """filter + lambda [BOGUS]"""
    return filter(lambda x: x & 1, arry)

def baz3(arry):
    """list comprehension"""
    return [ x+12 for x in arry if x & 1]
## def bar3(arry): pass

def baz4(arry):
    """for loop"""
    tmparry = []
    tmparry__append = tmparry.append
    for i in arry:
	if i & 1:
	    tmparry__append(i+12)
    return tmparry

def baz5(arry):
    """non-returning for loop [BOGUS]"""
    for i in arry:
	if i & 1:
	    i+12

def baz6(arry):
    """preallocating for loop"""
    l = len(arry)
    tmparry = [None,]*l
    count = 0
    for i in range(l):
	v = arry[i]
	if v & 1:
	    tmparry[count] = v+12
	    count = count + 1
    return tmparry[:count]

def timeit(func):
    st = time.clock()
    func(vals)
    en = time.clock()
    return en - st

if __name__=='__main__':

    vals = [1, 8, 12, 89, 325, 213, 23, 434, 435, 2435, 5, 45, 435, 324, 5234]

    print "Map accuracy test:"
    for func in (bar1, bar2, bar3, bar4, bar5, bar6):
	print "%s: %s" % (func.__name__, func(vals))
    print

    print "Filter accuracy test:"
    for func in (baz2, baz3, baz4, baz5, baz6):
	print "%s: %s" % (func.__name__, func(vals))
    print

    def many(func, arry):
	for i in Counter:
	    func(arry)

    print "Many repeated map calls on small lists:"
    for func in (bar1, bar2, bar3, bar4, bar5, bar6):
	print "%s(%s): %s cpu seconds" % (func.__name__, func.__doc__
					  , timeit((lambda arry, func=func:
						    many(func, arry))))
    print

    print "Many repeated filter calls on small lists:"
    for func in (baz2, baz2_1, baz3, baz4, baz5, baz6):
	print "%s(%s): %s cpu seconds" % (func.__name__, func.__doc__
					  , timeit((lambda arry, func=func:
						    many(func, arry))))
    print

    print "Many repeated map calls on small lists:"
    for func in (foo1, foo2, foo3, foo4, foo5, foo6):
	print "%s(%s): %s cpu seconds" % (func.__name__, func.__doc__
					  , timeit(func))
    print

    vals = range(2000000)

    print "One map call on a large list:"
    for func in (bar1, bar2, bar3, bar4, bar5, bar6):
	print "%s(%s): %s cpu seconds" % (func.__name__, func.__doc__
					  , timeit(func))
    print

    print "One filter call on a large list:"
    for func in (baz2, baz2_1, baz3, baz4, baz5, baz6):
	print "%s(%s): %s cpu seconds" % (func.__name__, func.__doc__
					  , timeit(func))
    print


> 
> -- 
> Skip Montanaro (skip at mojam.com)
> http://www.mojam.com/
> http://www.musi-cal.com/

-- 
 ------------------------------------------------------------------------
 | Kevin O'Connor                     "BTW, IMHO we need a FAQ for      |
 | koconnor at cse.buffalo.edu            'IMHO', 'FAQ', 'BTW', etc. !"    |
 ------------------------------------------------------------------------




More information about the Python-list mailing list