merge list of tuples with list

Steven D'Aprano steve-REMOVE-THIS at cybersource.com.au
Wed Oct 20 20:13:02 EDT 2010


On Wed, 20 Oct 2010 14:32:53 -0700, Daniel Wagner wrote:

> I really appreciate your solutions but they bring me to a new question:
> Why is my solution so inefficient? The same operation without the
> list/tuple conversion
> 
> $ python /[long_path]/timeit.py 'a=[[1,2,3], [4,5,6]];b=[7,8];map(lambda
> x: x + [b.pop(0)] , a)' 100000 loops, best of 3: 3.36 usec per loop
> 
> is still horrible slow. 


What makes you say that? 3 microseconds to create four lists, two 
assignments, create a function object, then inside the map look up the 
global b twice, the method 'pop' twice, call the method twice, resize the 
list b twice, create an inner list twice, concatenate that list with 
another list twice, and stuff those two new lists into a new list... 
3usec for all that in Python code doesn't seem unreasonable to me.

On my PC, it's two orders of magnitude slower than a pass statement. That 
sounds about right to me.


$ python -m timeit
10000000 loops, best of 3: 0.0325 usec per loop
$ python -m timeit 'a=[[1,2,3], [4,5,6]];b=[7,8];map(lambda x: x + [b.pop
(0)] , a)'
100000 loops, best of 3: 4.32 usec per loop


Can we do better?

$ python -m timeit -s 'a=[[1,2,3], [4,5,6]]; f = lambda x: x + [b.pop
(0)]' 'b=[7,8]; map(f, a)'
100000 loops, best of 3: 3.25 usec per loop

On my system, moving the creation of the list a and the code being timed 
and into the setup code reduces the time by 25%. Not too shabby.


> Could anybody explain me what it makes so slow?
> Is it the map() function or maybe the lambda construct?

lambdas are just functions -- there is no speed difference between a 
function 

def add(a, b):
    return a+b

and lambda a, b: a+b

The looping overhead of map(f, data) is minimal. But in this case, the 
function you're calling does a fair bit of work:

lambda x: x + [b.pop(0)]

This has to lookup the global b, resize it, create a new list, 
concatenate it with the list x (which creates a new list, not an in-place 
concatenation) and return that. The amount of work is non-trivial, and I 
don't think that 3us is unreasonable.

But for large lists b, it will become slow, because resizing the list is 
slow. Popping from the start on a regular list has to move every element 
over, one by one. You may find using collections.deque will be *much* 
faster for large lists. (But probably not for small lists.)

Personally, the approach I'd take is:

a = [[1,2,3], [4,5,6]]
b = [7,8]
[x+[y] for x,y in zip(a,b)]


Speedwise:

$ python -m timeit -s 'a=[[1,2,3], [4,5,6]]; b=[7,8]' '[x+[y] for x,y in 
zip(a,b)]'
100000 loops, best of 3: 2.43 usec per loop


If anyone can do better than that (modulo hardware differences), I'd be 
surprised.



-- 
Steven



More information about the Python-list mailing list