map is useless!

Sun Jun 6 19:20:10 EDT 2010

On Sun, 06 Jun 2010 08:16:02 -0700, rantingrick wrote:

> Everyone knows i'm a Python fanboy so nobody can call me a troll for
> this...

The first rule of trolling is, always deny being a troll, no matter how 
obvious the trolling. But on the chance I'm wrong, and for the benefit of 
others, your tests don't measure what you think they are measuring and 
consequently your results are invalid. Read on.

> Python map is just completely useless. For one it so damn slow why even
> bother putting it in the language? And secondly, the total "girl- man"
> weakness of lambda renders it completely mute!

Four trolls in three sentences. Way to go "fanboy".

(1) "Completely" useless? It can't do *anything*?

(2) Slow compared to what?

(3) Are you implying that map relies on lambda?

(4) What's wrong with lambda anyway?

By the way, nice sexist description there. "Girl-man weakness" indeed. 
Does your mum know that you are so contemptuous of females?

> Ruby has a very nice map

I'm thrilled for them. Personally I think the syntax is horrible.

>>>> [1,2,3].map{|x| x.to_s}
> 
> Have not done any benchmarking 

"... but by counting under my breath while the code runs, I'm POSITIVE 
Ruby is much faster than Python!"

By complaining about Python being too slow while admitting that you 
haven't actually tested the speed of your preferred alternative, you have 
*negative* credibility.

> but far more useful from the programmers
> POV. And that really stinks because map is such a useful tool it's a
> shame to waste it. Here are some test to back up the rant.
> 
> 
>>>> import time
>>>> def test1():
> 	l = range(10000)
> 	t1 = time.time()
> 	map(lambda x:x+1, l)
> 	t2= time.time()
> 	print t2-t1

That's a crappy test.

(1) You include the cost of building a new function each time.

(2) You make no attempt to protect against the inevitable variation in 
speed caused by other processes running on a modern multitasking 
operating system.

(3) You are reinventing the wheel (badly) instead of using the timeit 
module.
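
For anyone following along on Python 3, points (1) and (3) can be 
sketched roughly like this. It is only an illustration under my own 
assumptions: the names best_of and SETUP and the small repeat counts are 
made up for this example, and list() is wrapped around map because map 
is lazy in Python 3.

```python
from timeit import Timer

# Hypothetical helper (not from the original post): best total time
# over a few repeats, which damps noise from other processes.
def best_of(stmt, setup, number=50, repeat=3):
    return min(Timer(stmt, setup).repeat(repeat=repeat, number=number))

SETUP = "L = list(range(10000)); f = lambda x: x + 1"

# Here the lambda is rebuilt on every run of the timed statement.
rebuilt = best_of("list(map(lambda x: x + 1, L))", SETUP)
# Here the lambda is built once in the setup, outside the timed code.
prebuilt = best_of("list(map(f, L))", SETUP)

print("lambda rebuilt each run: %.4f s" % rebuilt)
print("lambda built in setup:   %.4f s" % prebuilt)
```

Whatever numbers you get, the point is that the setup string keeps 
one-off costs out of the measurement, and taking the minimum of several 
repeats gives the least-disturbed run.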

>>>> def test2():
> 	l = range(10000)
> 	t1 = time.time()
> 	for x in l:
> 		x + 1
> 	t2 = time.time()
> 	print t2-t1

The most obvious difference is that in test1, you build a 10,000 item 
list, while in test2, you don't. And sure enough, not building a list is 
faster than building a list:

>>>> test1()
> 0.00200009346008
>>>> test2()
> 0.000999927520752

>>>> def test3():
> 	l = range(10000)
> 	t1 = time.time()
> 	map(str, l)
> 	t2= time.time()
> 	print t2-t1
> 
> 
>>>> def test4():
> 	l = range(10000)
> 	t1 = time.time()
> 	for x in l:
> 		str(x)
> 	t2= time.time()
> 	print t2-t1
> 
> 
>>>> test3()
> 0.00300002098083
>>>> test4()
> 0.00399994850159

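Look ma, same mistake again: test3 builds a list and test4 doesn't. This 
time the list-building version even comes out ahead, which tells you how 
little a single one-off timing of a few milliseconds is worth.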
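
To make the mismatch between the tests concrete, here is a small Python 3 
sketch. The function names are mine, not from the original tests. The 
test2/test4 style of loop throws every result away, so it is not doing 
the same job as map; a fair comparison has to build the list too.

```python
def loop_discard(seq):
    # Like test2/test4: compute each result and throw it away.
    for x in seq:
        str(x)
    # Nothing was kept, so the function implicitly returns None.

def loop_build(seq):
    # The honest equivalent of map: keep every result in a new list.
    accum = []
    for x in seq:
        accum.append(str(x))
    return accum

L = range(5)
via_map = list(map(str, L))  # list() because map is lazy in Python 3

assert loop_discard(L) is None
assert loop_build(L) == via_map
print(via_map)
```

Only loop_build is comparable with map, because only it ends up with the 
same list in hand.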

> So can anyone explain this poor excuse for a map function? Maybe GVR
> should have taken it out in 3.0?  *scratches head*

So, let's do some proper tests. Using Python 2.6 on a fairly low-end 
desktop, and making sure all the alternatives do the same thing:

>>> from timeit import Timer
>>> t1 = Timer('map(f, L)', 'f = lambda x: x+1; L = range(10000)')
>>> t2 = Timer('''accum = []
... for item in L:
...     accum.append(f(item))
...
... ''', 'f = lambda x: x+1; L = range(10000)')
>>>
>>> min(t1.repeat(number=1000))
3.5182700157165527
>>> min(t2.repeat(number=1000))
6.702117919921875

For the benefit of those who aren't used to timeit, the timings at the 
end are the best-of-three of repeating the test code 1000 times. The time 
per call to map is 3.5 milliseconds compared to 6.7 ms for unrolling it 
into a loop and building the list by hand. map is *much* faster.
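
The arithmetic behind those per-call figures, as a tiny sketch. The 
helper name per_call_ms is hypothetical; the two totals are the ones 
printed in the session above.

```python
def per_call_ms(best_total_seconds, number):
    # repeat() returns the total time for `number` executions,
    # so divide by `number` and convert seconds to milliseconds.
    return best_total_seconds / number * 1000.0

map_ms = per_call_ms(3.5182700157165527, 1000)
loop_ms = per_call_ms(6.702117919921875, 1000)

print("map:  %.1f ms per call" % map_ms)
print("loop: %.1f ms per call" % loop_ms)
```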

How does it compare to a list comprehension? The list comp can avoid a 
function call and do the addition inline, so it will probably be 
significantly faster:

>>> t3 = Timer('[x+1 for x in L]', "L = range(10000)")
>>> min(t3.repeat(number=1000))
2.0786428451538086

And sure enough it is. But when you can't avoid the function call, the 
advantage shifts back to map:

>>> t4 = Timer('map(str, L)', "L = range(10000)")
>>> t5 = Timer('[str(x) for x in L]', "L = range(10000)")
>>> min(t4.repeat(number=1000))
3.8360331058502197
>>> min(t5.repeat(number=1000))
6.6693520545959473
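
If you want to repeat the map versus list comprehension comparison on 
Python 3, a rough port might look like the following. This is a sketch 
under assumptions, not a rerun of the figures above: map returns a lazy 
iterator in Python 3, so it is wrapped in list() to do the same work, 
and the variable names and the smaller number are my own choices. I make 
no claim about which way the numbers will fall on your version or 
machine.

```python
from timeit import Timer

SETUP = "L = list(range(10000))"
NUMBER = 50  # smaller than the 1000 used above, just to keep this quick

# Both statements produce the same list of strings.
t_map = Timer("list(map(str, L))", SETUP)
t_comp = Timer("[str(x) for x in L]", SETUP)

best_map = min(t_map.repeat(repeat=3, number=NUMBER))
best_comp = min(t_comp.repeat(repeat=3, number=NUMBER))

# Sanity check that the comparison is apples to apples.
L = list(range(10000))
assert list(map(str, L)) == [str(x) for x in L]

print("map:       %.4f s for %d runs" % (best_map, NUMBER))
print("list comp: %.4f s for %d runs" % (best_comp, NUMBER))
```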

Lessons are:

(1) If you're going to deny being a troll, avoid making inflammatory 
statements unless you can back them up.

(2) Understand what you are timing, and don't compare apples to snooker 
balls just because they're both red.

(3) Timing tests are hard to get right. Use timeit.

(4) map is plenty fast.

Have a nice day.

-- 
Steven