map is useless!
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sun Jun 6 19:20:10 EDT 2010
On Sun, 06 Jun 2010 08:16:02 -0700, rantingrick wrote:
> Everyone knows i'm a Python fanboy so nobody can call me a troll for
> this...
The first rule of trolling is, always deny being a troll, no matter how
obvious the trolling. But on the off chance I'm wrong, and for the benefit of
others, your tests don't measure what you think they are measuring and
consequently your results are invalid. Read on.
> Python map is just completely useless. For one it so damn slow why even
> bother putting it in the language? And secondly, the total "girl- man"
> weakness of lambda renders it completely mute!
Four trolls in three sentences. Way to go "fanboy".
(1) "Completely" useless? It can't do *anything*?
(2) Slow compared to what?
(3) Are you implying that map relies on lambda?
(4) What's wrong with lambda anyway?
By the way, nice sexist description there. "Girl-man weakness" indeed.
Does your mum know that you are so contemptuous about females?
> Ruby has a very nice map
I'm thrilled for them. Personally I think the syntax is horrible.
>>>> [1,2,3].map{|x| x.to_s}
>
> Have not done any benchmarking
"... but by counting under my breath while the code runs, I'm POSITIVE
Ruby is much faster than Python!"
By complaining about Python being too slow while admitting that you
haven't actually tested the speed of your preferred alternative, you have
*negative* credibility.
> but far more useful from the programmers
> POV. And that really stinks because map is such a useful tool it's a
> shame to waste it. Here are some test to back up the rant.
>
>
>>>> import time
>>>> def test1():
>     l = range(10000)
>     t1 = time.time()
>     map(lambda x:x+1, l)
>     t2= time.time()
>     print t2-t1
That's a crappy test.
(1) You include the cost of building a new function each time.
(2) You make no attempt to protect against the inevitable variation in
speed caused by external processes running on a modern multi-process
operating system.
(3) You are reinventing the wheel (badly) instead of using the timeit
module.
>>>> def test2():
>     l = range(10000)
>     t1 = time.time()
>     for x in l:
>         x + 1
>     t2 = time.time()
>     print t2-t1
The most obvious difference is that in test1, you build a 10,000 item
list, while in test2, you don't. And sure enough, not building a list is
faster than building a list:
>>>> test1()
> 0.00200009346008
>>>> test2()
> 0.000999927520752
>>>> def test3():
>     l = range(10000)
>     t1 = time.time()
>     map(str, l)
>     t2= time.time()
>     print t2-t1
>
>
>>>> def test4():
>     l = range(10000)
>     t1 = time.time()
>     for x in l:
>         str(x)
>     t2= time.time()
>     print t2-t1
>
>
>>>> test3()
> 0.00300002098083
>>>> test4()
> 0.00399994850159
Look ma, not building a list is still faster than building a list!
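To make the mismatch concrete, here is a minimal sketch of what each quoted test actually produces. It assumes Python 3, where map returns a lazy iterator and therefore has to be wrapped in list() to do the same work it did in 2.x. The function names are mine, for illustration only, and are not from the quoted code.

```python
L = range(10000)

def with_map(items):
    # list() is needed on Python 3, where map returns a lazy iterator;
    # on Python 2 map built the list by itself.
    return list(map(str, items))

def loop_discarding(items):
    # Mirrors the quoted test4: every result is computed and thrown away,
    # so no list is ever built.
    for x in items:
        str(x)

def loop_building(items):
    # The fair equivalent of map: collect every result in a new list.
    accum = []
    for x in items:
        accum.append(str(x))
    return accum

# The discarding loop returns nothing, so it is not doing map's job.
assert loop_discarding(L) is None
# The list-building loop produces exactly what map produces.
assert loop_building(L) == with_map(L)
```

Only the last two functions do the same job, so those are the only two worth timing against each other.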
> So can anyone explain this poor excuse for a map function? Maybe GVR
> should have taken it out in 3.0? *scratches head*
So, let's do some proper tests. Using Python 2.6 on a fairly low-end
desktop, and making sure all the alternatives do the same thing:
>>> from timeit import Timer
>>> t1 = Timer('map(f, L)', 'f = lambda x: x+1; L = range(10000)')
>>> t2 = Timer('''accum = []
... for item in L:
...     accum.append(f(item))
...
... ''', 'f = lambda x: x+1; L = range(10000)')
>>>
>>> min(t1.repeat(number=1000))
3.5182700157165527
>>> min(t2.repeat(number=1000))
6.702117919921875
For the benefit of those who aren't used to timeit, the timings at the
end are the best-of-three of repeating the test code 1000 times. The time
per call to map is 3.5 milliseconds compared to 6.7 ms for unrolling it
into a loop and building the list by hand. map is *much* faster.
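For anyone who wants to repeat the measurement on a current interpreter, the following is a rough sketch under a few assumptions. It targets Python 3, so map is wrapped in list() and L is built with list(range(...)) to match the 2.x behaviour. It passes repeat=3 explicitly, because newer versions of timeit default to a different repeat count. The helper best_per_call is a name I made up, and number is lowered to 100 purely to keep the run short. Your numbers will differ from mine.

```python
from timeit import Timer

# Shared setup: the function and the input list are built once,
# outside the timed statement.
setup = "f = lambda x: x + 1; L = list(range(10000))"

map_timer = Timer("list(map(f, L))", setup)
loop_timer = Timer("""accum = []
for item in L:
    accum.append(f(item))
""", setup)

def best_per_call(timer, number=100, repeat=3):
    """Return the best time in seconds for one run of the statement.

    Each element of timer.repeat() is the total time for `number` runs,
    so the minimum is divided by `number` to get a per-call figure.
    """
    return min(timer.repeat(repeat=repeat, number=number)) / number

map_time = best_per_call(map_timer)
loop_time = best_per_call(loop_timer)

print("map:  %.3f ms per call" % (map_time * 1000))
print("loop: %.3f ms per call" % (loop_time * 1000))
```

Taking the minimum rather than the mean is deliberate: the slower runs mostly reflect whatever else the machine was doing at the time, not the code under test.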
How does it compare to a list comprehension? The list comp can avoid a
function call and do the addition inline, so it will probably be
significantly faster:
>>> t3 = Timer('[x+1 for x in L]', "L = range(10000)")
>>> min(t3.repeat(number=1000))
2.0786428451538086
And sure enough it is. But when you can't avoid the function call, the
advantage shifts back to map:
>>> t4 = Timer('map(str, L)', "L = range(10000)")
>>> t5 = Timer('[str(x) for x in L]', "L = range(10000)")
>>> min(t4.repeat(number=1000))
3.8360331058502197
>>> min(t5.repeat(number=1000))
6.6693520545959473
Lessons are:
(1) If you're going to deny being a troll, avoid making inflammatory
statements unless you can back them up.
(2) Understand what you are timing, and don't compare apples to snooker
balls just because they're both red.
(3) Timing tests are hard to get right. Use timeit.
(4) map is plenty fast.
Have a nice day.
--
Steven
More information about the Python-list
mailing list