Yuriy Taraday <yorik.sar@...> writes:
On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!
I couldn't believe it so I had to check it:
from __future__ import print_function
import functools, itertools, operator, timeit
def var1(): def _gen(): for i in range(100): if i > 50: break yield i
return list(_gen())
def var2(): def stop(): raise StopIteration return list(i for i in range(100) if i <= 50 or stop())
def var3(): return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
def var4(): return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]
if __name__ == '__main__': for f in (var1, var2, var3, var4): print(f.__name__, end=' ') print(timeit.timeit(f))
Results on my machine:
var1 20.4974410534 var2 23.6218020916 var3 32.1543409824 var4 4.90913701057
var1 might have became the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower that generator expressions?
var3 is the slowest. I guess, because it has lambda in it. But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can.
-- Kind regards, Yuriy.
Steven D'Aprano <steve@...> writes:
On 30/01/13 02:44, Wolfgang Maier wrote:
list(i for i in range(100) if i<50 or stop()) Really (!) nice (and 2x as fast as using itertools.takewhile())!
I think you are mistaken about the speed. The itertools iterators are highly optimized and do all their work in fast C code. If you are seeing takewhile as slow, you are probably doing something wrong: untrustworthy timing code, misinterpreting what you are seeing, or some other error.
Here's a comparison done the naive or obvious way. Copy and paste it into an interactive Python session:
from itertools import takewhile from timeit import Timer
def stop(): raise StopIteration
setup = 'from __main__ import stop, takewhile'
t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup) t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)
min(t1.repeat(number=100000, repeat=5)) min(t2.repeat(number=100000, repeat=5))
On my computer, t1 is about 1.5 times faster than t2. But this is misleading, because it's not takewhile that is slow. I am feeding something slow into takewhile. If I really need to run as fast as possible, I can optimize the function call inside takewhile:
from operator import lt from functools import partial
small_enough = partial(lt, 50) setup2 = 'from __main__ import takewhile, small_enough'
t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2)
min(t3.repeat(number=100000, repeat=5))
On my computer, t3 is nearly 13 times faster than t1, and 19 times faster than t2. Here are the actual times I get, using Python 2.7:
py> min(t1.repeat(number=100000, repeat=5)) # using the StopIteration hack 1.2609241008758545 py> min(t2.repeat(number=100000, repeat=5)) # takewhile and lambda 1.85182785987854 py> min(t3.repeat(number=100000, repeat=5)) # optimized version 0.09847092628479004
Hi Yuriy and Steven, a) I had compared the originally proposed 'takewhile with lambda' version to the 'if cond or stop()' solution using 'timeit' just like you did. In principle, you find the same as I did, although I am a bit surprised that our differences are different. To be exact 'if cond or stop()' was 1.84 x faster in my hands than 'takewhile with lambda'. b) I have to say I was very impressed by the speed gains you report through the use of 'partial', which I had not thought of at all, I have to admit. However, I tested your suggestions and I think they both suffer from the same mistake: your condition is 'partial(lt,50)', but this is not met to begin with and results in an empty list at least for me. Have you two actually checked the output of the code or have you just timed it? I found that in order to make it work the comparison has to be made via 'partial(gt,50)'. With this modification the resulting list in your example would be [0,..,49] as it should be. And now the big surprise in terms of runtimes: partial(lt,50) variant: 1.17 (but incorrect results) partial(gt,50) variant: 13.95 if cond or stop() variant: 9.86 I guess python is just smart enough to recognize that it compares against a constant value all the time, and optimizes the code accordingly (after all the if clause is a pretty standard thing to use in a comprehension). So the reason for your reported speed-gain is that you actually broke out of the comprehension at the very first element instead of going through the first 50! Please comment, if you get different results. Best, Wolfgang