Filter versus comprehension (was Re: something about split()???)

Fri Aug 24 10:44:27 EDT 2012

On Wednesday, 22 August 2012 22:13:04 UTC+5:30, Terry Reedy  wrote:
> On 8/22/2012 3:30 AM, Mark Lawrence wrote:
> > On 22/08/2012 06:46, Terry Reedy wrote:
> >> On 8/21/2012 11:43 PM, mingqiang hu wrote:
> >>> why filter is bad when use lambda ?
> >>
> >> Inefficient, not 'bad'. Because the equivalent comprehension or
> >> generator expression does not require a function call.
>
> for each item in the iterable.
>
> > A case of premature optimisation? :)
>
> No, as regards my post. I simply made a factual statement without
> advocating a particular action.
>
> filter(lambda x: <expr>, iterable)
> (x for x in iterable if <expr>)
>
> both create iterators that produce the items in iterable such that
> bool(<expr>) is true. The following, with output rounded, shows
> something of the effect of the extra function call.
>
>  >>> import timeit
>  >>> timeit.timeit("list(i for i in ranger if False)", "ranger=range(0)")
> 0.91
>  >>> timeit.timeit("list(i for i in ranger if False)", "ranger=range(20)")
> 1.28
>  >>> timeit.timeit("list(filter(lambda i: False, ranger))", "ranger=range(0)")
> 0.83
>  >>> timeit.timeit("list(filter(lambda i: False, ranger))", "ranger=range(20)")
> 2.60
>
> Simply keeping true items is faster with filter -- at least on my
> particular machine with 3.3.0b2.
>
>  >>> timeit.timeit("list(filter(None, ranger))", "ranger=range(20)")
> 1.03
>
> Filter is also faster if the expression is a function call.
>
>  >>> timeit.timeit("list(filter(f, ranger))", "ranger=range(20); f=lambda i: False")
> 2.5033614114454394
>  >>> timeit.timeit("list(i for i in ranger if f(i))", "ranger=range(20); f=lambda i: False")
> 3.2394095327040304
>
> ---
>
> Perhaps or even yes as regards the so-called rule 'always use
> comprehension'. If one prefers filter as more readable, if one only
> wants to keep true items, if the expression is a function call, if
> evaluating the expression takes much more time than the extra function
> call so the latter does not matter, if the number of items is few enough
> that the extra time does not matter, then the rule is not needed or even
> wrong.
>
> So I think PyLint should be changed to stop its filter fud.
>
> -- 
> Terry Jan Reedy

When filtering for true values, filter(None, xxx) can be used.
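
A minimal sketch of that point, assuming some made-up sample data (the `items` list below is hypothetical, not from the thread). It only shows that filter(None, ...) and a truthiness comprehension keep the same items:

```python
# Hypothetical sample data: a mix of truthy and falsy values.
items = [0, 1, "", "a", None, [], [2], 3.5]

# filter(None, iterable) keeps the items whose truth value is true.
kept_by_filter = list(filter(None, items))

# The equivalent comprehension tests each item's truthiness directly.
kept_by_comprehension = [x for x in items if x]

print(kept_by_filter)
print(kept_by_filter == kept_by_comprehension)
```
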
Your examples with lambda i: False are unrealistic: you are comparing a bare `if False` test with a call to <lambda function>(xx), that is, a function call versus a boolean check.
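
Something like the following might be a fairer comparison, since both forms then evaluate the same non-constant test for every item. It is only a sketch: the test `i % 3 == 0`, the `compare` helper and the run count are my own assumptions, and the numbers will vary by machine and Python version, so I give no timings here.

```python
import timeit

SETUP = "ranger = range(20)"

# Both statements evaluate the same test, i % 3 == 0, for every item.
# The test itself is an assumption chosen for illustration.
STATEMENTS = {
    "genexp, inline test": "list(i for i in ranger if i % 3 == 0)",
    "filter with lambda": "list(filter(lambda i: i % 3 == 0, ranger))",
}


def compare(number=100_000):
    """Return the total seconds each statement takes for `number` runs."""
    return {
        label: timeit.timeit(stmt, SETUP, number=number)
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    for label, seconds in compare().items():
        print(f"{label}: {seconds:.3f} s")
```

The difference between the two timings is then closer to the cost of the extra function call alone, because the boolean check is the same on both sides.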