Which is faster? (if not b in m) or (if m.count(b) > 0)

Wed Feb 15 21:48:16 EST 2006

Georg Brandl wrote:

> Steven D'Aprano wrote:
> 
>>On Wed, 15 Feb 2006 08:44:10 +0100, Marc 'BlackJack' Rintsch wrote:
>>
>>
>>>In <1139976842.179801.285400 at z14g2000cwz.googlegroups.com>, Farel wrote:
>>>
>>>
>>>>Which is Faster in Python and Why?
>>>
>>>``if not b in m`` looks at each element of `m` until it finds `b` in it
>>>and stops then.  Assuming `b` is in `m`, otherwise all elements of `m` are
>>>"touched".
>>>
>>>``if m.count(b) > 0`` will always goes through all elements of `m` in the
>>>`count()` method.
>>
>>But the first technique executes in (relatively slow) pure Python, while
>>the count method executes (relatively fast) C code. So even though count
>>may do more work, it may do it faster.
> 
> 
> Why does "not b in m" execute in pure Python? list.__contains__ is implemented
> in C code as well as list.count.

Thank you to Kent Johnson for actually *timing* the two 
pieces of code with various sets of data, cutting 
through all the wild guesses (including mine). Which 
was my point.

To answer Georg's question, perhaps I was wrong -- I 
was under the impression that "target in source" 
executes more or less like:

for obj in source:
     if target == obj: return True
return False

Perhaps this was only true in older versions of Python, 
or perhaps I was misinformed, or perhaps I imagined the 
whole thing.

The important lesson is that your intuitions about what 
will be fast are not necessarily correct, and all the 
theorizing in the world is no substitute for good 
testing. (On the other hand, lousy testing is 
practically worthless.)

-- 
Steven.