[Tutor] "bisect" vs "in"

Albert-Jan Roskam fomcl at yahoo.com
Thu Sep 20 13:58:31 CEST 2012


Hi,
 
I want to repeatedly search a list to test whether that list contains a given element X.
Of course, I can use membership testing using "in" (list._contains__).
But how is this method implemented? It seems to just walk through the whole list from 
beginning to end (see timeit code below). Is it faster to do a binary search on a sorted list,
given that this task has to be done repeatedly? Can such a program be scaled to lists
that do not fit into memory (maybe using shelve?)?
 
This is the function I made:
 
import bisect
def binarySearch(haystack, needle):
 return haystack[bisect.bisect_right(haystack, needle) - 1] == needle
 
# needle is at the beginning of the sequence
>>> t = timeit.Timer("""import bisect
def binarySearch(haystack, needle):
 return haystack[bisect.bisect_right(haystack, needle) - 1] == needle
binarySearch(range(10**7), 0)""")
>>> t.timeit(100)
16.737915573068676
>>> t2 = timeit.Timer("0 in range(10**7)")
>>> t2.timeit(100)
16.304621956142682
>>> 
# needle is at the end of the sequence
>>> t = timeit.Timer("""import bisect
def binarySearch(haystack, needle):
 return haystack[bisect.bisect_right(haystack, needle) - 1] == needle
binarySearch(range(10**7), 10**7-1)""")
>>> t.timeit(100)
16.954997632380582
>>> t2 = timeit.Timer("10**7-1 in range(10**7)")
>>> t2.timeit(100)
36.3686376341127

Thank you in advance!

Regards,
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a 
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


More information about the Tutor mailing list