[Tutor] binary vs. linear search

Tim Peters tim.one@comcast.net
Sat, 24 Aug 2002 01:36:12 -0400


[Tim Wilson]
> Thanks Sean. I removed the wordList.sort() from the binary search
> function and the time for execution dropped to just over 1 sec.!!!!
> Yikes! Now that I think about it, the time to sort a 6,175 item list
> 6,175 times is signficant.

The real mystery there should have been why the aggregate sorting time
didn't take minutes instead of seconds.  There's a reason for that:
list.sort() first checks to see whether the list is already sorted, and gets
out with only len(list)-1 compares if it is.  That's why it didn't kill you;
it only needed to do a full-blown sort the first time you called it.

> Duh. As long as I make sure I pass a sorted list to the function, the
> performance gain over the regular linear search is impressive to say
> the least.

It should go very much faster to make the words the keys of a Python dict,
e.g.

    words = {}
    for w in wordList:
        words[w] = 1

Then to find out later whether a word w was in the original wordList, you
just need to see whether words.has_key(w) is true.  On average, doing
has_key takes a small and constant time independent of how many words there
are.  Binary search takes time proportional to log_base_2(number_of_words),
which is much slower as the number of words grows despite how efficient
binary search is compared to brute-force searching.