[Tutor] Binary search question

Hugo Arts hugo.yoshi at gmail.com
Fri Apr 23 23:55:21 CEST 2010

On Fri, Apr 23, 2010 at 11:33 PM, Emile van Sebille <emile at fenx.com> wrote:
> On 4/23/2010 2:21 PM Alan Gauld said...
>> "Emile van Sebille" <emile at fenx.com> wrote
>>> It's expensive enough that for a list this size I'd convert it to a
>>> dict and use in on that. eg,
>>> a = range(100000)
>>> d = dict(zip(a,a))
>> Surely that would depend on how often you do the search? If it's a one
>> off occurrence I'd expect the overhead of zipping and converting to a
>> dict would outweigh the savings?
> Oh sure, but in practical terms, if it's a one-off situation who cares which
> you use?  For a one-off I'd just use in and not be concerned.

I guess Alan missed my second e-mail with the micro-benchmarks; it
nuances my first answer quite a bit and already makes the points he is
making here. That doesn't make him any less correct, of course. A set
(I used frozenset, but either works) is faster than a dict, and for a
one-off search you're best off using bisect.
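
As a rough sketch of the conversion being discussed (the list size and
values below are placeholders, not the original benchmark data): build
the set once, and every later membership test becomes a hash lookup
instead of a linear scan.

    # Rough sketch, not the original benchmark code: pay a one-time
    # O(n) conversion to get O(1) average-case membership tests.
    haystack = list(range(100000))   # placeholder data standing in for the real list

    lookup = frozenset(haystack)     # set(haystack) works just as well here
    print(50000 in lookup)           # hash lookup, no scan through the list
    print(-1 in lookup)              # misses are just as cheap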

For completeness' sake: on a 10,000-item list, using the in operator
takes, *in the worst case*, around 7 seconds. bisect, again in the worst
case, takes only around 0.01 seconds (that's on a Core 2 Duo T5300 @
1.7 GHz, 2 GB RAM). That's quite a saving. Those 7 seconds can easily
be 99% of the execution time of a typical script. So for a sufficiently
large data set it can definitely pay off to be concerned, even with a one-off.

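A minimal sketch of how such a micro-benchmark might look (the list
size, the worst-case needle and the repeat count are my assumptions,
not the figures quoted above):

    import bisect
    import timeit

    data = list(range(10000))      # sorted list, size taken from the post
    needle = 9999                  # last element: worst case for a linear scan

    def linear_search():
        return needle in data      # O(n) scan of the list

    def binary_search():
        i = bisect.bisect_left(data, needle)   # O(log n) binary search
        return i < len(data) and data[i] == needle

    print(timeit.timeit(linear_search, number=1000))
    print(timeit.timeit(binary_search, number=1000))
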
Of course, profiling will immediately catch that kind of performance
bottleneck. So even if you care about performance, you can start off
using 'in' and easily optimize later with bisect or a set, in the
spirit of the whole "don't do it yet" rule.

