[Tutor] Binary search question
Hugo Arts
hugo.yoshi at gmail.com
Fri Apr 23 23:55:21 CEST 2010
On Fri, Apr 23, 2010 at 11:33 PM, Emile van Sebille <emile at fenx.com> wrote:
> On 4/23/2010 2:21 PM Alan Gauld said...
>>
>> "Emile van Sebille" <emile at fenx.com> wrote
>>>
>>> It's expensive enough that for a list this size I'd convert it to a
>>> dict and use in on that. eg,
>>>
>>> a = range(100000)
>>> d = dict(zip(a,a))
>>>
>>
>> Surely that would depend on how often you do the search? If it's a one-off
>> occurrence I'd expect the overhead of zipping and converting to a
>> dict would outweigh the savings?
>
> Oh sure, but in practical terms, if it's a one-off situation, who cares which
> you use? For a one-off I'd just use in and not be concerned.
>
I guess Alan missed my second e-mail with the micro-benchmarks; it nuances
my first answer quite a bit and makes the same points he is making here.
That does not make him any less correct, of course. A set (I used
frozenset, but either way) is faster than a dict for this, and in a
one-off situation you're best off using bisect.
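To make concrete what I mean by "bisect" and "set" here, a rough sketch
(untested, and the 100000-element list is just the example from earlier in
the thread):

from bisect import bisect_left

def contains_sorted(seq, x):
    # Binary search membership test on an already-sorted sequence.
    i = bisect_left(seq, x)
    return i < len(seq) and seq[i] == x

data = list(range(100000))      # already sorted

# One-off lookup: bisect needs no preprocessing beyond the data being sorted.
print(contains_sorted(data, 99999))    # True

# Repeated lookups: build a frozenset once, then every 'in' is O(1) on average.
lookup = frozenset(data)
print(99999 in lookup)                 # True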
For completeness' sake: on a 10000-item list, using the in operator
takes *in the worst case* around 7 seconds. bisect, again in the worst
case, takes only around 0.01 seconds (that's on a Core 2 Duo T5300 @
1.7 GHz, 2 GB RAM). That's quite a savings. Those 7 seconds can easily
be 99% of the execution time of a typical script, so for a sufficiently
large data set it can definitely pay off to be concerned, even with a
one-off.
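For reference, the kind of micro-benchmark I ran looks roughly like this (a
sketch, not my exact script; the repetition count is arbitrary and the
numbers you get back will of course depend on your machine):

import timeit

setup = """
from bisect import bisect_left
data = list(range(10000))
target = 9999   # worst case: the last element, so 'in' scans the whole list
"""

# Linear scan: the in operator compares against every element before the last.
linear = timeit.timeit("target in data", setup=setup, number=100000)

# Binary search: bisect halves the range each step, ~14 comparisons for 10000 items.
binary = timeit.timeit(
    "i = bisect_left(data, target); found = i < len(data) and data[i] == target",
    setup=setup, number=100000)

print("in:     %.3f seconds" % linear)
print("bisect: %.3f seconds" % binary)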
Of course, profiling will immediately catch that kind of performance
bottleneck. So even if you care about performance, you can start off
using 'in' and easily optimize later with bisect or a set, the whole
"don't do it yet" thing and all.
Hugo