[Tutor] Binary search question

Ricardo Aráoz ricaraoz at gmail.com
Sat Apr 24 00:32:42 CEST 2010


Hugo Arts wrote:
> On Fri, Apr 23, 2010 at 11:33 PM, Emile van Sebille <emile at fenx.com> wrote:
>   
>> On 4/23/2010 2:21 PM Alan Gauld said...
>>     
>>> "Emile van Sebille" <emile at fenx.com> wrote
>>>       
>>>> It's expensive enough that for a list this size I'd convert it to a
>>>> dict and use in on that. eg,
>>>>
>>>> a = range(100000)
>>>> d = dict(zip(a,a))
>>>>
>>>>         
>>> Surely that would depend on how often you do the search? If its a one
>>> off occurence I'd expect the overhead of zipping and converting to a
>>> dict would outweight the savings?
>>>       
>> Oh sure, but in practical terms, if it's a one-off situation who cares which
>> you us?  For a one-off I'd just use in and not be concerned.
>>
>>     
>
> I guess Alan missed my second e-mail with the micro benchmarks, but
> that nuances my first answer quite a bit, and makes all the points he
> is making. That does not make him less correct, of course. set (I used
> frozenset, but either way) is faster than dict, and in a one-off
> you're best off using bisect.
>
> For completeness sake, on a 10000 item list, using the in operator
> takes *in the worst case* around 7 seconds. bisect, again in the worst
> case, takes only around 0.01 seconds (that's on a Core 2 Duo T5300 @
> 1.7 GHZ, 2 GB RAM). That's quite a savings. those 7 seconds can easily
> be 99% of the execution time of a typical script. So for sufficiently
> large data set it can definitely pay off to be concerned, even with a
> one-off
>
> Of course, profiling will immediately catch that kind of performance
> bottleneck. So even if you care about performance, you can start off
> using 'in' and easily optimize later with bisect or a set, the whole
> "don't do it yet" thing and all.
>
> Hugo
> _______________________________________________
>   

In the same vein of completeness sake, and since this is the *tutor*
list, we should stress that the 'right' approach is to use 'in' whatever
the case. And only if the system is too slow and profiling shows that
'in' is the culprit should we seek for alternatives.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20100423/fe8d4f0a/attachment-0001.html>


More information about the Tutor mailing list