hash and __eq__

Aahz aahz at pythoncraft.com
Tue Jun 2 11:27:02 EDT 2009

In article <003e1491$0$9723$c3e8da3 at news.astraweb.com>,
Steven D'Aprano  <steve at REMOVE-THIS-cybersource.com.au> wrote:
>On Sun, 31 May 2009 07:24:09 +0100, Arnaud Delobelle wrote:
>> AFAIK, 'complexity' means 'worst case complexity' unless otherwise
>> stated.
>No, it means "average or typical case" unless otherwise specified. 
>Consult almost any comp sci text book and you'll see hash tables with 
>chaining (like Python dicts) described as O(1) rather than O(N), 
>Quicksort as O(N log N) instead of O(N**2), and similar. If the default 
>was "worst case unless otherwise specified", then Quicksort would be 
>called "Slower than Bubblesort Sort".
>(Both are O(N**2), but Quicksort does more work.)
>Here's a quote on-line:
>"You should be clear about which cases big-oh notation describes. By 
>default it usually refers to the average case, using random data. 
>However, the characteristics for best, worst, and average cases can be 
>very different..."

When talking about big-oh, I prefer to define "worst-case" as "real world
likely worst-case" -- for example, feeding an already-sorted list to
Quicksort.  However, the kind of worst-case that causes a dict to have
O(N) behavior I would call "pathological case".  I generally try to
define big-oh in terms of what I call worst-case, because I think it's
important to keep track of where your algorithm is likely to run into
problems, but I don't think it's worth the effort in most cases to worry
about pathological cases.

In the case of dicts, it's important to remember that pathological cases
are far more likely to arise from poorly designed hashable user classes
than from data problems per se; Python's hashing of built-in types has
almost twenty years of tuning.
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

    on-a-new-machine-ly y'rs  - tim

More information about the Python-list mailing list