finding items that occur more than once in a list

Bryan Olson fakeaddress at nowhere.org
Sun Mar 23 02:02:52 EDT 2008


Arnaud Delobelle wrote:
> Bryan Olson wrote:
>> We mean that the party supplying the keys deliberately chose
>> them to make the hash table inefficient. In this thread the goal
>> is efficiency; a party working toward an opposing goal is an
>> adversary.
> 
> There are situations where this can happen, I guess.

Cormen, Leiserson, and Rivest's /Introduction to Algorithms/
discusses this in section 12.3.3 (first edition). They suggest a
family of hash functions, with an individual hash function chosen
at random when the table is created. Even that doesn't beat
adversaries who can observe on-line timing information and detect
when they have induced a bad case.
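A minimal sketch of that idea (illustrative only, not the CLRS construction): wrap keys so that a per-table random salt perturbs the hash, so an adversary who doesn't know the salt can't precompute colliding keys. The `SaltedKey` name and design are my own, purely for demonstration.

```python
import random

class SaltedKey:
    """Wrap a key so its hash is perturbed by a per-table random salt."""
    def __init__(self, key, salt):
        self.key = key
        self.salt = salt
    def __hash__(self):
        # Mix the salt into the hash. A true universal family would
        # instead pick the hash function's parameters at random.
        return hash((self.salt, self.key))
    def __eq__(self, other):
        return self.key == other.key

# The salt is chosen once, at table-creation time.
salt = random.getrandbits(64)
table = {SaltedKey(k, salt): k * k for k in range(3)}
```

Lookups work as usual as long as the same salt is used for the same table.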

More generally, security researchers have started to look at
"algorithmic complexity denial of service attacks".
http://www.cs.rice.edu/~scrosby/hash/


>> If you find real-world data sets that tend to induce bad-case
>> behavior in Python's hash, please do tell. It would be reason
>> enough to adjust the hash function. The hashes in popular
>> software such as Python are already quite well vetted.
> 
> As Python allows you to make your own hash functions for your own
> classes, another danger is that you are dealing with objects with a
> bad hash function.  I've actually tried it (as I am *not* a pro!) and
> FWIW here are the results.
> 
> -------- badhash.py ----------
> 
> class BadHash(object):
>     def __hash__(self):
>         return 1 # that's a bad hash function!
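To make the cost of the quoted `BadHash` concrete, here is a sketch (my own, not from the original post) that counts equality comparisons while filling a dict. Since every instance hashes to 1, each insert must be compared against every existing entry, so the comparison count grows quadratically with the number of keys.

```python
class BadHash(object):
    eq_calls = 0  # total __eq__ invocations, for measurement

    def __hash__(self):
        return 1  # that's a bad hash function!

    def __eq__(self, other):
        BadHash.eq_calls += 1
        return self is other

def build(n):
    """Insert n distinct BadHash keys into a fresh dict."""
    d = {}
    for i in range(n):
        d[BadHash()] = i
    return d

# Every key collides, so inserting the k-th key walks past the
# previous k-1 entries: roughly n*(n-1)/2 comparisons in total,
# i.e. O(n^2) instead of the usual O(n).
```

With a well-distributed hash, the same loop would trigger almost no `__eq__` calls at all.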

A sorting algorithm can have the same problem: with user-defined
comparison methods, there's no telling how long __cmp__ will
actually take.
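For instance (a sketch of mine, using Python 3's __lt__ where Python 2 sorting used __cmp__), every one of the O(n log n) comparisons made by list.sort() runs user code, so a slow comparison method dominates the whole sort:

```python
import random

class CountingInt:
    """Integer wrapper whose __lt__ counts how often sort calls it."""
    lt_calls = 0

    def __init__(self, v):
        self.v = v

    def __lt__(self, other):
        # Arbitrary user code runs here on every comparison;
        # if it were slow, the sort would be slow.
        CountingInt.lt_calls += 1
        return self.v < other.v

data = [CountingInt(random.random()) for _ in range(1000)]
data.sort()  # makes on the order of n log n calls to __lt__
```

If __lt__ took a millisecond, sorting those 1000 items would take several seconds regardless of how fast the sorting algorithm itself is.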


-- 
--Bryan
