finding items that occur more than once in a list

Bryan Olson fakeaddress at
Sat Mar 22 09:31:45 CET 2008

castironpi at wrote:
> John Machin wrote:
>> On Mar 22, 1:11 am, castiro... at wrote:
>>> A collision sequence is not so rare.
>>>>>> [ hash( 2**i ) for i in range( 0, 256, 32 ) ]
>>> [1, 1, 1, 1, 1, 1, 1, 1]

>> Bryan did qualify his remarks: "If we exclude the case where an
>> adversary is choosing the keys, ..."
> Some adversary.  What, you mean, my boss or my customers?

We mean that the party supplying the keys deliberately chose
them to make the hash table inefficient. In this thread the goal
is efficiency; a party working toward an opposing goal is an

If you find real-world data sets that tend to induce bad-case
behavior in Python's hash, please do tell. It would be reason
enough to adjust the hash function. The hashes in popular
software such as Python are already quite well vetted.

Even a hash function that behaves as a random oracle has
worst-case quadratic-time in the algorithm here, but that's
an issue in theory and not in serving customers.


More information about the Python-list mailing list