On 2020-07-28 at 15:58:58 -0700, Christopher Barker <pythonchb@gmail.com> wrote:
But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing.
And while it wouldn't break anything, having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts.
Have I got that right?
Yes. Breaking O(1) performance was actually the root of possible Denial of Service attacks: if an attacker knows the algorithms, that attacker could specifically create keys (e.g., user names) whose hash values are the same, and then searching a dict degenerates to O(N), and then your server falls to its knees. At some point, Python added some randomization to the way dictionaries work in order to foil suck attacks.