> Chris Barker wrote:
> Is this because it's possible, if very
> unlikely, for ANY hash algorithm to create the same hash for two
> different inputs? So equality always has to be checked anyway?
snip
For example, if a hash algorithm decided to use short names, then a
group of people might be sorted like this:
Bob: Bob, Robert
Chris: Christopher, Christine, Christian, Christina
Ed: Edmund, Edward, Edwin, Edwina
So if somebody draws a name from a hat:
Christina
You apply the hash to it:
Chris
Ignore the Bob and Ed buckets, then use equality checks on the Chris
names to find the right one.
sure, but know (or assume anyway) that python dicts and sets don't use such a simple, naive hash algorithm, so in fact, non-equal strings are very unlikely to have the same hash:
In [42]: hash("Christina")
Out[42]: -8424898463413304204
In [43]: hash("Christopher")
Out[43]: 4404166401429815751
In [44]: hash("Christian")
Out[44]: 1032502133450913307
But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing.
And while it wouldn't break anything, having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts.
Have I got that right?
-CHB