On Mon, Jul 27, 2020 at 5:42 PM Ethan Furman <ethan@stoneleaf.us> wrote:
> Chris Barker wrote:
>  Is this because it's possible, if very
> unlikely, for ANY hash algorithm to create the same hash for two
> different inputs? So equality always has to be checked anyway?


For example, if a hash algorithm decided to use short names, then a
group of people might be sorted like this:

Bob: Bob, Robert
Chris: Christopher, Christine, Christian, Christina
Ed: Edmund, Edward, Edwin, Edwina

So if somebody draws a name from a hat:


You apply the hash to it:


Ignore the Bob and Ed buckets, then use equality checks on the Chris
names to find the right one.

sure, but know (or assume anyway) that python dicts and sets don't use such a simple, naive hash algorithm, so in fact, non-equal strings are very unlikely to have the same hash:

In [42]: hash("Christina")                                                      
Out[42]: -8424898463413304204

In [43]: hash("Christopher")                                                    
Out[43]: 4404166401429815751

In [44]: hash("Christian")                                                      
Out[44]: 1032502133450913307

But a dict always has a LOT fewer buckets than possible hash values, so clashes within a bucket are not so rare, so equality needs to be checked always -- which is what I was missing.

And while it wouldn't break anything, having a bunch of non-equal objects produce the same hash wouldn't break anything, it would break the O(1) performance of dicts.

Have I got that right?


>> From a practical standpoint, think of dictionaries:
> (that's the trick here -- you can't "get" this without knowing something
> about the implementation details of dicts.)

Depends on the person -- I always do better with a concrete application.

>> adding
>> ------
>> - objects are sorted into buckets based on their hash
>> - any one bucket can have several items with equal hashes
> is this mostly because there are many more possible hashes than buckets?


>> - those several items (obviously) will not compare equal
> So the hash is a fast way to put stuff in buckets, so you only need to
> compare with the others that end up in the same bucket?


>> retrieving
>> ----------
>> - get the hash of the object
>> - find the bucket that would hold that hash
>> - find the already stored objects with the same hash
>> - use __eq__ on each one to find the match
> So here's my question: if there is only one object in that bucket, is
> __eq__ checked anyway?

Yes -- just because it has the same hash does not mean it's equal.

> So what happens when there is no __eq__?The object can still be hashable
> -- I guess that's because there IS an __eq__ -- it defaults to an id
> check, yes?


The default hash, I believe, also defaults to the object id -- so, by
default, objects are hashable and compare equal only to themselves.

Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XPUXOSK7WHXV7LRB7H3I4S42JQ2WXQU3/
Code of Conduct: http://python.org/psf/codeofconduct/

Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython