[Python-ideas] incremental hashing in __hash__

jab at math.brown.edu
Fri Dec 30 21:12:02 EST 2016


On Fri, Dec 30, 2016 at 8:10 PM, <jab at math.brown.edu> wrote:

> On Fri, Dec 30, 2016 at 8:04 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>
>> No.  It is possible to have two keys be equal but different -- an easy
>> example is 1 and 1.0; they both hash the same, equal the same, but are not
>> identical.  dict has to check equality when two different objects hash the
>> same but have non-matching identities.
>>
> ...
> I'm having trouble showing that two equal but nonidentical objects can
> both be in the same dict.
>

(They can't, in fact, but your point stands: Python has to call __eq__
in these cases.)

But with instances of large, immutable, ordered collections, an application
could either:

1. accept that it might occasionally create a duplicate, equivalent
instance of an existing one, and that this happens rarely enough that the
performance hit of the resulting __eq__ calls is acceptable, or

2. try to avoid creating duplicate instances in the first place, reusing
the existing, equivalent instance as a singleton. A set or dict lookup can
then short-circuit on the identity check and avoid the expensive __eq__
call entirely (a sketch of this pattern is below).
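
Concretely, the interning in 2 could look something like this
(InternedSeq is just an illustrative name, not anything that exists):
intern instances in a weak-value cache keyed by their contents, so
constructing "the same" collection twice returns the existing object:

    import weakref

    class InternedSeq:
        """Hypothetical immutable, ordered collection that interns its
        instances, so equal contents always yield the same object."""

        _cache = weakref.WeakValueDictionary()  # contents -> canonical instance

        def __new__(cls, items):
            items = tuple(items)
            inst = cls._cache.get(items)
            if inst is None:
                inst = super().__new__(cls)
                inst._items = items
                inst._hash = hash(items)  # O(n), but computed only once
                cls._cache[items] = inst
            return inst

        def __hash__(self):
            return self._hash

        def __eq__(self, other):
            if self is other:  # interning makes this the common case
                return True
            if isinstance(other, InternedSeq):
                return self._items == other._items
            return NotImplemented

    a = InternedSeq(range(1000))
    b = InternedSeq(range(1000))
    assert a is b  # duplicate construction returned the existing instance

Since equal instances are now identical, a dict or set lookup hits
CPython's identity shortcut and never reaches __eq__.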


I think it's much more important to focus on what happens with unequal
instances here, since there are usually many more of them. And with them,
the performance hit of the __eq__ calls does not necessarily dominate
that of __hash__. Right? (A small demonstration below.)
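
Here's a toy demonstration of what I mean (Key is a made-up name for
illustration). For lookups of keys that are absent, __hash__ is paid on
every lookup, while __eq__ is only reached when stored hash values
collide:

    class Key:
        """Toy key that counts its __hash__ and __eq__ calls."""
        hash_calls = 0
        eq_calls = 0

        def __init__(self, n):
            self.n = n

        def __hash__(self):
            Key.hash_calls += 1
            return hash(self.n)

        def __eq__(self, other):
            Key.eq_calls += 1
            return isinstance(other, Key) and self.n == other.n

    d = {Key(i): i for i in range(1000)}
    Key.hash_calls = Key.eq_calls = 0
    for i in range(1000, 2000):  # probe with keys that are not present
        Key(i) in d
    print(Key.hash_calls, Key.eq_calls)  # -> 1000 0: every lookup pays
    # __hash__, but __eq__ is never reached (CPython compares stored
    # hashes first, and distinct small ints never collide)

So for a large collection, an O(n) __hash__ is paid on every lookup
involving an unequal instance, which is exactly where caching the hash
or computing it incrementally would pay off.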