[Python-ideas] incremental hashing in __hash__

Ethan Furman ethan at stoneleaf.us
Fri Dec 30 21:21:36 EST 2016


On 12/30/2016 06:12 PM, jab at math.brown.edu wrote:

> ... your point stands that Python had to call __eq__ in these cases.
>
> But with instances of large, immutable, ordered collections, an
>  application could either:
>
> 1. accept that it might create a duplicate, equivalent instance of
>  an existing instance with sufficient infrequency that it's okay
>  taking the performance hit, or
>
> 2. try to avoid creating duplicate instances in the first place,
>  using the existing, equivalent instance it created as a singleton.
>  Then a set or dict lookup could use the identity check, and avoid
>  the expensive __eq__ call.

> I think it's much more important to focus on what happens with
>  unequal instances here, since there are usually many more of them.
>  And with them, the performance hit of the __eq__ calls definitely
>  does not necessarily dominate that of __hash__.  Right?

I don't think so.  As someone else said, a hash can be calculated once and then cached, but __eq__ has to be called every time.  Depending on the clustering of your data that could be quick... or not.

--
~Ethan~


More information about the Python-ideas mailing list