[Python-ideas] incremental hashing in __hash__
Ethan Furman
ethan at stoneleaf.us
Fri Dec 30 21:21:36 EST 2016
On 12/30/2016 06:12 PM, jab at math.brown.edu wrote:
> ... your point stands that Python had to call __eq__ in these cases.
>
> But with instances of large, immutable, ordered collections, an
> application could either:
>
> 1. accept that it might create a duplicate, equivalent instance of
> an existing instance with sufficient infrequency that it's okay
> taking the performance hit, or
>
> 2. try to avoid creating duplicate instances in the first place,
> using the existing, equivalent instance it created as a singleton.
> Then a set or dict lookup could use the identity check, and avoid
> the expensive __eq__ call.
> I think it's much more important to focus on what happens with
> unequal instances here, since there are usually many more of them.
> And with them, the performance hit of the __eq__ calls definitely
> does not necessarily dominate that of __hash__. Right?
I don't think so. As someone else said, a hash can be calculated once and then cached, but __eq__ has to be called every time. Depending on the clustering of your data that could be quick... or not.
--
~Ethan~
More information about the Python-ideas
mailing list