[Python-ideas] incremental hashing in __hash__

Ethan Furman ethan at stoneleaf.us
Fri Dec 30 22:08:27 EST 2016


On 12/30/2016 06:47 PM, jab at math.brown.edu wrote:

> __eq__ only has to be called when a hash bucket is non-empty. In that case,
>  it may be O(n) in pathological cases, but it could also be O(1) every time.
>  On the other hand, __hash__ has to be called on every lookup, is O(n) on
>  the first call, and ideally O(1) after. I'd expect that __eq__ may often
>  not dominate, and avoiding an unnecessary large tuple allocation on the
>  first __hash__ call could be helpful. If a __hash__ implementation can
>  avoid creating an arbitrarily large tuple unnecessarily and still perform
>  well, why not save the space? If the resulting hash distribution is worse,
>  that's another story, but it seems worth documenting one way or the other,
>  since the current docs give memory-conscious Python programmers pause for
>  this use case.

A quick list of what we have agreed on:

- __hash__ is called once for every dict/set lookup
- __hash__ is calculated once because we are caching the value
- __eq__ is being called an unknown number of times, but can be quite
   expensive in terms of time
- items with the same hash are probably cheap in terms of __eq__ (?)

So maybe this will work?

     def __hash__(self):
         return hash(self.name) * hash(self.nick) * hash(self.color)

In other words, don't create a new tuple, just use the ones you already have and toss in a couple maths operations.  (and have somebody vet that ;)

--
~Ethan~



More information about the Python-ideas mailing list