[Python-ideas] incremental hashing in __hash__
Ethan Furman
ethan at stoneleaf.us
Fri Dec 30 22:08:27 EST 2016
On 12/30/2016 06:47 PM, jab at math.brown.edu wrote:
> __eq__ only has to be called when a hash bucket is non-empty. In that case,
> it may be O(n) in pathological cases, but it could also be O(1) every time.
> On the other hand, __hash__ has to be called on every lookup, is O(n) on
> the first call, and ideally O(1) after. I'd expect that __eq__ may often
> not dominate, and avoiding an unnecessary large tuple allocation on the
> first __hash__ call could be helpful. If a __hash__ implementation can
> avoid creating an arbitrarily large tuple unnecessarily and still perform
> well, why not save the space? If the resulting hash distribution is worse,
> that's another story, but it seems worth documenting one way or the other,
> since the current docs give memory-conscious Python programmers pause for
> this use case.
A quick list of what we have agreed on:
- __hash__ is called once for every dict/set lookup
- __hash__ is calculated once because we are caching the value
- __eq__ is being called an unknown number of times, but can be quite
expensive in terms of time
- items with the same hash are probably cheap in terms of __eq__ (?)
So maybe this will work?
def __hash__(self):
return hash(self.name) * hash(self.nick) * hash(self.color)
In other words, don't create a new tuple, just use the ones you already have and toss in a couple maths operations. (and have somebody vet that ;)
--
~Ethan~
More information about the Python-ideas
mailing list