[Python-ideas] incremental hashing in __hash__

jab at math.brown.edu jab at math.brown.edu
Fri Dec 30 21:47:54 EST 2016


On Fri, Dec 30, 2016 at 9:21 PM, Ethan Furman <ethan at stoneleaf.us> wrote:

> I don't think so.  As someone else said, a hash can be calculated once and
> then cached, but __eq__ has to be called every time.  Depending on the
> clustering of your data that could be quick... or not.
>

__eq__ only has to be called when a hash bucket is non-empty. In that case,
it may be O(n) in pathological cases, but it could also be O(1) every time.
On the other hand, __hash__ has to be called on every lookup, is O(n) on
the first call, and ideally O(1) after. I'd expect that __eq__ may often
not dominate, and avoiding an unnecessary large tuple allocation on the
first __hash__ call could be helpful. If a __hash__ implementation can
avoid creating an arbitrarily large tuple unnecessarily and still perform
well, why not save the space? If the resulting hash distribution is worse,
that's another story, but it seems worth documenting one way or the other,
since the current docs give memory-conscious Python programmers pause for
this use case.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20161230/49a34f65/attachment-0001.html>


More information about the Python-ideas mailing list