[Python-ideas] incremental hashing in __hash__

jab at math.brown.edu
Fri Dec 30 12:29:55 EST 2016


Updating the docs sounds like the more important change for now, given
that any new built-in support could only land in 3.7+. But before the docs
make an official recommendation for that recipe, were the analyses that
Steve and I did sufficient to confirm that its hash distribution and
performance are good enough at scale, or is more rigorous analysis
necessary?

I've been trying to find a reasonably detailed and up-to-date reference on
the requirements for Python hash() results and on analysis methodology,
with instructions on how to confirm whether those requirements are met, but
am still looking. I'd find that an interesting read if it's out there. But
I'd take just an authoritative thumbs up here too. I just haven't heard one
yet.

And regarding any built-in support that might get added, I just want to
make sure Ryan Gonzalez's proposal (the first reply on this thread) didn't
get buried:

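def __hash__(self):
    hasher = IncrementalHasher()  # proposed API; does not exist today
    for item in self:  # assumes the object iterates over its hashable parts
        hasher.add(item)  # updates hasher.hash property with result
    return hasher.hash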

I think this is the only proposal so far that actually adds an explicit API
for performing an incremental update. (The other
"hash_stream(iterable)"-style proposals are all-or-nothing.) This would
bring support for Python's built-in hash() algorithm up to parity with the
other hashing algorithms in the standard library (hashlib, zlib.crc32),
which can already be fed data piece by piece. Maybe that's valuable?
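
As a rough illustration of the parity argument, the sketch below first shows
the incremental interfaces that hashlib and zlib.crc32 already provide, then
a toy IncrementalHasher with the add()/hash shape from the proposal. The
class body is purely an assumption made for illustration: it folds each item
into the running value by reusing tuple hashing, and its distribution and
performance have not been analyzed. It is not an existing API and not
necessarily how a built-in version would work.

```python
import hashlib
import zlib

chunks = [b"spam", b"eggs", b"ham"]

# hashlib: feed data piece by piece with update(), read the digest at the end.
h = hashlib.sha256()
for chunk in chunks:
    h.update(chunk)
assert h.hexdigest() == hashlib.sha256(b"".join(chunks)).hexdigest()

# zlib.crc32: pass the running checksum back in as the second argument.
crc = 0
for chunk in chunks:
    crc = zlib.crc32(chunk, crc)
assert crc == zlib.crc32(b"".join(chunks))


class IncrementalHasher:
    """Hypothetical sketch of the proposed API (illustration only)."""

    def __init__(self):
        # Arbitrary starting value, chosen here only for the sketch.
        self.hash = hash(())

    def add(self, item):
        # Fold the item's hash into the running value via tuple hashing.
        # The result depends on the order in which items are added.
        self.hash = hash((self.hash, hash(item)))


def incremental_hash(items):
    """Hash any iterable one item at a time, without building a tuple of it."""
    hasher = IncrementalHasher()
    for item in items:
        hasher.add(item)
    return hasher.hash
```

Because the state is a single int, a caller could stop partway through,
inspect hasher.hash, and continue later, which the all-or-nothing
"hash_stream(iterable)" style does not allow.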