[Python-ideas] incremental hashing in __hash__
Nathaniel Smith
njs at pobox.com
Fri Dec 30 22:30:58 EST 2016
On Fri, Dec 30, 2016 at 9:29 AM, <jab at math.brown.edu> wrote:
> Updating the docs sounds like the more important change for now, given 3.7+.
> But before the docs make an official recommendation for that recipe, were
> the analyses that Steve and I did sufficient to confirm that its hash
> distribution and performance is good enough at scale, or is more rigorous
> analysis necessary?
>
> I've been trying to find a reasonably detailed and up-to-date reference on
> Python hash() result requirements and analysis methodology, with
> instructions on how to confirm if they're met, but am still looking. Would
> find that an interesting read if it's out there. But I'd take just an
> authoritative thumbs up here too. Just haven't heard one yet.
I'm not an expert on hash table implementation subtleties, but I can
quote Objects/dictobject.c: "Most hash schemes depend on having a
"good" hash function, in the sense of simulating randomness. Python
doesn't ..."
https://github.com/python/cpython/blob/d0a2f68a5f972e4474f5ecce182752f6f7a22178/Objects/dictobject.c#L133
(Basically IIUC the idea is that Python dicts use a relatively
sophisticated probe sequence to resolve collisions which allows the
use of relatively "poor" hashes. There are interesting trade-offs
here...)
-n
--
Nathaniel J. Smith -- https://vorpus.org
More information about the Python-ideas
mailing list