[Python-3000] Performance Notes - new hash algorithm

Sun Sep 9 11:13:00 CEST 2007

On 9/9/07, Larry Hastings <larry at hastings.org> wrote:

> One goal of Jenkin's hashes is uniform distribution, so these functions
> presumably lack the serendipitous "similar inputs hash to similar values"
> behavior of Python's current hash function.  But why is that a feature?
> (Not that I doubt Tim Peters!)
>

Because (relatively) small dicts with (broadly speaking) similar keys are
quite common in Python. Module and class and instance __dict__s, for
instance ;) As Tim mentioned, the dict implementation only looks at part of
the actual hash value (depending on the size of the dict) and having hash
values close but not the same greatly decreases the chance of collisions in
(relatively) small dicts. It's less of a problem for massive dicts with
(almost) completely arbitrary keys, but it doesn't exactly hurt there,
either.

-- 
Thomas Wouters <thomas at python.org>

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070909/abe62bce/attachment.htm