[Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html

Jim Jewett jimjjewett at gmail.com
Mon Jan 2 01:37:26 CET 2012

In http://mail.python.org/pipermail/python-dev/2011-December/115172.html,
P. J. Eby wrote:

> On Sat, Dec 31, 2011 at 7:03 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

>> While the dictionary probe has to start with a hash for backward
>> compatibility reasons, is there a reason the overflow strategy for
>> insertion has to be buckets containing lists?  How about
>> double-hashing, etc?

> This won't help, because the keys still have the same hash value. ANYTHING
> you do to them after they're generated will result in them still colliding.

> The *only* thing that works is to change the hash function in such a way
> that the strings end up with different hashes in the first place.
> Otherwise, you'll still end up with (deliberate) collisions.

Well, there is nothing wrong with switching to a different hash function after N
collisions, rather than "in the first place".  The perturbation
effectively does by
shoving the high-order bits through the part of the hash that survives the mask.

> (Well, technically, you could use trees or some other O log n data
> structure as a fallback once you have too many collisions, for some value
> of "too many".  Seems a bit wasteful for the purpose, though.)

Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ > requires
using a real dictionary for compatibility; storing some of the values
outside the
values array would violate that.  Do you consider that obsolete?


More information about the Python-Dev mailing list