[Python-Dev] http://mail.python.org/pipermail/python-dev/2011-December/115172.html
Jim Jewett
jimjjewett at gmail.com
Mon Jan 2 04:28:13 CET 2012
On Sun, Jan 1, 2012 at 10:00 PM, PJ Eby <pje at telecommunity.com> wrote:
> On Sun, Jan 1, 2012 at 7:37 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
>> Well, there is nothing wrong with switching to a different hash function
>> after N
>> collisions, rather than "in the first place". The perturbation
>> effectively does by
>> shoving the high-order bits through the part of the hash that survives the
>> mask.
> Since these are true hash collisions, they will all have the same high order
> bits. So, the usefulness of the perturbation is limited mainly to the
> common case where true collisions are rare.
That is only because the perturb is based solely on the hash.
Switching to an entirely new hash after the 5th collision (for a given
lookup) would resolve that (after the 5th collision); the question is
whether or not the cost is worthwhile.
>> > (Well, technically, you could use trees or some other O log n data
>> > structure as a fallback once you have too many collisions, for some
>> > value
>> > of "too many". Seems a bit wasteful for the purpose, though.)
>>
>> Your WSGI specification < http://www.python.org/dev/peps/pep-0333/ >
>> requires
>> using a real dictionary for compatibility; storing some of the values
>> outside the
>> values array would violate that.
> When I said "use some other data structure", I was referring to the internal
> implementation of the dict type, not to user code. The only user-visible
> difference (even at C API level) would be the order of keys() et al.
Given the wording requiring a real dictionary, I would have assumed
that it was OK (if perhaps not sensible) to do pointer arithmetic and
access the keys/values/hashes directly. (Though if the breakage was
between python versions, I would feel guilty about griping too
loudly.)
-jJ
More information about the Python-Dev
mailing list