[issue5186] Reduce hash collisions for objects with no __hash__ method
Adam Olsen
report at bugs.python.org
Wed Feb 11 04:31:53 CET 2009
Adam Olsen <rhamph at gmail.com> added the comment:
On my 64-bit linux box there's nothing in the last 4 bits:
>>> [id(o)%16 for o in [object() for i in range(128)]]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]
And with a bit more complicated functions I can determine how much shift
gives us the lowest collision rate:
def a(size, shift):
return len(set((id(o) >> shift) % (size * 2) for o in [object() for
i in range(size)]))
def b(size):
return [a(size, shift) for shift in range(11)]
def c():
for i in range(1, 9):
size = 2**i
x = ', '.join('% 3s' % count for count in b(size))
print('% 3s: %s' % (size, x))
>>> c()
2: 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2
4: 1, 1, 2, 3, 4, 3, 2, 4, 4, 3, 2
8: 1, 2, 4, 6, 6, 7, 8, 6, 4, 3, 2
16: 2, 4, 7, 9, 12, 13, 12, 8, 5, 3, 2
32: 4, 8, 14, 23, 30, 25, 19, 12, 7, 4, 2
64: 8, 16, 32, 55, 64, 38, 22, 13, 8, 4, 2
128: 16, 32, 64, 114, 128, 71, 39, 22, 12, 6, 3
256: 32, 64, 128, 242, 242, 123, 71, 38, 20, 10, 5
The fifth column (ie 4 bits of shift, a divide of 16) works the best.
Although it varies from run to run, probably more than half the results
in that column have no collisions at all.
.. although, if I replace object() with list() I get best results with a
shift of 6 bits. Replacing it with dict() is best with 8 bits.
We may want something more complicated.
----------
nosy: +Rhamphoryncus
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5186>
_______________________________________
More information about the Python-bugs-list
mailing list