[Python-3000] Performance Notes - new hash algorithm
Nicko van Someren
nicko at nicko.org
Thu Sep 13 01:33:07 CEST 2007
On 10 Sep 2007, at 01:58, Jim Jewett wrote:
> To spell this out a bit more:
> ...
> When adding four entries to an 8-slot table, a truly random hash would
> have at least one collision (0/8 + 1/8 + 2/8 + 3/8 =) 3/4 of the
> time. As expected, the proposed hash does have a collision for those
> four values (the first and fourth).
While your over-all analysis is both informative and helpful, the
pedant in me feels obliged to point out the flaw in your math. The
probability of at least one collision is 1 minus the probability of
no collision, which is in turn 8/8 * 7/8 * 6/8 * 5/8, so the correct
figure is actually that you collide about 59% of the time, not 75%.
(If your math were correct then 5 items would collide 125% of the
time, which is clearly wrong! :-)
Cheers,
Nicko
More information about the Python-3000
mailing list