
On 2017-06-26 19:21, Tim Peters wrote:
Some explanations and cautions. An advantage of sticking with pure Python instead of C is that spare moments are devoted to investigating the problem instead of thrashing with micro-optimizations ;-)
Why does the current scheme suffer for small tables? With hindsight it's pretty obvious: it can visit table slots more than once (the other schemes cannot), and while the odds of that happening are tiny in large tables, they're high for small tables.
[snip]
So where does that leave us? I don't know. All schemes have good & bad points ;-) I haven't yet thought of a cheap way to compute an `inc` for double-hashing that isn't vulnerable to bad behavior for _some_ easily constructed set of int keys. If you forget "cheap", it's easy; e.g.,
random.seed(h) inc = random.choice(range(1, mask + 1, 2))
Regardless, I'll attach the current version of the code.
If the current scheme suffers only for small tables, couldn't you use an alternative scheme only for small tables?