[M.-A. Lemburg]
... Once upon a time, when I was playing with inlining dictionary tables (now part of the dictionary implementation thanks to Tim),
Thank you!
... I don't think that large dictionaries should become more sparse -- that's just a waste of memory.
Collision resolution is very fast if the dict slots happen to live in cache. When they're out of cache, the apparent speed of the C code is irrelevant: the time is virtually all consumed by the HW (or even OS) resolving the cache misses, and every collision probe is very likely to be a cache miss then (the probe sequence -- by design -- jumps all over the slots in (pseudo-)random order). So when Raymond explained that increasing sparseness helped *most* for large dicts, it made great sense to me. We can likely resolve dozens of collisions in a small dict in the time it takes for one extra probe in a large dict.

Jeremy had a possibly happy idea wrt this: make the collision probe sequence start in a slot adjacent to the colliding slot. That's likely to get sucked into cache "for free", tagging along with the slot that collided. If that's effective, it could buy much of the "large dict" speed gains Raymond saw without increasing the dict size.

If someone wants to experiment with that in lookdict_string(), stick a new ++i; before the for loop, and move the existing i = (i << 2) + i + perturb + 1; to the bottom of that loop. Likewise for lookdict().
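To make the suggestion concrete, here's a minimal Python sketch of the two probe orders being compared. probe_sequence follows the standard recurrence from dictobject.c (i = (i << 2) + i + perturb + 1, with perturb shifted right by PERTURB_SHIFT each step); adjacent_first_sequence applies the change described above, checking the slot next to the colliding one before falling back to the perturbed jumps. Function names, the probe count, and the simplification of tracking slots in a list are all illustrative, not CPython's actual code.

```python
PERTURB_SHIFT = 5  # matches the constant in dictobject.c

def probe_sequence(h, mask, nprobes=8):
    """Slots visited by the standard CPython-style probe sequence."""
    i = h & mask
    perturb = h
    slots = [i & mask]
    for _ in range(nprobes - 1):
        # Perturbed recurrence: jumps pseudo-randomly over the table,
        # so each probe in a large dict is likely a fresh cache miss.
        i = (i << 2) + i + perturb + 1
        perturb >>= PERTURB_SHIFT
        slots.append(i & mask)
    return slots

def adjacent_first_sequence(h, mask, nprobes=8):
    """Jeremy's variant: first retry hits the adjacent slot,
    which likely rode into cache with the slot that collided."""
    i = h & mask
    perturb = h
    slots = [i & mask]
    i += 1  # the suggested new "++i" before the loop
    for _ in range(nprobes - 1):
        slots.append(i & mask)
        # existing recurrence, moved to the bottom of the loop
        i = (i << 2) + i + perturb + 1
        perturb >>= PERTURB_SHIFT
    return slots
```

Both sequences share the same initial slot and eventually diverge the same way; the only difference is that the variant's first collision probe lands one slot over instead of somewhere random.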