On Sep 19, 2019, at 20:25, Tim Peters wrote:
> Something I haven't seen mentioned here, but I may have missed it: when timing algorithms with sets and dicts, people often end up merely measuring the bad effects of hardware data cache misses when the containers get large, and especially in contrived benchmarks.
Another issue: the container itself is larger. A list’s array uses one pointer (8 bytes on a 64-bit build) per element. Even without looking at the implementation of set, a hash bucket obviously has to be at least twice that size. Plus, lists need less slack, and they keep all of their slack at the end, where you never traverse it, while a hash table’s slack is distributed throughout the table, where it does get in the way. So a list should be able to grow at least 2.2x as large as a set before hitting the threshold where it no longer fits entirely in cache.

But of course this is only true if you can forget about the objects themselves. Each string object is a lot more than 16 bytes, and so are the character buffers in most tests, so the top-level difference is often swamped. Still, the nonlinear jumps when you cross boundaries like “everything in cache” or “at least the top level in cache” (and again at each level of the cache hierarchy) might show up as visible distortions, or as visible reflections of a real benefit, if your application’s behavior is close enough to the artificial benchmark.

You can test the list side of this with a huge list of numbers from 1 to 100… but you can’t do the same with a set, because deduplication would leave you with a tiny set of 100 elements.

And even once you know the exact cache characteristics of your real data, they can be hard to optimize for beyond “try to keep adjacent things adjacent”. There’s a paper from the GHC folks from many years ago about how they carefully tuned the ropes (linked lists of mid-sized arrays) they used to optimize large string internals around the L2 cache, and found that code that worked great on their older Intel CPUs was a huge pessimization on the next generation of CPUs: the layout fooled the prefetcher into thinking they were never going to read past the end of the current page, so it didn’t prefetch the next one.
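To make the top-level size difference concrete, here’s a minimal sketch using sys.getsizeof. The per-slot numbers are CPython-specific and vary by version and platform; getsizeof only measures the container itself, not the objects it points to, which is exactly the distinction the paragraph above is making:

```python
import sys

# Top-level container overhead (CPython-specific; exact per-slot
# costs vary by version and platform).
n = 100_000
keys = list(range(n))

lst = list(keys)   # one 8-byte pointer per slot, slack only at the end
st = set(keys)     # 16-byte buckets plus empty slots spread through the table

print(sys.getsizeof(lst) / n)  # container overhead per element: roughly 8 bytes
print(sys.getsizeof(st) / n)   # several times larger per element

# The list/set asymmetry from the text: a huge list of values 1..100
# stays huge, but the corresponding set collapses to 100 elements.
big_list = [i % 100 + 1 for i in range(1_000_000)]
small_set = set(big_list)
print(len(big_list), len(small_set))  # 1000000 100
```

Note that this only shows the memory gap, not the cache-miss timing effects; those only appear once the containers outgrow the relevant cache level, and (per the anecdote above) they are hardware-specific enough that any benchmark should be rerun on the machine you actually care about.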