I've experimented with about a dozen ways to improve dictionary performance and found one that benefits some programs by up to 5% without hurting the performance of other programs by more than a single percentage point.
It entails a one-line change to dictobject.c, resulting in a new schedule of dictionary sizes for a given number of entries:
    Number of        Current size    Proposed size
    Filled Entries   of dictionary   of dictionary
    --------------   -------------   -------------
        0 to   5             8               8
        6 to  10            16              32
       11 to  21            32              32
       22 to  42            64             128
       43 to  85           128             128
       86 to 170           256             512
      171 to 341           512             512
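A sketch of where this schedule could come from (my reconstruction, not the actual dictobject.c code): assuming CPython resizes a dict once it becomes more than 2/3 full, and that the resize picks the smallest power of two larger than used*2 (current) or used*4 (proposed), the table above falls out directly. The growth factor is inferred from the schedule, not taken from the patch itself.

```python
def final_size(entries, growth):
    """Simulate the table size after inserting `entries` distinct keys.

    Assumes (my reconstruction): a resize is triggered when the dict is
    more than 2/3 full, and the new size is the smallest power of two
    strictly greater than used * growth.  growth=2 models the current
    behavior, growth=4 the proposed change.
    """
    size = 8  # minimum dict table size
    for used in range(1, entries + 1):
        if used * 3 >= size * 2:      # more than 2/3 full -> resize
            new = 8
            while new <= used * growth:
                new <<= 1
            size = new
    return size

# Reproduces both columns of the schedule, e.g.:
# final_size(10, 2) -> 16,  final_size(10, 4) -> 32
# final_size(42, 2) -> 64,  final_size(42, 4) -> 128
```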
I suppose there's an "and so on" here, right? I wonder if for *really* large dicts the space sacrifice isn't worth the time saved?
The idea is to lower the average sparseness of dictionaries (by 0% to 50% of their current sparseness). This results in fewer collisions, faster collision resolution, fewer memory accesses, and better cache performance. A small side-benefit is halving the number of resize operations as the dictionary grows.
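The halving of resize operations can be checked with a small simulation (same assumptions as before, which are my reconstruction: resize past 2/3 full, grow to the next power of two above used*2 versus used*4). Growing to a larger target means every other resize threshold is skipped:

```python
def count_resizes(entries, growth):
    """Count resize operations while inserting `entries` distinct keys,
    under the assumed rule: resize when more than 2/3 full, growing to
    the smallest power of two strictly greater than used * growth."""
    size, resizes = 8, 0
    for used in range(1, entries + 1):
        if used * 3 >= size * 2:
            new = 8
            while new <= used * growth:
                new <<= 1
            size = new
            resizes += 1
    return resizes

# Growing a dict to 341 entries: 6 resizes currently (at 6, 11, 22,
# 43, 86, and 171 entries) versus 3 proposed (at 6, 22, and 86).
```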
I think you mean "raise the average sparseness", don't you? (The more sparse something is, the more gaps it has.)

I tried the patch with my new favorite benchmark, startup time for Zope (which surely populates a lot of dicts :-). It did give about 0.13 seconds speedup on a total around 3.5 seconds, or almost 4% speedup.

--Guido van Rossum (home page: http://www.python.org/~guido/)