[Python-Dev] Dictionary tuning
Raymond Hettinger
Raymond Hettinger" <python@rcn.com
Mon, 28 Apr 2003 15:51:36 -0400
I've experimented with about a dozen ways to improve dictionary
performance and found one that benefits some programs by up to
5% without hurting the performance of other programs by more
than a single percentage point.
It entails a one line change to dictobject.c resulting in a new
schedule of dictionary sizes for a given number of entries:
Number of Current size Proposed size
Filled Entries of dictionary of dictionary
-------------- ------------- -------------
[-- 0 to 5 --] 8 8
[-- 6 to 10 --] 16 32
[-- 11 to 21 --] 32 32
[-- 22 to 42 --] 64 128
[-- 43 to 85 --] 128 128
[-- 86 to 170 --] 256 512
[-- 171 to 341 --] 512 512
The idea is to lower the average sparseness of dictionaries (by
0% to 50% of their current sparsenes). This results in fewer
collisions, faster collision resolution, fewer memory accesses,
and better cache performance. A small side-benefit is halving
the number of resize operations as the dictionary grows.
The above table of dictionary sizes shows that odd numbered
steps have the same size as the current approach while even
numbered steps are twice as large. As a result, small dicts
keep their current size and the amortized size of large dicts
remains the same. Along the way, some dicts will be a little
larger and will benefit from the increased sparseness.
I would like to know what you guys think about the idea
and would appreciate your verifying the performance on
your various processors and operating systems.
Raymond Hettinger
P.S. The one line patch is:
*** dictobject.c 17 Mar 2003 19:46:09 -0000 2.143
--- dictobject.c 25 Apr 2003 22:33:24 -0000
***************
*** 532,538 ****
* deleted).
*/
if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! if (dictresize(mp, mp->ma_used*2) != 0)
return -1;
}
return 0;
--- 532,538 ----
* deleted).
*/
if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
! if (dictresize(mp, mp->ma_used*4) != 0)
return -1;
}