[Python-Dev] Dictionary tuning

Mon, 28 Apr 2003 15:51:36 -0400

I've experimented with about a dozen ways to improve dictionary 
performance and found one that benefits some programs by up to 
5% without hurting the performance of other programs by more
than a single percentage point.

It entails a one line change to dictobject.c resulting in a new 
schedule of dictionary sizes for a given number of entries:

Number of           Current size        Proposed size
Filled Entries      of dictionary       of dictionary
--------------      -------------       -------------
[-- 0 to 5 --]            8                   8
[-- 6 to 10 --]          16                  32
[-- 11 to 21 --]         32                  32
[-- 22 to 42 --]         64                 128
[-- 43 to 85 --]        128                 128
[-- 86 to 170 --]       256                 512
[-- 171 to 341 --]      512                 512

The idea is to lower the average sparseness of dictionaries (by
0% to 50% of their current sparsenes).  This results in fewer 
collisions, faster collision resolution, fewer memory accesses,
and better cache performance.  A small side-benefit is halving
the number of resize operations as the dictionary grows.

The above table of dictionary sizes shows that odd numbered
steps have the same size as the current approach while even
numbered steps are twice as large.  As a result, small dicts 
keep their current size and the amortized size of large dicts 
remains the same.  Along the way, some dicts will be a little
larger and will benefit from the increased sparseness.

I would like to know what you guys think about the idea
and would appreciate your verifying the performance on
your various processors and operating systems.

Raymond Hettinger

P.S.  The one line patch is:

*** dictobject.c        17 Mar 2003 19:46:09 -0000      2.143
--- dictobject.c        25 Apr 2003 22:33:24 -0000
***************
*** 532,538 ****
         * deleted).
         */
        if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
!               if (dictresize(mp, mp->ma_used*2) != 0)
                        return -1;
        }
        return 0;
--- 532,538 ----
         * deleted).
         */
        if (mp->ma_used > n_used && mp->ma_fill*3 >= (mp->ma_mask+1)*2) {
!               if (dictresize(mp, mp->ma_used*4) != 0)
                        return -1;
        }