[Python-Dev] Dictionary tuning up to 100,000 entries

Raymond Hettinger python@rcn.com
Tue, 29 Apr 2003 04:12:52 -0400


[Raymond]
> >>I've experimented with about a dozen ways to improve dictionary 
> >>performance and found one that benefits some programs by up to 
> >>5% without hurting the performance of other programs by more
> >>than a single percentage point.
> >>
> >>It entails a one line change to dictobject.c resulting in a new 
> >>schedule of dictionary sizes for a given number of entries:

[Mark Lemburg]
> Perhaps you could share that change ? Or is it on SF somewhere ?

It was in the original post.  But SF is better, so
I just uploaded it to the patch manager:
    www.python.org/sf/729395



[GvR]
> > I suppose there's an "and so on" here, right?  I wonder if for
> > *really* large dicts the space sacrifice isn't worth the time saved?

Due to the concerns raised about massive dictionaries,
I revised the patch to switch back to the old growth 
schedule for sizes above 100,000 entries (approx. 1.2 MB).
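Under the assumption that the resize rule works as in CPython of that era (grow when the table passes 2/3 full, rounding the growth target up to a power of two), the effect of the patch can be sketched in Python.  This is a simulation, not the patch itself, and the cutoff value below is illustrative rather than the exact figure from the patch:

```python
def growth_schedule(n_entries, small_factor=4, big_factor=2, cutoff=50000):
    """Simulate dict table sizes as entries are inserted one at a time.

    Mirrors the CPython rule (circa 2.3): start at 8 slots and, whenever
    the table becomes more than 2/3 full, resize to the smallest power of
    two strictly greater than used * factor.  The patched factor of 4
    applies only to smaller dicts; above the (illustrative) cutoff the
    old factor of 2 is used.  Deleted-entry dummies are ignored here.
    """
    size = 8
    schedule = [size]
    for used in range(1, n_entries + 1):
        if used * 3 >= size * 2:                  # more than 2/3 full
            factor = small_factor if used < cutoff else big_factor
            target = used * factor
            size = 8
            while size <= target:                 # round up to a power of two
                size <<= 1
            schedule.append(size)
    return schedule
```

With the old factor of 2 this reproduces the familiar 8, 16, 32, 64, ... schedule; with the patched factor of 4 the schedule becomes 8, 32, 128, 512, ..., trading some memory for fewer resizes and sparser tables.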



[Mark Lemburg]
> I believe that the reason for the speedups you see is
> that cache sizes and processor optimizations have changed
> since the time the current resizing implementation was chosen,
> so maybe we ought to rethink the parameters:
> 
> * minimum table size
> * first three resize steps

I've done dozens of experiments with changing these parameters
and with changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7),
but found that what helped some applications would hurt others.
The current tuning remains fairly effective.  Changing the resize
step from *2 to *4 was the only alteration that yielded across-the-board
improvements.
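The kind of measurement behind those experiments can be reproduced with a rough micro-benchmark.  This is a sketch, not the actual test suite used for the patch, and absolute numbers depend heavily on the machine, cache sizes, and Python build:

```python
import timeit

def fill_dict(n):
    """Build a dict of n integer keys, exercising the resize schedule."""
    d = {}
    for i in range(n):
        d[i] = i
    return d

# Time dict construction at several sizes; a resize-schedule change, if
# beneficial, shows up as a lower per-entry cost at these scales.
for n in (1000, 10000, 100000):
    secs = timeit.timeit(lambda n=n: fill_dict(n), number=10)
    print("%7d entries: %.4fs for 10 builds" % (n, secs))
```

Comparing such timings across interpreter builds (patched vs. unpatched) on a mix of workloads, not just dict construction, is what distinguishes an across-the-board win from one that merely shifts costs between applications.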



Raymond Hettinger