[Python-Dev] Dictionary tuning up to 100,000 entries
Raymond Hettinger
python@rcn.com
Tue, 29 Apr 2003 04:12:52 -0400
[Raymond]
> >>I've experimented with about a dozen ways to improve dictionary
> >>performance and found one that benefits some programs by up to
> >>5% without hurting the performance of other programs by more
> >>than a single percentage point.
> >>
> >>It entails a one line change to dictobject.c resulting in a new
> >>schedule of dictionary sizes for a given number of entries:
[Mark Lemburg]
> Perhaps you could share that change ? Or is it on SF somewhere ?
It was in the original post. But SF is better, so
I just uploaded it to the patch manager:
www.python.org/sf/729395
[GvR]
> > I suppose there's an "and so on" here, right? I wonder if for
> > *really* large dicts the space sacrifice isn't worth the time saved?
Due to the concerns raised about massive dictionaries,
I revised the patch to switch back to the old growth
schedule for sizes above 100,000 entries (approx 1.2 Mb).
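The revised schedule can be sketched in Python (an illustration of the
sizing policy, not the actual one-line C change to dictobject.c; the
100,000-entry cutoff and the *4 growth factor are taken from this thread,
the minimum size of 8 from CPython's PyDict_MINSIZE):

```python
PyDict_MINSIZE = 8   # CPython's minimum table size
CUTOFF = 100000      # above this, revert to the old *2 growth schedule

def next_table_size(used, old_schedule=False):
    """Illustrative sketch: pick the smallest power-of-two table size
    strictly larger than used * growth_factor."""
    growth = 2 if (old_schedule or used > CUTOFF) else 4
    size = PyDict_MINSIZE
    while size <= used * growth:
        size <<= 1
    return size

# The new schedule gives small and mid-sized dicts more headroom
# per resize, halving the number of resizes on the way up:
[next_table_size(n, old_schedule=True) for n in (5, 11, 21)]
[next_table_size(n) for n in (5, 11, 21)]
```

Above the cutoff both schedules coincide, so very large dictionaries pay
no extra space cost.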
[Mark Lemburg]
> I believe that the reason for the speedups you see is
> that cache sizes and processor optimizations have changed
> since the times the current resizing implementation was chosen,
> so maybe we ought to rethink the parameters:
>
> * minimum table size
> * first three resize steps
I've done dozens of experiments with changing these parameters
and changing the resize ratio (from 2/3 to 4/5, 3/5, 1/2, 3/7, and 4/7)
but found that what helped some applications would hurt others.
The current tuning remains fairly effective. Changing the resize
step from *2 to *4 was the only alteration that yielded across-the-board
improvements.
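One way to see why the *2 → *4 step helps: with the 2/3 resize trigger
kept fixed, quadrupling roughly halves how often a growing dict must be
rebuilt. A hypothetical simulation (the 2/3 trigger matches the ratio
discussed above; the counting helper is mine, not CPython code):

```python
def count_resizes(n_inserts, growth):
    """Hypothetical helper: count table rebuilds while inserting
    n_inserts keys, assuming a resize fires when the table becomes
    more than 2/3 full and grows to the smallest power of two
    exceeding used * growth."""
    size, resizes = 8, 0
    for used in range(1, n_inserts + 1):
        if used * 3 >= size * 2:   # more than 2/3 full
            while size <= used * growth:
                size <<= 1
            resizes += 1
    return resizes

count_resizes(100000, 2)   # old *2 schedule
count_resizes(100000, 4)   # new *4 schedule: noticeably fewer rebuilds
```

Each rebuild rehashes every existing entry, so fewer rebuilds translate
directly into time saved for dictionaries that grow incrementally.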
Raymond Hettinger