[Python-Dev] Dictionary sparseness

Raymond Hettinger Raymond Hettinger" <python@rcn.com
Sun, 4 May 2003 18:59:46 -0400


After more dictionary sparseness experiments, I've become 
convinced that the ideal settings are better left up to the user 
who is in a better position to know:

* anticipated dictionary size 
* overall application memory issues
* characteristic access patterns (stores vs. reads vs. deletions vs. iteration)
* when the dictionary is growing, shrinking, or stablized.
* whether many deletions have taken place

I have two competing proposals to expose dictresize():

1)  d.resize(minsize=0)

The first approach allows a user to trigger a resize().  This is handy
after deletions have taken place and dictionary contents have become
stable.  It allows the dictionary to be rebuilt without dummy entries.

If the minsize factor is specified, then the dictionary will be built
to the specified size or larger if needed to achieve a power of two
or to accommodate existing entries.

That is handy when building a dictionary whose approximate size 
is known in advance because it eliminates all of the intermediate
resizes during construction.  For instance,  the builtin dictionary can
be pre-sized for the 126 entries and it will build more quickly.

It is also useful after dictionary contents have stabilized and the user
wants improved lookup time at the expense of additional memory
and slower iteration time.  For instance, the builtin dictionary can 
be resized to 500 entries making it so sparse that the lookups will 
typically hit on the first try.

This API requires a little user sophistication because the effects
get wiped out during the next automatic resize (when the dict is
two-thirds full). 


2)  d.setsparsity(factor=1)

The second approach does not allow dictionaries to be pre-sized,
but the effects do not get wiped out by normal dictionary activity.

It is handy when a particular dictionary's lookup/insertion time 
is more important than iteration time or space considerations.  
For instance, the builtin dictionary can be set to a sparsity factor 
of four so that lookups are more rapid.


Raymond Hettinger