[Python-Dev] shrinking dicts

Tim Peters tim_one@email.msn.com
Wed, 11 Aug 1999 01:30:20 -0400


[Vladimir]
> Currently, dictionaries always grow until they are deallocated from
> memory.

It's more accurate to say they never shrink <0.9 wink>.  Even that has
exceptions, though, starting with:

> This happens in PyDict_SetItem according to the following
> code (before inserting the new item into the dict):
>
>         /* if fill >= 2/3 size, double in size */
>         if (mp->ma_fill*3 >= mp->ma_size*2) {
>                 if (dictresize(mp, mp->ma_used*2) != 0) {
>                         if (mp->ma_fill+1 > mp->ma_size)
>                                 return -1;
>                 }
>         }

This code can shrink the dict too.  The load factor computation is based on
"fill", but the resize is based on "used".  If you grow a huge dict, then
delete all the entries one by one, "used" falls to 0 but "fill" stays at its
high-water mark.  At least 1/3rd of the entries are NULL, so "fill"
continues to climb as keys are added again:  when the load factor
computation triggers again, "used" may be as small as 1, and dictresize can
shrink the dict dramatically.

The only clear a priori return I see in your patch is that I might save
memory if I delete gobs of stuff from a dict and then neither get rid of it
nor add keys to it again.  But my programs generally grow dicts forever,
grow then delete them entirely, or cycle through fat and lean times (in
which case the code above already shrinks them from time to time).  So I
don't expect that your patch would be buy me anything I want, but would cost
me more on every delete.

> ...
> Any other ideas on how to deal with this? Thoughts, comments?

Just that slowing the expected case to prevent theoretical bad cases is
usually a net loss -- I think the onus is on you to demonstrate that this
change is an exception to that rule.  I do recall one real-life complaint
about it on c.l.py a couple years ago:  the poster had a huge dict,
eventually deleted most of the items, and then kept it around purely for
lookups.  They were happy enough to copy the dict into a fresh one a
key+value pair at a time; today they could just do

    d = d.copy()

or even

    d.update({})

to shrink the dict.

It would certainly be good to document these tricks!

if-it-wasn't-a-problem-the-first-8-years-of-python's-life-it's-hard-to-
    see-why-1999-is-special-ly y'rs  - tim