py2.7: dictobject not properly resizing

I was taking a look at dictobject.c and realized that the logic controlling whether a resizedict will occur in dict_set_item_by_hash_or_entry disallows for the shrinking of a dictionary. This is contrary to what the comments directly above say: (http://hg.python.org/cpython/file/f3032825f637/Objects/dictobject.c#l771) 771 /* If we added a key, we can safely resize. Otherwise just return! 772 * If fill >= 2/3 size, adjust size. Normally, this doubles or 773 * quaduples the size, but it's also possible for the dict to shrink 774 * (if ma_fill is much larger than ma_used, meaning a lot of dict 775 * keys have been * deleted). The "bug" occures in the following conditional since we exit out of the function without checking the relative magnitudes of ma_filled to ma_used. Instead, we only check if we still have a correct loading factor (and the "don't resize on modification" bit). This can be fixed by changing the following conditional on line 785 to: if (mp->ma_used <= n_used || (mp->ma_fill*3 < (mp->ma_mask+1)*2 && mp->ma_used*5 > mp->ma_fill)) The factor of 5 was chosen arbitrarily... I'm sure with some benchmarking we could tune it to an optimal value for the internal use of dictionaries. However, before I put this effort in I was wondering if this is in fact desired behavior or if it is indeed a bug. At the very least, the comments should be updated to reflect the actual resizing dynamics of the dictionary. Micha ----------------------------- http://micha.gd/ http://github.com/mynameisfiber/

Hi Antoine, On Sat, Mar 30, 2013 at 10:37 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It doesn't disallow shrinking. If you take a dictionary of size 1000, remove of its elements, and continue to use it (write and delete more items) for long enough, then eventually it is shrinked. It just takes a while because it needs to fill 2/3 of the slots of the big table with "deleted" markers before it happens. Python 3.3.0b1 (default:07ddf5ecaafa, Aug 12 2012, 17:47:28) [GCC 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)] on linux Type "help", "copyright", "credits" or "license" for more information.
A bientôt, Armin.

Hi Antoine, On Sat, Mar 30, 2013 at 10:37 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It doesn't disallow shrinking. If you take a dictionary of size 1000, remove of its elements, and continue to use it (write and delete more items) for long enough, then eventually it is shrinked. It just takes a while because it needs to fill 2/3 of the slots of the big table with "deleted" markers before it happens. Python 3.3.0b1 (default:07ddf5ecaafa, Aug 12 2012, 17:47:28) [GCC 3.4.6 (Gentoo 3.4.6-r2, ssp-3.4.6-1.0, pie-8.7.10)] on linux Type "help", "copyright", "credits" or "license" for more information.
A bientôt, Armin.
participants (3)
-
Antoine Pitrou
-
Armin Rigo
-
Micha Gorelick