[Python-Dev] shrinking dicts
Vladimir Marangozov
Vladimir.Marangozov@inrialpes.fr
Tue, 10 Aug 1999 13:04:27 +0100 (NFT)
Currently, dictionaries always grow until they are deallocated from
memory. This happens in PyDict_SetItem according to the following
code (before inserting the new item into the dict):
/* if fill >= 2/3 size, double in size */
if (mp->ma_fill*3 >= mp->ma_size*2) {
if (dictresize(mp, mp->ma_used*2) != 0) {
if (mp->ma_fill+1 > mp->ma_size)
return -1;
}
}
The symmetric case is missing and this has intrigued me for a long time,
but I've never had the courage to look deeply into this portion of code
and try to propose a solution. Which is: reduce the size of the dict by
half when the nb of used items <= 1/6 the size.
This situation occurs far less frequently than dict growing, but anyways,
it seems useful for the degenerate cases where a dict has a peek usage,
then most of the items are deleted. This is usually the case for global
dicts holding dynamic object collections, etc.
A bonus effect of shrinking big dicts with deleted items is that
the lookup speed may be improved, because of the cleaned <dummy> entries
and the reduced overall size (resulting in a better hit ratio).
The (only) solution I could came with for this pb is the appended patch.
It is not immediately obvious, but in practice, it seems to work fine.
(inserting a print statement after the condition, showing the dict size
and current usage helps in monitoring what's going on).
Any other ideas on how to deal with this? Thoughts, comments?
--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
-------------------------------[ cut here ]---------------------------
*** dictobject.c-1.5.2 Fri Aug 6 18:51:02 1999
--- dictobject.c Tue Aug 10 12:21:15 1999
***************
*** 417,423 ****
ep->me_value = NULL;
mp->ma_used--;
Py_DECREF(old_value);
! Py_DECREF(old_key);
return 0;
}
--- 417,430 ----
ep->me_value = NULL;
mp->ma_used--;
Py_DECREF(old_value);
! Py_DECREF(old_key);
! /* For bigger dictionaries, if used <= 1/6 size, half the size */
! if (mp->ma_size > MINSIZE*4 && mp->ma_used*6 <= mp->ma_size) {
! if (dictresize(mp, mp->ma_used*2) != 0) {
! if (mp->ma_fill > mp->ma_size)
! return -1;
! }
! }
return 0;
}