Dictionary in Python
tim_one at email.msn.com
Mon Mar 13 21:02:23 EST 2000
> I'm going to have a look at how python decides to resize
> the hash table.
A hash table is a contiguous vector of records, whose slots come in three
1. Slots with key+value pairs. Call these citizens.
2. Not-yet-used slots. Call these virgins.
3. Slots that were once citizens, but whose key got deleted, and where
another key+value pair hasn't yet overwritten the slot. Call these
turds (that's the technical term <0.9 wink>).
Python resizes the table when the number of virgins falls below a third of
the total number of slots. In the usual case, Python doubles the table size
(up to a current maximum of 1,073,741,824 slots). However, if many
deletions leave behind many turds, it's possible for the number of virgins
to get very low despite that few citizens remain; in that case, Python
actually shrinks the table (down to a current minimum of 4 slots).
To avoid thrashing when a mix of additions and deletions are made when the
table is near a resize threshold, Python doesn't actually check the # of
virgins after a delete (in effect, it assumes you'll soon be replacing the
turds with citizens again). So, curiously enough, deleting a key *never*
shrinks the table. A long sequence of deletes followed by an add may shrink
it, though. A way to force possible shrinkage without adding a key is:
dict = dict.copy()
dict.copy() always returns a turd-free dictionary, of the smallest
power-of-2 size that leaves at least a third of the slots virgin.
dryly y'rs - tim
More information about the Python-list