[Python-Dev] When do sets shrink?

Sun Jan 1 02:21:23 CET 2006

[Noam Raphael]
>>> For example, iteration over a set which once had
>>> 1,000,000 members and now has 2 can take 1,000,000 operations every
>>> time you traverse all the (2) elements.

[Raymond Hettinger]
>> Do you find that to be a common or plausible use case?

[Naom]
> I don't have a concrete example in this minute, but a set which is
> repeatedly filled with elements and then emptied by pop operations
> doesn't seem to me that far-fetched.

Ah, but that's an entirely different example than the one you started
with:  every detail counts when you're looking for "bad" cases.  In
this new example, the set _does_ get resized, as soon as you start
adding elements again.  OTOH, in the absence of repeated iteration
too, it's not clear that this resizing helps more than it hurts.

...

> I wanted to say that there isn't any new information, and yet I don't
> think that I have to assume that everything in current Python is the
> best that can be.

It was in 2005; 2006 is an entirely different year ;-)

> All I did was finding another reason why a downsizing scheme might
> be good, and posting it to ask if people have thought about it.

Not all that much -- sets whose sizes bounce around a lot, and which
are also iterated over a lot, haven't stuck out as an important use
case.  Typically, if a set or dict gets iterated over at all, that
happens once near the end of its life.

> If you have a document listing all the design decisions that went into
> dict implementation, then please send it to me and I won't ask about
> things that were already thought about.

Lots of info in the source; Josiah already pointed at the most useful dict docs.

> But the answer is, yes. I beleive that the current dict resizing
> scheme was born before the iterator protocol was introduced, and it
> may be a reason why the current scheme doesn't try to minimize the
> number of empty hash table entries.

Dict resizing was designed before the Python-level iteration protocol,
but under the covers dicts offered the PyDict_Next() C-level iteration
protocol "forever".  It's not the iteration protocol (or lack thereof)
that drives this.

Far more important is that dicts have always been heavily used by
Python itself, in its own implementation, for a variety of namespaces:
 the global dict, originally the local dict too, for class dicts and
instance dicts, and to pass keyword arguments.  Note that all those
cases use strings for keys, and in fact Python originally supported
only string-keyed dicts.  In all those cases too, deletions are at
worst very rare, and iteration a minor use case.