[Python-Dev] When do sets shrink?

Guido van Rossum guido at python.org
Thu Dec 29 16:59:48 CET 2005


On 12/29/05, Noam Raphael <noamraph at gmail.com> wrote:
> On 12/29/05, Donovan Baarda <abo at minkirri.apana.org.au> wrote:
> > Without some sort of fancy overkill size hinting or history tracking,
> > that's probably as good a heuristic as you can get.
>
> I'm sorry, but it's not correct. There's a simple resize scheduling
> algorithm that is proven to take, when you sum things up, O(1) per
> each simple operation, and that keeps the amount of used memory always
> proportional to the number of elements in the set.
>
> I'm not saying that practically it must be used - I'm just saying that
> it can't be called a heuristic, and that it doesn't involve any "fancy
> overkill size hinting or history tracking". It actually means
> something like this:
> 1. If you want to insert and the table is full, resize the table to
> twice the current size.
> 2. If you delete and the number of elements turns out to be less than
> a quarter of the size of the table, resize the table to half of the
> current size.

I'm neutral on doing this.

I'd like to point out that if we decide not to do this, there's an
easy way to shrink dicts and sets under user control, which means that
you only have to pay attention in the rare cases where it matters:
create a new dct or set from the old one automatically creates one of
the right size. E.g.

s = set(s)

replaces the set s (which may once have had 1M items and now has
nearly 1M empty slots) by one with the "optimal" number of slots.
Ditto for dicts:

d = dict(d)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-Dev mailing list