[Python-Dev] I think my set module is ready for prime time; comments?

Thu, 25 Jan 2001 02:57:18 -0500

[Eric S. Raymond]
> ...
> What you get by going with a dictionary representation is that
> membership test becomes close to constant-time, while insertion and
> deletion become sometimes cheap and sometimes quite expensive
> (depending of course on whether you have to allocate a new
> hash bucket).

Note that Python's dicts aren't vulnerable to that:  they use open
addressing in a contiguous, preallocated vector.  There are no mallocs() or
free()s going on for lookups, deletes, or inserts, unless an insert happens
to hit a "time to double the size of the vector" boundary.  Deletes never
cost more than a lookup; inserts never more unless the table-size boundary
is hit (one in 2**N unique inserts, at which point N goes up too).

> ...
> "works for everbody" isn't really possible here.  So my solution
> does the next best thing -- pick a choice of tradeoffs that isn't
> obviously worse than the alternatives and keeps things bog-simple.

I agree that this shouldn't be an either/or choice, but if it's going to be
forced into that mold I have to protest that the performance of unordered
lists would kill most of the set applications I've ever had.  I typically
have a small number of very large sets (and I'm talking not 100s, but often
100s of 1000s of elements).  The relatively large memory burden of a dict
representation wouldn't bother me unless I instead had 100s of 1000s of very
small sets.

which-we-may-happen-in-my-next-life-but-not-in-this-one-ly y'rs  - tim