[Python-Dev] I think my set module is ready for prime time; comments?
Thu, 25 Jan 2001 02:57:18 -0500
[Eric S. Raymond]
> What you get by going with a dictionary representation is that
> membership test becomes close to constant-time, while insertion and
> deletion become sometimes cheap and sometimes quite expensive
> (depending of course on whether you have to allocate a new
> hash bucket).
Note that Python's dicts aren't vulnerable to that: they use open
addressing in a contiguous, preallocated vector. There are no mallocs() or
free()s going on for lookups, deletes, or inserts, unless an insert happens
to hit a "time to double the size of the vector" boundary. Deletes never
cost more than a lookup; inserts never more unless the table-size boundary
is hit (one in 2**N unique inserts, at which point N goes up too).
> "works for everbody" isn't really possible here. So my solution
> does the next best thing -- pick a choice of tradeoffs that isn't
> obviously worse than the alternatives and keeps things bog-simple.
I agree that this shouldn't be an either/or choice, but if it's going to be
forced into that mold I have to protest that the performance of unordered
lists would kill most of the set applications I've ever had. I typically
have a small number of very large sets (and I'm talking not 100s, but often
100s of 1000s of elements). The relatively large memory burden of a dict
representation wouldn't bother me unless I instead had 100s of 1000s of very
which-we-may-happen-in-my-next-life-but-not-in-this-one-ly y'rs - tim