[Python-Dev] Using lists as sets

Ka-Ping Yee ping@lfw.org
Fri, 17 Mar 2000 10:08:37 -0600 (CST)


A different way to provide sets in Python, which occurred to
me on Wednesday at Guido's talk in Mountain View (hi Guido!),
is to just make lists work better.

Someone asked Guido a question about the ugliness of using
dicts in a certain way, and it was clear that what he wanted
was a real set.  Guido's objection to introducing more core
data types is that it makes it more difficult to choose which
data type to use, and opens the possibility of using entirely
the wrong one -- a very well-taken point, i thought.

(That recently-mentioned study of scripting vs. system language
performance seems relevant here: a few of the C programs
submitted were much *slower* than the ones in Python or Perl
just because people had to choose and implement their own data
structures, and so they were able to completely shoot themselves
in both feet and lose a leg or two in the process.)

So...

Hypothesis: The only real reason people might want a separate
set type, or have to use dicts as sets, is that linear search
on a list is too slow.

Therefore: All we have to do is speed up "in" on lists, and now
we have a set type that is nice to read and write, and already
has nice spellings for set semantics like "in".

Implementation possibilities:

    + Whip up a hash table behind the scenes if "in" gets used
      a lot on a particular list and all its members are hashable.
      This makes "in" no longer O(n), which is most of the battle.
      remove() can also be cheap -- though you have to do a little
      more bookkeeping to take care of multiple copies of elements.

    + Or, add a couple of methods, e.g. take() appends an item to
      a list if it's not there already, drop() removes all copies
      of an item from a list.  These tip us off: the first time one
      of these methods gets used, we make the hash table then.

I think the semantics would be pretty understandable and simple to
explain, which is the main thing.

Any thoughts?


-- ?!ng