How to represent sets

Terry Reedy tjreedy at udel.edu
Sat Sep 14 11:50:21 EDT 2002


"Anton Vredegoor" <anton at vredegoor.doge.nl> wrote in message
news:alvd8p$1sq$1 at news.hccnet.nl...
> In the past there have been some discussions on various platforms
about
> whether sets should be implemented using long integers (possibly
leading to
> longs becoming iterable arrays of booleans), or by using
dictionaries.

Framing this an an either/or question is simply wrong.  The 'best'
implementation of the abstract concept 'set' depends on the
application, the number of possible members, the density of actual
sets, and the operations needed.  Sets with up to 32 possible members
are well implemented by C longs (Python ints).  Somewhat dense sets of
ints in a much larger range can be longer bit maps.  A set of integers
in the range +- 10**20 has to be done with dicts.  Ditto with sets of
strings up to length, say, 32.

Similar false questions for other abstract data types: what is the
best way to implement numbers, sequences, or matrixes.  For matrixes,
linear arrays (with address calculation), arrays of arrays, and dicts
mapping pair->value all have their appropriate uses.

>  The matter seems to be mostly settled and there is a sets.py script
in the CVS
> directory at sourceforge which uses dictionaries.

Guido has agreed to support an experimental general solution as part
of the standard library, perhaps to eventually become a builtin module
or type.  This in no way precludes other implementations.  That is
part of way user classes with magic methods is for.

Choosing to support a general purpose set-of-anything-hashable rather
than a more limited set-of-integers-in-a-reasonable-range is in line
with Guido's other design decisions.

> I would like to reopen this discussion.

I don't remember any pronouncement from Guido or anyone else closing
it, except perhaps with regard to what he will personally support.

> Starting this discussion again when the CVS script is already very
mature
> seems a bit inappropriate, but I believe there is a way to make sets
faster. I
> only recently realized this so I could not act earlier.

Special-purpose implementations often have some advantage of the most
general implementation.  If not, they get forgotten about.
>
> To give some material for the discussion I am providing a beta
version of a
> possible implementation af a set class using long integers to
represent sets.
> It is basically the same script I presented a year ago but I have
made several
> improvements to this script so that it is now a bit faster, and I
hope the
> docstring is a bit better now. Please keep in mind this is just a
test script,
> for a safer and more complete implementation the script in CVS is
preferable.
>
> http://home.hccnet.nl/a.vredegoor/universe/universe.py
>
> or the html version:
>
> http://home.hccnet.nl/a.vredegoor/universe/universe.html

To be most useful as an alternative implementation, your
implementation should be (could be? is?) a plug-in replacement for
sets.py so one could gain its advantages by replacing 'import sets'
with 'import universe as sets' (or 'import longsets as sets').

Having one 'standard' implementation now defines a standard interface
that gives you a (pretty-well) fixed target to aim at for
compatibility.  Before this was hashed out among the prime developers,
and some arbitrary decisions made, there were multiple possible
targets.

I took a preliminary look at universe.py.  Your test() function would
be more useful as a regression test if you automated it with the
doctest module.  To do so, cut the contents of the function (perhaps
after dedenting) and paste it into the interactive interpreter.  (On
Windows, turn off 'fast pasting' first!!, and maybe paste just a few
lines at a time.)  Then cut and paste the result and put it in a doc
string and otherwise add the doctest boiler plate given in
doctest.__doc__.

Terry J. Reedy






More information about the Python-list mailing list