[Python-Dev] I think my set module is ready for prime time; comments?

Greg Wilson gvwilson@nevex.com
Tue, 23 Jan 2001 15:26:22 -0500


Greg Wilson:
Meta-question: do people want to continue to discuss sets on the
general python-dev list, or take it out-of-line (e.g. to an egroups
list)?  I'm finding all of the discussion very useful, but I realize
that many readers might prefer to concentrate on the 2.1 release...

> Jeremy Hylton <jeremy@alum.mit.edu>:
> > The tests showed that dictionary-based sets were always faster.
> > small tests (3 operations), the difference was about 10 percent.
> > larger tests (88 operations), the difference ranged from 
> > 180 to almost 700 percent.

> Eric Raymond <esr@thyrsus.com>:
> Not surprising.  88 elements is getting pretty large.

Greg Wilson:
Really?  I was testing my implementation with sets of email addresses
grep'd out of old mail folders --- typical sizes were several thousand
elements.

> From: Christopher Petrilli <petrilli@amber.org>
> Unfortunately, for me, a Python implementation of Sets is only
> interesting academicaly.  Any time I've needed to work with them at a
> large scale, I've needed them *much* faster than Python could achieve
> without a C extension.

Greg Wilson:
I had been expecting to implement this in C, not in pure Python, for
performance.

> From: Christopher Petrilli <petrilli@amber.org>
> In the "scripting" problem domain, I would agree that Sets would
> rarely reach large sizes,
> and so a algorithm which performed in quadratic time might be fine,

Greg Wilson:
I strongly disagree (see the email address example above --- it was
the first thing that occurred to me to try).  I am still hoping to
find a sub-quadratic (preferably sub-linear) implementation.  I can
do it in C++ with observer/observable (contained items notify containers
of changes in value, sets store all equivalent items in the same bucket),
but that doesn't really help...

> From: Ka-Ping Yee <ping@lfw.org>
> The only change that needs to be made to support sets of immutable
> elements is to provide "in" on dictionaries...

and:

> From: Neil Schemenauer <nas@arctrix.com>
> ...if sets are added to the core...we would
> use the implementation of PyDict but drop the values.

Unfortunately, if values are required to be immutable, then sets of
sets aren't possible... :-(

Thanks, everyone,
Greg