[Python-Dev] I think my set module is ready for prime time; comments?

Guido van Rossum guido@digicool.com
Tue, 23 Jan 2001 10:49:23 -0500


> I've just read the PEP.  Greg's proposal has a couple of problems.
> The biggest one is that the interface design isn't very Pythonic --
> it's formally adequate, but doesn't exploit the extent to which sets
> naturally have common semantics with existing Python sequence types.
> This is bad; it means that a lot of code that could otherwise ignore
> the difference between lists and sets would have to be specialized 
> one way or the other for no good reason.

Actually, I thought that Greg's proposal has some charm: it seems to
be using a natural extension of the existing dictionary syntax, where
a set is a dictionary without the values.  I haven't thought about
this deeply enough, but I see a lot of potential here.
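
To make the "dictionary without the values" idea concrete, here is a
rough sketch in present-day Python.  It is not taken from Greg's PEP;
the variable names are made up for illustration, and the stored value
is just a placeholder.

```python
# Sketch only: a set modeled as a dictionary whose values are ignored.
# Nothing here comes from the PEP; the names are illustrative.
seen = {}
for item in ["a", "b", "a", "c"]:
    seen[item] = None        # the value is a placeholder; only the key matters

members = sorted(seen)       # iterating a dict yields its keys
has_b = "b" in seen          # membership is a hash lookup on the keys
has_z = "z" in seen

print(members, has_b, has_z)
```

Duplicates vanish because a key can be stored only once.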

I understand that you have probably given this more thought than I
have recently, so I'd like to see your more detailed analysis of what
you do and don't like about Greg's proposal!

> The only other set module I can find in the Vaults or anywhere else is
> kjBuckets (which I knew about before).  Looks like a good design, but
> complicated -- and requires installation of an extension.
> 
> > If *your* set module is ready for prime time, why not publish it in
> > the Vaults of Parnassus?
> 
> I suppose that's what I'll do if you don't bless it for the standard
> library.  But here are the reasons I suggest you should do so:
> 
> 1. It supports a set of operations that are both often useful and
> fiddly to get right, thus enhancing the "batteries are included"
> effect.  (I used its ancestor for representing seen-message numbers in
> a specialized mailreader, for example.)

I haven't read your docs yet (and no time because Digital Creations is
requiring my attention all of today), but I expect that designing a
universal set type, one that is good enough to be used in all sorts of
applications, is very difficult.  

> 2. It's simple for application programmers to use.  No extension module
> to integrate.

This is a silly argument for wanting something to be added to the
core.  If it's part of the core, the need for an extension is
immaterial because that extension will always be available.  So
I conclude that your module is set up perfectly for a popular module
in the Vaults. :-)

> 3. It's unsurprising.  My set objects behave almost exactly like other
> mutable sequences, with all the same built-in methods working, except for 
> the fact that you can't introduce duplicates with the mutators.

Ah, so you see a set as an extension of a sequence.  That may be the
big rift between your version and Greg's PEP: are sets more like
sequences or more like dictionaries?
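
For comparison, the sequence view might look roughly like the sketch
below.  This is a guess at the general shape, not your actual module
(which I haven't seen); the class name ListSet and its methods are
hypothetical.

```python
class ListSet:
    """Hypothetical sequence-style set: a list whose mutators skip duplicates."""

    def __init__(self, items=()):
        self._items = []
        for x in items:
            self.append(x)

    def append(self, x):
        # Linear scan of the underlying list before adding.
        if x not in self._items:
            self._items.append(x)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        # Indexing works as on a list, so insertion order is visible.
        return self._items[i]

    def __contains__(self, x):
        return x in self._items


s = ListSet([3, 1, 3, 2, 1])
s.append(2)                  # already present, so nothing changes
print(list(s), len(s))
```

Here the object keeps order and indexing like a list, which the
dictionary view does not promise.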

> 4. It's already completely documented in a form suitable for the library.

Much appreciated.

> 5. It's simple enough not to cause you maintenance hassles down the
> road, and even if it did the maintainer is unlikely to disappear :-).

I'll be the judge of that, and since you prefer not to show your
source code (why is that?), I can't tell yet.

[...time flows...]

Having just skimmed your docs, I'm disappointed that you chose lists
as your fundamental representation type -- this makes it slow to test
for membership and hence makes intersection and union slow.  I suppose
that you have evidence from using this that those operations aren't
used much, or not for large sets?  This is one of the problems with
coming up with a set type for the core: it has to work for (nearly)
everybody.  It's no big deal if the Vaults contain three or more set
modules -- perfect, even, since people can choose the best one for their
purpose.  But in the core, there's only room for one set type or
module.
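
To illustrate the cost concern, here is a rough sketch in present-day
Python.  The function names are invented for this example and come from
neither module; the dictionary version assumes the elements are
hashable, and both assume the first argument has no duplicates.

```python
def intersect_with_lists(a, b):
    # Each "x in b" scans the list b, so the total work grows
    # roughly with len(a) * len(b).
    return [x for x in a if x in b]


def intersect_with_dict(a, b):
    # Build a hash table of b's elements once; each membership
    # test afterwards is a single lookup on average.
    lookup = dict.fromkeys(b)
    return [x for x in a if x in lookup]


a = list(range(0, 2000, 2))   # even numbers below 2000
b = list(range(0, 2000, 3))   # multiples of 3 below 2000

slow = intersect_with_lists(a, b)
fast = intersect_with_dict(a, b)
print(slow == fast, len(fast))
```

Both functions return the same elements; only the amount of work per
membership test differs, and the gap widens as the sets grow.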

--Guido van Rossum (home page: http://www.python.org/~guido/)