[Python-Dev] PEP 218 (sets); moving set.py to Lib

Raymond Hettinger python@rcn.com
Tue, 20 Aug 2002 19:23:30 -0400


[GvR]
> > > I am still perplexed that I received *no* feedback on the sets module
> > > except on this issue of sort order (which I consider solved by adding
> > > a method _repr() that takes an optional 'sorted' argument).

[RH]
> > P.S.  More comments are on the way as we play with, profile, review,
> > optimize, and document the module ;)

[GvR]
> Didn't you submit a SF patch/bug?  I think I replied to that.

Yes.  I've now revised the patch accordingly.

More thoughts:

1. Rename .remove() to __delitem__().  Its usage is inconsistent with list.remove(element), which can leave other
instances of element in the list.  It is more consistent with 'del adict[element]'.
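
A small illustration of the difference, using only the built-in list and dict types (not the sets module itself):

```python
# list.remove() deletes only the first matching item, so duplicates survive.
items = ['a', 'b', 'a']
items.remove('a')
assert items == ['b', 'a']

# 'del adict[key]' removes the key outright; nothing equal to it remains.
adict = {'a': 1, 'b': 2}
del adict['a']
assert 'a' not in adict
```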

2.  discard() looks like a useful standard API.  Perhaps it should be added to the dictionary API.
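
A rough sketch of what discard() semantics might look like for a dictionary, written here as a standalone helper (the function name and signature are hypothetical, not an existing dict method):

```python
def discard(mapping, key):
    """Remove key from mapping if it is present; otherwise do nothing."""
    try:
        del mapping[key]
    except KeyError:
        # Unlike 'del mapping[key]', a missing key is silently ignored.
        pass


d = {'a': 1, 'b': 2}
discard(d, 'a')        # present: removed
discard(d, 'missing')  # absent: no error
assert d == {'b': 2}
```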

3.  Should we add .as_temporarily_immutable to dictionaries and lists so that they will also be potential elements of a set?

4. remove(), update(), add(), and __contains__() all work hard to check for .as_temporarily_immutable().  Should this be
propagated to other methods that add set members (i.e., replace all instances of data[element] = value with
self.add(element) or use self.update() in the code for __init__())?

The answer is tough because it causes an enormous slowdown in the common use case of uniquifying a sequence.  OTOH, why
check in some places but not others -- why is .add(aSetInstance) okay but not Set([aSetInstance])?

If the answer is yes, then the code for update() should be super-optimized by moving the try/except outside the for-loop
and wrapping the whole thing in a while 1.  Also, we could bypass the slower .add() method when the incoming source of
elements is known to be an instance of BaseSet.
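
The try/except-outside-the-loop idea could be sketched roughly as follows.  This is a self-contained illustration, not the module's code: a plain dict stands in for the set's internal data, and the names update_data, MutableKey, and as_immutable are all hypothetical.  It also assumes the iterator itself never raises TypeError.

```python
class MutableKey:
    """Hypothetical unhashable element that can supply a hashable stand-in."""
    __hash__ = None  # make instances unhashable, like a mutable set

    def __init__(self, items):
        self.items = list(items)

    def as_immutable(self):  # hypothetical hook name
        return frozenset(self.items)


def update_data(data, iterable):
    """Add every element of iterable as a key of the dict 'data'."""
    it = iter(iterable)
    while True:
        try:
            # Fast path: no per-element try/except setup for hashable items.
            for element in it:
                data[element] = True
            return
        except TypeError:
            # Slow path: only reached for an unhashable element.
            transform = getattr(element, "as_immutable", None)
            if transform is None:
                raise  # genuinely unhashable; re-raise the TypeError
            data[transform()] = True
            # The while loop resumes the same iterator after this element.


data = {}
update_data(data, [1, MutableKey([2, 3]), 4])
assert data == {1: True, frozenset([2, 3]): True, 4: True}
```

The point of the shape is that the exception machinery is entered once per unhashable element rather than once per element, so uniquifying a sequence of ordinary hashable values pays almost nothing for the check.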

5. Add a quick pre-check to issubset() and issuperset() along the lines of:

        def issubset(self, other):
            """Report whether another set contains this set."""
            self._binary_sanity_check(other)
            if len(self) > len(other): return False   # Fast check for the obvious case
            for elt in self:
                if elt not in other:
                    return False
            return True
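
The corresponding pre-check for issuperset() would presumably mirror this one with the comparison reversed.  A sketch, written as a standalone function over built-in containers (an assumption about the shape, not code taken from the module):

```python
def issuperset(this, other):
    """Report whether 'this' contains every element of 'other'."""
    if len(this) < len(other):
        return False  # fast check: a smaller set cannot be a superset
    for element in other:
        if element not in this:
            return False
    return True


assert issuperset({1, 2, 3}, {1, 2})
assert not issuperset({1, 2}, {1, 2, 3})  # rejected by the length check alone
assert not issuperset({1, 2, 3}, {1, 4})  # rejected by the element loop
```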

6.  For clarity and foolish consistency, replace all occurrences of 'elt' with 'element'.


Raymond Hettinger