[Python-Dev] PEP 218 (sets); moving set.py to Lib

Guido van Rossum guido@python.org
Tue, 20 Aug 2002 22:05:03 -0400


> 1. Rename .remove() to __del__().  Its usage is inconsistent with
> list.remove(element) which can leave other instances of element in
> the list.  It is more consistent with 'del adict[element]'.

(You mean __delete__.)

-1.  Sets don't support the x[y] notation for accessing or setting
elements, so it would be weird to use that for deleting.  You're not
deleting the value corresponding to the key or index y (like you are
when using del x[y] on a list or dict), you're deleting y itself.
That's more like x.remove(y) for lists.

> 2.  discard() looks like a useful standard API.  Perhaps it shoulds
> be added to the dictionary API. 

Perhaps.  But the dict API already has so many things.  And why not to
the list API?  I'm -0 on this.

> 3.  Should we add .as_temporarily_immutable  to dictionaries and
> lists so that they will also be potential elements of a set?

Hm, I think this is premature.  I'd like to see a use case for a set
of dicts and a set of lists first.

Then you can code list and dict subclasses in Python that implement
_as_temporarily_immutable (and _as_immutable too, I suppose).

Then we'll see how often this ends up being used.

For now, I'd say YAGNI.  (I've said YAGNI to sets for years, but the
realization that as a result lots of people independently invented
using the keys of a dict to represent a set made me change my mind.
Sort of like how I changed my mind on a Boolean type, too, after 12
years of thinking it wasn't needed. :-)

> 4. remove(), update(), add(), and __contains__() all work hard to
> check for .as_temporarily_immutable().  Should this propagated to
> other methods that add set members(i.e. replace all instances of
> data[element] = value with self.add(element) or use self.update() in
> the code for __init__())?

I've been thinking the same thing.  I think that the only case where
this could apply is __init__(), by the way.

> The answer is tough because it causes an enormous slowdown in the
> common use cases of uniquifying a sequence.  OTOH, why check in some
> places but not others -- why is .add(aSetInstance) okay but not
> Set([aSetInstance]).

Really?  Why the slowdown?

I was thinking of simply changing __init__ into

    if seq is not None:
        self.update(seq)

If that's too slow, perhaps update() could be changed to the
following:

    it = iter(seq)
    try:
        for elt in it:
            data[elt] = value
    except TypeError:
        transform = getattr(elt, '_as_immutable', None)
        if transform is None:
            raise
        data[transform()] = value
        self.update(it)

That is, if there are no elements that require transformation, the
added cost is a single try/except setup (plus an extra call to
iter()).  If any element requires transformation, the rest of the
elements are dealt with as fast as update() can.

Hm, maybe this could be applied to update() too (except it shouldn't
call itself recursively but simply write the loop out a second time,
with a try/except around each element).

> If the answer is yes, then the code for update() should be
> super-optimized by taking moving the try/except outside the for-loop
> and wrapping the whole thing in a while 1.

That's a similar idea as I just sketched.  Can you email me a proposed
patch?  (Let's skip SF for this.)

> Also, we could bypass the slower .add() method when incoming source
> of elements is known to be an instance of BaseSet.

Huh?  Nobody calls add() internally AFAIK.

> 5. Add a quick pre-check to issubset() and issuperset() along the
> lines of:
> 
>         def issubset(self, other):
>             """Report whether another set contains this set."""
>             self._binary_sanity_check(other)
>             if len(self) > len(other): return False   # Fast check for the obvious case
>             for elt in self:
>                 if elt not in other:
>                     return False
>             return True

Sure.  Check it in.

> 6.  For clarity and foolish consistency, replace all occurrences of
> 'elt' with 'element'.

Hm, no.  'element' for a loop control variable seems too long (I'd be
happy with 'x' but Greg Wilson used 'element').  However I like
'element' as the argument name because it can be used as a keyword
argument and then it's better spelled out in full.  I think Greg
Wilson used 'item' most of the time; I prefer to be consistent and say
'element' all the time since that's the accepted set terminology.

--Guido van Rossum (home page: http://www.python.org/~guido/)