[Python-Dev] Re: PEP 218 (sets); moving set.py to Lib

Guido van Rossum guido@python.org
Thu, 29 Aug 2002 11:26:51 -0400


(How about limiting our lines to 72 characters?)

> > > Eliminate the binary sanity checks which verify for operators
> > > that 'other' is a BaseSet. If 'other' isn't a BaseSet, try using
> > > it, directly or by coercing to a set, as an iterable:
> > >
> > > >>> Set('abracadabra') | 'alacazam'
> > > Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
> > >
> > > This improves usability because the second argument did not have
> > > to be pre-wrapped with Set.  It improves speed, for some
> > > operations, by using the iterable directly and not having to
> > > build an equivalent dictionary.
> >
> > No.  This has been proposed before.  I think it's a bad idea, just as
> >
> >    [1,2,3] + "abc"
> >
> > is a bad idea.
> 
> I see the wisdom in preventing weirdness.  The real motivation was
> to get sets.py to play nicely with other set implementations.  Right
> now, it can only interact with instances of BaseClass.  And, even if
> someone subclasses BaseClass, they currently *must* have a
> self._data attribute that is a dictionary.  This prevents
> non-dictionary based extensions.

I've thought of (and I think I even posted) a different way to
accomplish the latter, *if* *and* *when* it becomes necessary.  Here
it is again:

- BaseSet becomes a true abstract class.  I don't care if it has dummy
  methods that raise NotImplementedError, but the set of operations it
  stands for should be documented.  I propose that it should stand for
  only the published operations of ImmutableSet.  Other set
  implementations can then derive from BaseSet.

- The implementation currently in BaseSet is moved to a new internal
  class, e.g. _CoreSet, which derives from BaseSet.

- Set and ImmutableSet derive from _CoreSet.

- The binary operators (and sundry other places as needed) make *two*
  checks for the 'other' argument:

  - If it is a _CoreSet instance, do what's currently done, taking a
    shortcut knowing the implementation.

  - Otherwise, if it is a BaseSet instance, implement the operation
    using only the published set API.

Example:

  def __or__(self, other):
      if isinstance(other, _CoreSet):
          result = self.__class__(self._data)
          result._data.update(other._data)
          return result
      if isinstance(other, BaseSet):
          result = self.__class__(self._data)
          result.update(other)
      return NotImplemented

This effectively makes BaseSet a protocol.  I realize that there is
some resistance to using inheritance from a designated abstract base
class as a way to indicate that a class implements a given protocol;
but since we don't have other solutions in place, I think this is a
reasonable solution.  Trying to sniff whether the other argument
implements a set protocol by testing the presence of specific APIs
seems awkward, especially since most set APIs (__or__ etc.) are
heavily overloaded by types that aren't sets at all.

> > > Have ImmutableSet keep a reference to the original iterable.
> > > Add an ImmutableSet.refresh() method that rebuilds ._data from
> > > the iterable.  Add a Set.refresh() method that triggers
> > > ImmutableSet.refresh() where possible.  The goal is to improve
> > > the usability of sets of sets where the inner sets have been
> > > updated after the outer set was created.
> > >
> > > >>> inner = Set('abracadabra')
> > > >>> outer = Set([inner])
> > > >>> inner.add('z')                 # now the outer set is out-of-date
> > > >>> outer.refresh()               # now it is current
> > > >>> outer
> > > Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])])
> > >
> > > This would only work for restartable iterables -- a file object would not be so easily refreshed.
> >
> > This *appears* to be messing with the immutability.  If I wrote:
> >
> >   a = range(3)
> >   s1 = ImmutableSet(a)
> >   s2 = Set([s1])
> >   a.append(4)
> >   s2.refresh()
> >
> > What would the value of s1 be?
> 
> Hmm, I intended to have s1.refresh() return a new object for use in
> s2 while leaving s1 alone (being immutable and all).  Now, I wonder
> if that was the right thing to do.  The answer lies in use cases for
> algorithms that need sets of sets.  If anyone knows off the top of
> their head that would be great; otherwise, I seem to remember that
> some of that business was found in compiler algorithms and graph
> packages.

Let's call YAGNI on this one.

--Guido van Rossum (home page: http://www.python.org/~guido/)