[Python-Dev] Re: PEP 218 (sets); moving set.py to Lib
Guido van Rossum
guido@python.org
Thu, 29 Aug 2002 11:26:51 -0400
(How about limiting our lines to 72 characters?)
> > > Eliminate the binary sanity checks which verify for operators
> > > that 'other' is a BaseSet. If 'other' isn't a BaseSet, try using
> > > it, directly or by coercing to a set, as an iterable:
> > >
> > > >>> Set('abracadabra') | 'alacazam'
> > > Set(['a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'])
> > >
> > > This improves usability because the second argument did not have
> > > to be pre-wrapped with Set. It improves speed, for some
> > > operations, by using the iterable directly and not having to
> > > build an equivalent dictionary.
> >
> > No. This has been proposed before. I think it's a bad idea, just as
> >
> > [1,2,3] + "abc"
> >
> > is a bad idea.
>
> I see the wisdom in preventing weirdness. The real motivation was
> to get sets.py to play nicely with other set implementations. Right
> now, it can only interact with instances of BaseClass. And, even if
> someone subclasses BaseClass, they currently *must* have a
> self._data attribute that is a dictionary. This prevents
> non-dictionary based extensions.
I've thought of (and I think I even posted) a different way to
accomplish the latter, *if* *and* *when* it becomes necessary. Here
it is again:
- BaseSet becomes a true abstract class. I don't care if it has dummy
methods that raise NotImplementedError, but the set of operations it
stands for should be documented. I propose that it should stand for
only the published operations of ImmutableSet. Other set
implementations can then derive from BaseSet.
- The implementation currently in BaseSet is moved to a new internal
class, e.g. _CoreSet, which derives from BaseSet.
- Set and ImmutableSet derive from _CoreSet.
- The binary operators (and sundry other places as needed) make *two*
checks for the 'other' argument:
- If it is a _CoreSet instance, do what's currently done, taking a
shortcut knowing the implementation.
- Otherwise, if it is a BaseSet instance, implement the operation
using only the published set API.
Example:
def __or__(self, other):
if isinstance(other, _CoreSet):
result = self.__class__(self._data)
result._data.update(other._data)
return result
if isinstance(other, BaseSet):
result = self.__class__(self._data)
result.update(other)
return NotImplemented
This effectively makes BaseSet a protocol. I realize that there is
some resistance to using inheritance from a designated abstract base
class as a way to indicate that a class implements a given protocol;
but since we don't have other solutions in place, I think this is a
reasonable solution. Trying to sniff whether the other argument
implements a set protocol by testing the presence of specific APIs
seems awkward, especially since most set APIs (__or__ etc.) are
heavily overloaded by types that aren't sets at all.
> > > Have ImmutableSet keep a reference to the original iterable.
> > > Add an ImmutableSet.refresh() method that rebuilds ._data from
> > > the iterable. Add a Set.refresh() method that triggers
> > > ImmutableSet.refresh() where possible. The goal is to improve
> > > the usability of sets of sets where the inner sets have been
> > > updated after the outer set was created.
> > >
> > > >>> inner = Set('abracadabra')
> > > >>> outer = Set([inner])
> > > >>> inner.add('z') # now the outer set is out-of-date
> > > >>> outer.refresh() # now it is current
> > > >>> outer
> > > Set([ImmutableSet(['a', 'c', 'r', 'z', 'b', 'd'])])
> > >
> > > This would only work for restartable iterables -- a file object would not be so easily refreshed.
> >
> > This *appears* to be messing with the immutability. If I wrote:
> >
> > a = range(3)
> > s1 = ImmutableSet(a)
> > s2 = Set([s1])
> > a.append(4)
> > s2.refresh()
> >
> > What would the value of s1 be?
>
> Hmm, I intended to have s1.refresh() return a new object for use in
> s2 while leaving s1 alone (being immutable and all). Now, I wonder
> if that was the right thing to do. The answer lies in use cases for
> algorithms that need sets of sets. If anyone knows off the top of
> their head that would be great; otherwise, I seem to remember that
> some of that business was found in compiler algorithms and graph
> packages.
Let's call YAGNI on this one.
--Guido van Rossum (home page: http://www.python.org/~guido/)