[Python-Dev] Questions about sets.py

Fri, 23 Aug 2002 16:54:46 -0400

1. BaseSet contains 3 blobs like this:

    def __or__(self, other):
        """Return the union of two sets as a new set.

        (I.e. all elements that are in either set.)
        """
        if not isinstance(other, BaseSet):
            return NotImplemented
        result = self.__class__(self._data)
        result._data.update(other._data)
        return result

    def union(self, other):
        """Return the union of two sets as a new set.

        (I.e. all elements that are in either set.)
        """
        return self | other

Is there a good reason not to write the latter as just

    union = __or__

?

2. Is there a particular reason for not coding issuperset as

    def issuperset(self, other):
        """Report whether this set contains another set."""
        self._binary_sanity_check(other)
        return other.issubset(self)

?  Given that issubset exists, issuperset is of marginal value anyway.

3. BaseSet._update is a darned cute example of exploiting that the iterator
returned by iter() isn't restartable!.  That isn't a question, it's just a
giggle <wink>.

4. Who believes that the __le__, __lt__, __ge__, and __gt__ methods are a
good idea?  If anything, I'd expect s <= t to mean "is subset", and "s < t"
to mean "is proper subset".  Getting the lexicographic ordering of the
underlying dicts instead doesn't seem to be of any use, except perhaps to
prevent sorting lists of sets from blowing up.  Fine by me if that blows up,
though.

5. It's curious enough that we avoid dict.copy() in

    def copy(self):
        """Return a shallow copy of a set."""
        result = self.__class__([])
        result._data.update(self._data)
        return result

that if there's a reason to avoid it a comment would help.

6. It seems that doing

    self.__class__([])

in various places instead of

    self.__class__()

wastes time without reason (it builds a unique empty list each time, the
__init__ function then does a useless iteration dance over that, and finally
the list object is torn apart again).  If the intent is to communicate that
we're creating an empty set, IMO the latter spelling is even a bit clearer
about that (I see "[]" and keep wondering what it's trying to accomplish).

7. union_update, intersection_update, symmetric_difference_update, and
difference_update return self despite mutating in-place.  That makes them
unique among mutating container methods (e.g., list.append, list.insert,
list.remove, dict.update, list.sort, ..., return None).  Is the
inconsistency worth it?  Chaining mutating set operations isn't common, and
with names like "symmetric_difference_update()" it's a challenge to fit more
than one on a line anyway <wink>.

If it's thought that chaining mutating operations is somehow a good idea for
sets when it's not for other containers, then we really have to be
consistent about it in the sets module.  For example, then Set.add() should
return self too; indeed, set.add(elt1).add(elt2) may even be pleasant at
times.

Or if the point was merely to create "nice names" for __ior__ etc, then,
e.g., the existing union_update should be renamed to __ior__, and
union_update defined as

    def union_update(self, other):
        """yadda"""
        self |= other

and let it return None.  In a sense this is the opposite of question #1,
where the extra code block *is* supplied but without an apparent need.

8. If there's something still valuable in _test(), I think it ought to be
moved into test_sets.py.  "Self-testing modules" can be convenient when
developing, but after modules are deployed in the std library the embedded
tests are never run again (with the exception of module doctests, which can
easily be run via a regrtest-flavor test_xyz test, and which are so
invoked).