[Python-ideas] universal set object for use in set manipulation

Steven D'Aprano steve at pearwood.info
Thu Jul 23 20:00:11 CEST 2009


On Fri, 24 Jul 2009 02:22:47 am Georg Brandl wrote:

> > (1) What elements should go into the universal set? Strings, ints,
> > floats, presidents... ?
>
> Everything. Every possible (i.e. hashable) object is a (virtual)
> member of the universal set.  You're thinking too mathematically
> here.  Python has no notion of a category of elements for ordinary
> sets as well -- you can put a president in a set together with
> integers without problems.

No, I'm not thinking mathematically, I'm thinking concretely. A set that 
contains "everything" is an abstract concept. For concrete problem 
solving, the universe is (nearly always) smaller than "everything". 
It's everything within the problem domain, not everything imaginable.


> > (2) What is len(set.universal())?
>
> Since it was apparently decided that len() can "lie", sys.maxsize
> would be a nice value.  Otherwise, an exception.

Both are problematic.

Firstly, if it raises an exception, then it means all set handling code 
needs to be written (e.g.):

try:
    len(s)
except Exception:
    # handle universal set
    print "What do I do here?"


instead of the simple:

len(s)


The same for anything which iterates over a set, or calls pop() or 
remove(). That's bad -- we're complicating all set-handling code, just 
to make *one* special case a tiny bit not-really-simpler (actually more 
complicated -- the solution with reduce is simpler than the suggested 
idiom).


Secondly, if it returns sys.maxint, that's not as bad, but it's still 
wrong -- sys.maxint is being used as a sentinel, a "magic value", 
rather than actually being the length. If you count the elements that 
are in the set, you get more than the length:


n = 0
U = set.universal()
for i in xrange(-sys.maxint, sys.maxint):
    # this may take a while...
    if i in U:
        n += 1
assert n > len(U)


To me, having len() be accurate is *far* more important than being able 
to supposedly simplify (complexify?) a one-liner using reduce into a 
three-liner using set.universal(). Having a built-in type lie about its 
length gives me the willies.


[...]
> Of course, it could not be an ordinary set object (i.e. one actually
> containing references to its members) -- that would require an
> infinite amount of memory. However, it *can* implement much of the
> set interface without a problem.

Fair enough, but it's the parts that it can't implement which are 
critical. It's not a set, it's something which is nearly a set but is 
supposed to be used in combination with real sets. Any time you write 
code that expects sets, you have to allow for the risk that you might 
be given a not-really-a-set universal set instead.



-- 
Steven D'Aprano



More information about the Python-ideas mailing list