[Python-ideas] universal set object for use in set manipulation
Steven D'Aprano
steve at pearwood.info
Thu Jul 23 20:00:11 CEST 2009
On Fri, 24 Jul 2009 02:22:47 am Georg Brandl wrote:
> > (1) What elements should go into the universal set? Strings, ints,
> > floats, presidents... ?
>
> Everything. Every possible (i.e. hashable) object is a (virtual)
> member of the universal set. You're thinking too mathematically
> here. Python has no notion of a category of elements for ordinary
> sets as well -- you can put a president in a set together with
> integers without problems.
No, I'm not thinking mathematically, I'm thinking concretely. A set that
contains "everything" is an abstract concept. For concrete problem
solving, the universe is (nearly always) smaller than "everything".
It's everything within the problem domain, not everything imaginable.
> > (2) What is len(set.universal())?
>
> Since it was apparently decided that len() can "lie", sys.maxsize
> would be a nice value. Otherwise, an exception.
Both are problematic.
Firstly, if it raises an exception, then it means all set handling code
needs to be written (e.g.):
try:
len(s)
except Exception:
# handle universal set
print "What do I do here?"
instead of the simple:
len(s)
The same for anything which iterates over a set, or calls pop() or
remove(). That's bad -- we're complicating all set-handling code, just
to make *one* special case a tiny bit not-really-simpler (actually more
complicated -- the solution with reduce is simpler than the suggested
idiom).
Secondly, if it returns sys.maxint, that's not as bad, but it's still
wrong -- sys.maxint is being used as a sentinel, a "magic value",
rather than actually being the length. If you count the elements that
are in the set, you get more than the length:
n = 0
U = set.universal()
for i in xrange(-sys.maxint, sys.maxint):
# this may take a while...
if i in U:
n += 1
assert n > len(U)
To me, having len() be accurate is *far* more important than being able
to supposedly simplify (complexify?) a one-liner using reduce into a
three-liner using set.universal(). Having a built-in type lie about its
length gives me the willies.
[...]
> Of course, it could not be an ordinary set object (i.e. one actually
> containing references to its members) -- that would require an
> infinite amount of memory. However, it *can* implement much of the
> set interface without a problem.
Fair enough, but it's the parts that it can't implement which are
critical. It's not a set, it's something which is nearly a set but is
supposed to be used in combination with real sets. Any time you write
code that expects sets, you have to allow for the risk that you might
be given a not-really-a-set universal set instead.
--
Steven D'Aprano
More information about the Python-ideas
mailing list