universal set object for use in set manipulation
data:image/s3,"s3://crabby-images/3fefd/3fefd77b4a80756a9efb7d30b8bfa7dc5634bfc6" alt=""
Hi all, I think the addition of a universal set object would be a nice touch for python sets. Manipulation of collections of sets becomes much easier with it. Right now, certain operations are easy because empty sets are easy to create while the dual operation requires special cases: set_union = set() for s in sets: set_union |= s # this doesn't work set_intersection = set(???) for s in sets: set_intersection &= s Instead you have to do something contorted like: if len(sets) == 0: set_intersection = set() else: sets_i = iter(sets) set_intersection = sets_i.next() for s in sets: set_intersection &= s Implementation of a universal set would be pretty trivial. Trails of EasyExtend [1] has an implementation (albeit used a bit differently) that's basically the entirety of what's needed. The above intersection case would end up looking something like: set_intersection = set.universal() for s in sets: set_intersection &= s Thoughts? I'm happy to write the patch myself if people like this idea. Thanks, Andy Kish. [1] http://fiber-space.de/wordpress/?p=322
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Andy Kish schrieb:
Note that a simple ``reduce(operator.and_, sets)`` would suffice for > 1 set. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/f6151/f615148f1b7d543849b628fd45c50228f2b8cf36" alt=""
Andy Kish wrote: [...]
Or even: set_intersection = reduce(operator.and_, sets, set.universal()) Although, you can already pass multiple (or zero) sets to set.intersection(). So your special case version can be a little simpler... sets = list(sets) if len(sets) == 0: return set() return sets[0].intersection(sets[1:]) Which isn't as elegant, but it's also not so bad. -Andrew.
data:image/s3,"s3://crabby-images/3fefd/3fefd77b4a80756a9efb7d30b8bfa7dc5634bfc6" alt=""
If we want to go golfing :), my favorite solution is with the python 2.6 versions of intersection and union: set_union = set().union(*sets) set_intersection = set.universal().intersection(*sets) Folds are nice. That's actually why I sent my initial email to the list. It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection. Andy. On Jul 23, 3:12 am, Andrew Bennetts <and...@bemusement.org> wrote:
data:image/s3,"s3://crabby-images/f6151/f615148f1b7d543849b628fd45c50228f2b8cf36" alt=""
Andy Kish wrote:
Well, not that my opinion counts for much, I'm +0 on it. It's conceptually nice as you say, but I doubt I (or many others) would use it much, and hanging it off the set builtin doesn't feel totally satisfactory to me (but I don't have a better suggestion). I hope you find some more firmly opinionated responses to your idea. :) -Andrew.
data:image/s3,"s3://crabby-images/e7a68/e7a68048189999ad617aee9737f4da6d6422d9e9" alt=""
On Thu, 23 Jul 2009 18:29:14 -0400 George Sakkis <george.sakkis@gmail.com> wrote:
Overall it seems like a non-solution to a non-problem.
It's definitely a real solution. The problem may not be very important, though.
Why don't we add a dict.universal(), list.universal(), etc. while we're at it.
Because a set of "all the things in the universe" makes sense as an entity, whereas dict's or lists of "all the things in the universe" don't. For lists, this implies some sort of ordering on everything in the universe, which we've already abandoned as a bad idea. Dicts, on the other hand, aren't simple collections, but maps, so you don't have _a_ dict of everything in the universe, you have len(universe)**2 of them. The def statement gives us the ability to express such maps. Which may be how set.universal() stacks up: We can already create a class that has the appropriate behaviors. It's not clear that that is sufficiently useful to justify adding it to the language. One thing that bugs me is the universal set is a *constant. It's value doesn't change, so it shouldn't be a function or method, but an attribute. On the other hand, adding negative sets to the language provides an easy way to express that constant, and certainly have more uses in general. If we had to add one of the two, it'd be these. But I'm not convinced that these have enough uses to justify adding them, either. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
data:image/s3,"s3://crabby-images/4f305/4f30562f209d0539c156fdbf2946fd262f812439" alt=""
Mike Meyer <mwm-keyword-python.b4bdba@mired.org>:
Yeah, I feel, that if a group of people need them, they should have a separate type ("negative" or "infitity-minus..." or "universe without..."-) sets, in a separate module (not necessarily in std library). It would be a type which can interact in some ways with sets (like e.g. dict views do...). As it was noted -- some sets' features don't make sense here... (len, iter). But many do: a in universe_minus() -> True a in universe_minus([a, b, c]) -> False set([a, b]) < universe_minus() -> True set([a, b]) < universe_minus([b, c]) -> False universe_minus().remove(a) -> universe_minus([a]) universe_minus([a, b, c]).remove(a) -> KeyError universe_minus([a]).add(a) -> universe_minus([]) universe_minus().add(a) -> universe_minus([]) universe_minus() - set([a, b, c]) -> universe_minus([a, b, c]) universe_minus() & set([a, b, c]) -> set([a, b, c]) universe_minus() | set([a, b, c]) -> universe_minus() universe_minus() - universe_minus([a, b]) -> set([a, b]) universe_minus() & universe_minus([a, b]) -> universe_minus([a, b]) universe_minus() | universe_minus([a, b]) -> universe_minus() universe_minus([a, b, c]) - set([a, b]) -> universe_minus([a, b, c]) universe_minus([a, b, c]) | set([a, b]) -> universe_minus([c]) universe_minus([a, b]) ^ set([a, b, c]) -> universe_minus([c]) etc.
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Greg Ewing schrieb:
Since it has to yield all possible Python objects, it could start with the integers. It's not likely to run out of them any time soon <duck> Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Greg Ewing wrote:
Raise an exception, just like a signalling NaN in decimal. An expanded set concept with support for complementary set definitions is definitely something that should cook on PyPI for a while though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
data:image/s3,"s3://crabby-images/04a15/04a15ea38069a933a2f206750ea5a926f92889bc" alt=""
Going back to the original problem... On Thu, Jul 23, 2009 at 09:12, Andrew Bennetts<andrew@bemusement.org> wrote:
But set.intersection doesn't need to be called using dot notation, the class attribute call works just as well: try: return set.intersection(*sets) except TypeError: return set() I'd say this is as elegant as the OPs solution with the universal set. Universal sets and other such cofinite sets could be nice, but I don't think they should be in the python standard library.
data:image/s3,"s3://crabby-images/4516d/4516d91f1b7d793789daa021ac7fdd51ed4504c4" alt=""
On Thu, Jul 23, 2009 at 7:28 AM, Andy Kish<agkish@gmail.com> wrote:
What would set.universal() - {1, 2, 3} return? Would this be a ValueError, or would you want to also implement all cofinite sets? How would 'for x in set.universal()' behave? What's len(set.universal())? It seems to me that set.universal() can't be a full-fledged Python set, so it would have to be something else. And then x = set.universal().intersection(*sets) will sometimes be returning a true set (i.e., when sets is nonempty), and sometimes the reduced-functionality set.universal(). In many cases you're still going to need to distinguish before doing anything with x. Not-all-binary-operations-have-to-have-an-identity-ly yours, -- Mark
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Thu, 23 Jul 2009 04:28:33 pm Andy Kish wrote:
But you can't have a universal set until you know what the problem domain is. If the problem domain is citrus fruits, then the universal set is: {lemons, limes, grapefruits, oranges, mandarins, tangelos, kumquats, ... } If the problem domain is Presidents of the USA, then the universal set is: {Washington, Lincoln, Bush, Clinton, Obama, Wilson, ... } If the problem domain is integers, then it is {0, 1, -1, 2, -2, 3, -3, ...} The first two are finite, the third is infinite.
Better written as:
reduce(operator.or_, [set([1, 2, 3]), set([2, 4, 5])]) set([1, 2, 3, 4, 5])
But this does:
reduce(operator.and_, [set([1, 2, 3]), set([2, 4, 5])]) set([2])
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems. (1) What elements should go into the universal set? Strings, ints, floats, presidents... ? (2) What is len(set.universal())? (3) What does set.universal() print? (4) What does set.universal().clear() do? (5) For that matter, union(), copy(), difference(), issubset(), etc? (The universal set for strings of length one is a subset of the universal set for all strings.) I don't think there is a suitable solution for all of these issues. That would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface. -- Steven D'Aprano
data:image/s3,"s3://crabby-images/8b47a/8b47a47bc67b782ee6d276dd62adfbc257056d29" alt=""
Terry Jones briefly mentioned support for very large sets via complements, which I think would solve many of the issues you bring up. On Thu, Jul 23, 2009 at 08:26, Steven D'Aprano <steve@pearwood.info> wrote:
The domain is inherently all objects that can be put into a Python set, so the universal set is all Python objects. (I think for the purposes of Python we could gloss over the set-theoretical paradoxiness of "set.universal() in set.universal()" returning True :)
There are two ways of defining what is "in" a set. One is what iter(set.universal()) iterates through, which there is no good answer for. But the other (arguably just as common) way to define what is "in" a set s is "the set of Python objects p such that "p in s" returns True." This latter definition is trivial to implement for the universal set. If we keep track of the complement of a nearly-universal set, that also gives us an easy way to implement in. (2) What is len(set.universal())?
inf. Perhaps there is a complication since inf is technically a float type? I don't see it being a huge deal. ("Real" universal sets have uncountably many elements, whereas AFAICT floating-point inf is countable; since we have finite-memory systems I can't imagine it makes a difference.) (3) What does set.universal() print?
Some special-case string; maybe "set(U)" for the universal set, or "set(U-[elements,in,complement])" for general nearly-infinite sets. (4) What does set.universal().clear() do?
set.universal() doesn't have to be a singleton; it could return a brand new, mutable universal set. So set.universal() returns a new set, and .clear() clears a set regardless of what it contains. (5) For that matter, union(), copy(), difference(), issubset(), etc?
I think these all fall pretty easily out of the definition of a universal set and a complement representation of nearly infinite sets. (The universal set for strings of length one is a subset of the
universal set for all strings.)
The universal set is a superset of all of these: it is the set of all Python objects. (Yes, that's paradoxical, but again, I don't think that matters for most use cases in Python.) I don't think there is a suitable solution for all of these issues. That
Storing the complement of an infinite set is pretty straightforward way of implementing sets that are universal except for a few (maybe zero) elements, and can satisfy the entire set interface except for iteration. As you suggest, it would be far trickier to provide general infinite sets, like the set of all integers, but that's not what's being asked for here. I'm not saying it won't be tricky to implement; it requires changing pretty much all set methods, I've never touched CPython code. I'm also roughly +0 on the issue (not that my vote counts since I lurk far more than I contribute). I'm just pointing out it's possible.
-- It is better to be quotable than to be honest. -Tom Stoppard Borowitz
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Steven D'Aprano schrieb:
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems.
First, I don't think this is important enough to become a standard Python type. But regardless, these problems are quite easy to answer.
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
Everything. Every possible (i.e. hashable) object is a (virtual) member of the universal set. You're thinking too mathematically here. Python has no notion of a category of elements for ordinary sets as well -- you can put a president in a set together with integers without problems.
(2) What is len(set.universal())?
Since it was apparently decided that len() can "lie", sys.maxsize would be a nice value. Otherwise, an exception.
(3) What does set.universal() print?
'set.universal()'.
(4) What does set.universal().clear() do?
It clears the set. Afterwards, the set is empty.
(5) For that matter, union(), copy(), difference(), issubset(), etc?
union() is a no-op, as well as any operation that adds elements. intersection() just returns the "other" set. copy() just returns another instance of the universal set. issubset() - the universal set is the subset of no other set except the universal set. To make difference() or remove() and pop() possible, one would have to expand the notion to a "universal-except" set which again has a finite number of exceptions.
(The universal set for strings of length one is a subset of the universal set for all strings.)
Again, see the comment for (1).
Of course, it could not be an ordinary set object (i.e. one actually containing references to its members) -- that would require an infinite amount of memory. However, it *can* implement much of the set interface without a problem. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, 24 Jul 2009 02:22:47 am Georg Brandl wrote:
No, I'm not thinking mathematically, I'm thinking concretely. A set that contains "everything" is an abstract concept. For concrete problem solving, the universe is (nearly always) smaller than "everything". It's everything within the problem domain, not everything imaginable.
Both are problematic. Firstly, if it raises an exception, then it means all set handling code needs to be written (e.g.): try: len(s) except Exception: # handle universal set print "What do I do here?" instead of the simple: len(s) The same for anything which iterates over a set, or calls pop() or remove(). That's bad -- we're complicating all set-handling code, just to make *one* special case a tiny bit not-really-simpler (actually more complicated -- the solution with reduce is simpler than the suggested idiom). Secondly, if it returns sys.maxint, that's not as bad, but it's still wrong -- sys.maxint is being used as a sentinel, a "magic value", rather than actually being the length. If you count the elements that are in the set, you get more than the length: n = 0 U = set.universal() for i in xrange(-sys.maxint, sys.maxint): # this may take a while... if i in U: n += 1 assert n > len(U) To me, having len() be accurate is *far* more important than being able to supposedly simplify (complexify?) a one-liner using reduce into a three-liner using set.universal(). Having a built-in type lie about its length gives me the willies. [...]
Fair enough, but it's the parts that it can't implement which are critical. It's not a set, it's something which is nearly a set but is supposed to be used in combination with real sets. Any time you write code that expects sets, you have to allow for the risk that you might be given a not-really-a-set universal set instead. -- Steven D'Aprano
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Steven D'Aprano wrote:
There is a lot of floating point code in the world that will blow up if fed a Nan or an Inf value as input. Given that it is even harder for a set.universal() instance to show up by accident, I don't see that it would be a major problem if most set handling code remained "naive" about the universal set. That said, I'm still with Georg in not being convinced yet that the mathematical elegance is worth the additional complexity. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Steven D'Aprano wrote:
That doesn't strike me as all that different from handling NaN and Inf values in floating point mathematics though. Would set.universal() need to be a special object that required special handling in some set algorithms? Yes, it would, but that doesn't make it an invalid idea. In fact, treating it like a NaN from the decimal module is probably the answer to most of those questions: if a valid answer isn't defined, raise an exception. The semantics would need to be spelled out clearly however, as would a position on whether or not to add support for complementary sets (i.e. sets that are defined as "set.universal() ^ items_not_in_set"). Also the "universal" or "complementary" character of the set would need to be embodied in mutable state on the set object so that in-place operations such as "s |= set.universal()" can work correctly. Cheers, Nick. P.S. I'm not necessarily +1, or even +0, on the idea until it is fleshed out further. I just wanted to point out that there is precedent for including the numerical equivalents of the universal set. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Andy Kish schrieb:
Note that a simple ``reduce(operator.and_, sets)`` would suffice for > 1 set. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/f6151/f615148f1b7d543849b628fd45c50228f2b8cf36" alt=""
Andy Kish wrote: [...]
Or even: set_intersection = reduce(operator.and_, sets, set.universal()) Although, you can already pass multiple (or zero) sets to set.intersection(). So your special case version can be a little simpler... sets = list(sets) if len(sets) == 0: return set() return sets[0].intersection(sets[1:]) Which isn't as elegant, but it's also not so bad. -Andrew.
data:image/s3,"s3://crabby-images/3fefd/3fefd77b4a80756a9efb7d30b8bfa7dc5634bfc6" alt=""
If we want to go golfing :), my favorite solution is with the python 2.6 versions of intersection and union: set_union = set().union(*sets) set_intersection = set.universal().intersection(*sets) Folds are nice. That's actually why I sent my initial email to the list. It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection. Andy. On Jul 23, 3:12 am, Andrew Bennetts <and...@bemusement.org> wrote:
data:image/s3,"s3://crabby-images/f6151/f615148f1b7d543849b628fd45c50228f2b8cf36" alt=""
Andy Kish wrote:
Well, not that my opinion counts for much, I'm +0 on it. It's conceptually nice as you say, but I doubt I (or many others) would use it much, and hanging it off the set builtin doesn't feel totally satisfactory to me (but I don't have a better suggestion). I hope you find some more firmly opinionated responses to your idea. :) -Andrew.
data:image/s3,"s3://crabby-images/e7a68/e7a68048189999ad617aee9737f4da6d6422d9e9" alt=""
On Thu, 23 Jul 2009 18:29:14 -0400 George Sakkis <george.sakkis@gmail.com> wrote:
Overall it seems like a non-solution to a non-problem.
It's definitely a real solution. The problem may not be very important, though.
Why don't we add a dict.universal(), list.universal(), etc. while we're at it.
Because a set of "all the things in the universe" makes sense as an entity, whereas dict's or lists of "all the things in the universe" don't. For lists, this implies some sort of ordering on everything in the universe, which we've already abandoned as a bad idea. Dicts, on the other hand, aren't simple collections, but maps, so you don't have _a_ dict of everything in the universe, you have len(universe)**2 of them. The def statement gives us the ability to express such maps. Which may be how set.universal() stacks up: We can already create a class that has the appropriate behaviors. It's not clear that that is sufficiently useful to justify adding it to the language. One thing that bugs me is the universal set is a *constant. It's value doesn't change, so it shouldn't be a function or method, but an attribute. On the other hand, adding negative sets to the language provides an easy way to express that constant, and certainly have more uses in general. If we had to add one of the two, it'd be these. But I'm not convinced that these have enough uses to justify adding them, either. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/consulting.html Independent Network/Unix/Perforce consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
data:image/s3,"s3://crabby-images/4f305/4f30562f209d0539c156fdbf2946fd262f812439" alt=""
Mike Meyer <mwm-keyword-python.b4bdba@mired.org>:
Yeah, I feel, that if a group of people need them, they should have a separate type ("negative" or "infitity-minus..." or "universe without..."-) sets, in a separate module (not necessarily in std library). It would be a type which can interact in some ways with sets (like e.g. dict views do...). As it was noted -- some sets' features don't make sense here... (len, iter). But many do: a in universe_minus() -> True a in universe_minus([a, b, c]) -> False set([a, b]) < universe_minus() -> True set([a, b]) < universe_minus([b, c]) -> False universe_minus().remove(a) -> universe_minus([a]) universe_minus([a, b, c]).remove(a) -> KeyError universe_minus([a]).add(a) -> universe_minus([]) universe_minus().add(a) -> universe_minus([]) universe_minus() - set([a, b, c]) -> universe_minus([a, b, c]) universe_minus() & set([a, b, c]) -> set([a, b, c]) universe_minus() | set([a, b, c]) -> universe_minus() universe_minus() - universe_minus([a, b]) -> set([a, b]) universe_minus() & universe_minus([a, b]) -> universe_minus([a, b]) universe_minus() | universe_minus([a, b]) -> universe_minus() universe_minus([a, b, c]) - set([a, b]) -> universe_minus([a, b, c]) universe_minus([a, b, c]) | set([a, b]) -> universe_minus([c]) universe_minus([a, b]) ^ set([a, b, c]) -> universe_minus([c]) etc.
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Greg Ewing schrieb:
Since it has to yield all possible Python objects, it could start with the integers. It's not likely to run out of them any time soon <duck> Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Greg Ewing wrote:
Raise an exception, just like a signalling NaN in decimal. An expanded set concept with support for complementary set definitions is definitely something that should cook on PyPI for a while though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
data:image/s3,"s3://crabby-images/04a15/04a15ea38069a933a2f206750ea5a926f92889bc" alt=""
Going back to the original problem... On Thu, Jul 23, 2009 at 09:12, Andrew Bennetts<andrew@bemusement.org> wrote:
But set.intersection doesn't need to be called using dot notation, the class attribute call works just as well: try: return set.intersection(*sets) except TypeError: return set() I'd say this is as elegant as the OPs solution with the universal set. Universal sets and other such cofinite sets could be nice, but I don't think they should be in the python standard library.
data:image/s3,"s3://crabby-images/4516d/4516d91f1b7d793789daa021ac7fdd51ed4504c4" alt=""
On Thu, Jul 23, 2009 at 7:28 AM, Andy Kish<agkish@gmail.com> wrote:
What would set.universal() - {1, 2, 3} return? Would this be a ValueError, or would you want to also implement all cofinite sets? How would 'for x in set.universal()' behave? What's len(set.universal())? It seems to me that set.universal() can't be a full-fledged Python set, so it would have to be something else. And then x = set.universal().intersection(*sets) will sometimes be returning a true set (i.e., when sets is nonempty), and sometimes the reduced-functionality set.universal(). In many cases you're still going to need to distinguish before doing anything with x. Not-all-binary-operations-have-to-have-an-identity-ly yours, -- Mark
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Thu, 23 Jul 2009 04:28:33 pm Andy Kish wrote:
But you can't have a universal set until you know what the problem domain is. If the problem domain is citrus fruits, then the universal set is: {lemons, limes, grapefruits, oranges, mandarins, tangelos, kumquats, ... } If the problem domain is Presidents of the USA, then the universal set is: {Washington, Lincoln, Bush, Clinton, Obama, Wilson, ... } If the problem domain is integers, then it is {0, 1, -1, 2, -2, 3, -3, ...} The first two are finite, the third is infinite.
Better written as:
reduce(operator.or_, [set([1, 2, 3]), set([2, 4, 5])]) set([1, 2, 3, 4, 5])
But this does:
reduce(operator.and_, [set([1, 2, 3]), set([2, 4, 5])]) set([2])
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems. (1) What elements should go into the universal set? Strings, ints, floats, presidents... ? (2) What is len(set.universal())? (3) What does set.universal() print? (4) What does set.universal().clear() do? (5) For that matter, union(), copy(), difference(), issubset(), etc? (The universal set for strings of length one is a subset of the universal set for all strings.) I don't think there is a suitable solution for all of these issues. That would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface. -- Steven D'Aprano
data:image/s3,"s3://crabby-images/8b47a/8b47a47bc67b782ee6d276dd62adfbc257056d29" alt=""
Terry Jones briefly mentioned support for very large sets via complements, which I think would solve many of the issues you bring up. On Thu, Jul 23, 2009 at 08:26, Steven D'Aprano <steve@pearwood.info> wrote:
The domain is inherently all objects that can be put into a Python set, so the universal set is all Python objects. (I think for the purposes of Python we could gloss over the set-theoretical paradoxiness of "set.universal() in set.universal()" returning True :)
There are two ways of defining what is "in" a set. One is what iter(set.universal()) iterates through, which there is no good answer for. But the other (arguably just as common) way to define what is "in" a set s is "the set of Python objects p such that "p in s" returns True." This latter definition is trivial to implement for the universal set. If we keep track of the complement of a nearly-universal set, that also gives us an easy way to implement in. (2) What is len(set.universal())?
inf. Perhaps there is a complication since inf is technically a float type? I don't see it being a huge deal. ("Real" universal sets have uncountably many elements, whereas AFAICT floating-point inf is countable; since we have finite-memory systems I can't imagine it makes a difference.) (3) What does set.universal() print?
Some special-case string; maybe "set(U)" for the universal set, or "set(U-[elements,in,complement])" for general nearly-infinite sets. (4) What does set.universal().clear() do?
set.universal() doesn't have to be a singleton; it could return a brand new, mutable universal set. So set.universal() returns a new set, and .clear() clears a set regardless of what it contains. (5) For that matter, union(), copy(), difference(), issubset(), etc?
I think these all fall pretty easily out of the definition of a universal set and a complement representation of nearly infinite sets. (The universal set for strings of length one is a subset of the
universal set for all strings.)
The universal set is a superset of all of these: it is the set of all Python objects. (Yes, that's paradoxical, but again, I don't think that matters for most use cases in Python.) I don't think there is a suitable solution for all of these issues. That
Storing the complement of an infinite set is pretty straightforward way of implementing sets that are universal except for a few (maybe zero) elements, and can satisfy the entire set interface except for iteration. As you suggest, it would be far trickier to provide general infinite sets, like the set of all integers, but that's not what's being asked for here. I'm not saying it won't be tricky to implement; it requires changing pretty much all set methods, I've never touched CPython code. I'm also roughly +0 on the issue (not that my vote counts since I lurk far more than I contribute). I'm just pointing out it's possible.
-- It is better to be quotable than to be honest. -Tom Stoppard Borowitz
data:image/s3,"s3://crabby-images/efe4b/efe4bed0c2a0c378057d3a32de1b9bcc193bea5e" alt=""
Steven D'Aprano schrieb:
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems.
First, I don't think this is important enough to become a standard Python type. But regardless, these problems are quite easy to answer.
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
Everything. Every possible (i.e. hashable) object is a (virtual) member of the universal set. You're thinking too mathematically here. Python has no notion of a category of elements for ordinary sets as well -- you can put a president in a set together with integers without problems.
(2) What is len(set.universal())?
Since it was apparently decided that len() can "lie", sys.maxsize would be a nice value. Otherwise, an exception.
(3) What does set.universal() print?
'set.universal()'.
(4) What does set.universal().clear() do?
It clears the set. Afterwards, the set is empty.
(5) For that matter, union(), copy(), difference(), issubset(), etc?
union() is a no-op, as well as any operation that adds elements. intersection() just returns the "other" set. copy() just returns another instance of the universal set. issubset() - the universal set is the subset of no other set except the universal set. To make difference() or remove() and pop() possible, one would have to expand the notion to a "universal-except" set which again has a finite number of exceptions.
(The universal set for strings of length one is a subset of the universal set for all strings.)
Again, see the comment for (1).
Of course, it could not be an ordinary set object (i.e. one actually containing references to its members) -- that would require an infinite amount of memory. However, it *can* implement much of the set interface without a problem. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, 24 Jul 2009 02:22:47 am Georg Brandl wrote:
No, I'm not thinking mathematically, I'm thinking concretely. A set that contains "everything" is an abstract concept. For concrete problem solving, the universe is (nearly always) smaller than "everything". It's everything within the problem domain, not everything imaginable.
Both are problematic. Firstly, if it raises an exception, then it means all set handling code needs to be written (e.g.): try: len(s) except Exception: # handle universal set print "What do I do here?" instead of the simple: len(s) The same for anything which iterates over a set, or calls pop() or remove(). That's bad -- we're complicating all set-handling code, just to make *one* special case a tiny bit not-really-simpler (actually more complicated -- the solution with reduce is simpler than the suggested idiom). Secondly, if it returns sys.maxint, that's not as bad, but it's still wrong -- sys.maxint is being used as a sentinel, a "magic value", rather than actually being the length. If you count the elements that are in the set, you get more than the length: n = 0 U = set.universal() for i in xrange(-sys.maxint, sys.maxint): # this may take a while... if i in U: n += 1 assert n > len(U) To me, having len() be accurate is *far* more important than being able to supposedly simplify (complexify?) a one-liner using reduce into a three-liner using set.universal(). Having a built-in type lie about its length gives me the willies. [...]
Fair enough, but it's the parts that it can't implement which are critical. It's not a set, it's something which is nearly a set but is supposed to be used in combination with real sets. Any time you write code that expects sets, you have to allow for the risk that you might be given a not-really-a-set universal set instead. -- Steven D'Aprano
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Steven D'Aprano wrote:
There is a lot of floating point code in the world that will blow up if fed a Nan or an Inf value as input. Given that it is even harder for a set.universal() instance to show up by accident, I don't see that it would be a major problem if most set handling code remained "naive" about the universal set. That said, I'm still with Georg in not being convinced yet that the mathematical elegance is worth the additional complexity. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
data:image/s3,"s3://crabby-images/eac55/eac5591fe952105aa6b0a522d87a8e612b813b5f" alt=""
Steven D'Aprano wrote:
That doesn't strike me as all that different from handling NaN and Inf values in floating point mathematics though. Would set.universal() need to be a special object that required special handling in some set algorithms? Yes, it would, but that doesn't make it an invalid idea. In fact, treating it like a NaN from the decimal module is probably the answer to most of those questions: if a valid answer isn't defined, raise an exception. The semantics would need to be spelled out clearly however, as would a position on whether or not to add support for complementary sets (i.e. sets that are defined as "set.universal() ^ items_not_in_set"). Also the "universal" or "complementary" character of the set would need to be embodied in mutable state on the set object so that in-place operations such as "s |= set.universal()" can work correctly. Cheers, Nick. P.S. I'm not necessarily +1, or even +0, on the idea until it is fleshed out further. I just wanted to point out that there is precedent for including the numerical equivalents of the universal set. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
participants (14)
-
Andrew Bennetts
-
Andy Kish
-
Benjamin Peterson
-
David Borowitz
-
Georg Brandl
-
George Sakkis
-
Greg Ewing
-
Jan Kaliszewski
-
Jan Kanis
-
Mark Dickinson
-
Mike Meyer
-
Nick Coghlan
-
Steven D'Aprano
-
Terry Reedy