Hi all,
I think the addition of a universal set object would be a nice touch for python sets. Manipulation of collections of sets becomes much easier with it. Right now, certain operations are easy because empty sets are easy to create while the dual operation requires special cases:
set_union = set() for s in sets: set_union |= s
# this doesn't work set_intersection = set(???) for s in sets: set_intersection &= s
Instead you have to do something contorted like:
if len(sets) == 0: set_intersection = set() else: sets_i = iter(sets) set_intersection = sets_i.next() for s in sets: set_intersection &= s
Implementation of a universal set would be pretty trivial. Trails of EasyExtend [1] has an implementation (albeit used a bit differently) that's basically the entirety of what's needed.
The above intersection case would end up looking something like:
set_intersection = set.universal() for s in sets: set_intersection &= s
Thoughts? I'm happy to write the patch myself if people like this idea.
Thanks, Andy Kish.
Andy Kish schrieb:
Instead you have to do something contorted like:
if len(sets) == 0: set_intersection = set() else: sets_i = iter(sets) set_intersection = sets_i.next() for s in sets: set_intersection &= s
Note that a simple ``reduce(operator.and_, sets)`` would suffice for > 1 set.
Georg
Andy Kish wrote: [...]
The above intersection case would end up looking something like:
set_intersection = set.universal() for s in sets: set_intersection &= s
Or even:
set_intersection = reduce(operator.and_, sets, set.universal())
Although, you can already pass multiple (or zero) sets to set.intersection(). So your special case version can be a little simpler...
sets = list(sets) if len(sets) == 0: return set() return sets[0].intersection(sets[1:])
Which isn't as elegant, but it's also not so bad.
-Andrew.
If we want to go golfing :), my favorite solution is with the python 2.6 versions of intersection and union:
set_union = set().union(*sets) set_intersection = set.universal().intersection(*sets)
Folds are nice. That's actually why I sent my initial email to the list. It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection.
Andy.
On Jul 23, 3:12 am, Andrew Bennetts and...@bemusement.org wrote:
Andy Kish wrote:
[...]
The above intersection case would end up looking something like:
set_intersection = set.universal() for s in sets: set_intersection &= s
Or even:
set_intersection = reduce(operator.and_, sets, set.universal())
Although, you can already pass multiple (or zero) sets to set.intersection(). So your special case version can be a little simpler...
sets = list(sets) if len(sets) == 0: return set() return sets[0].intersection(sets[1:])
Which isn't as elegant, but it's also not so bad.
-Andrew.
Python-ideas mailing list Python-id...@python.orghttp://mail.python.org/mailman/listinfo/python-ideas
Andy Kish wrote:
If we want to go golfing :), my favorite solution is with the python 2.6 versions of intersection and union:
set_union = set().union(*sets) set_intersection = set.universal().intersection(*sets)
Folds are nice. That's actually why I sent my initial email to the list. It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection.
Well, not that my opinion counts for much, I'm +0 on it. It's conceptually nice as you say, but I doubt I (or many others) would use it much, and hanging it off the set builtin doesn't feel totally satisfactory to me (but I don't have a better suggestion).
I hope you find some more firmly opinionated responses to your idea. :)
-Andrew.
On Thu, Jul 23, 2009 at 6:13 PM, Greg Ewinggreg.ewing@canterbury.ac.nz wrote:
Andy Kish wrote:
It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection.
What would
for x in set.universal(): ...
do?
Overall it seems like a non-solution to a non-problem. Why don't we add a dict.universal(), list.universal(), etc. while we're at it. Easily a -1.
George
On Thu, 23 Jul 2009 18:29:14 -0400 George Sakkis george.sakkis@gmail.com wrote:
Overall it seems like a non-solution to a non-problem.
It's definitely a real solution. The problem may not be very important, though.
Why don't we add a dict.universal(), list.universal(), etc. while we're at it.
Because a set of "all the things in the universe" makes sense as an entity, whereas dict's or lists of "all the things in the universe" don't. For lists, this implies some sort of ordering on everything in the universe, which we've already abandoned as a bad idea. Dicts, on the other hand, aren't simple collections, but maps, so you don't have _a_ dict of everything in the universe, you have len(universe)**2 of them. The def statement gives us the ability to express such maps.
Which may be how set.universal() stacks up: We can already create a class that has the appropriate behaviors. It's not clear that that is sufficiently useful to justify adding it to the language.
One thing that bugs me is the universal set is a *constant. It's value doesn't change, so it shouldn't be a function or method, but an attribute.
On the other hand, adding negative sets to the language provides an easy way to express that constant, and certainly have more uses in general. If we had to add one of the two, it'd be these. But I'm not convinced that these have enough uses to justify adding them, either.
<mike
Mike Meyer mwm-keyword-python.b4bdba@mired.org:
On the other hand, adding negative sets to the language provides an easy way to express that constant, and certainly have more uses in general. If we had to add one of the two, it'd be these. But I'm not convinced that these have enough uses to justify adding them, either.
Yeah, I feel, that if a group of people need them, they should have a separate type ("negative" or "infitity-minus..." or "universe without..."-) sets, in a separate module (not necessarily in std library).
It would be a type which can interact in some ways with sets (like e.g. dict views do...).
As it was noted -- some sets' features don't make sense here... (len, iter).
But many do:
a in universe_minus() -> True
a in universe_minus([a, b, c]) -> False
set([a, b]) < universe_minus() -> True
set([a, b]) < universe_minus([b, c]) -> False
universe_minus().remove(a) -> universe_minus([a])
universe_minus([a, b, c]).remove(a) -> KeyError
universe_minus([a]).add(a) -> universe_minus([])
universe_minus().add(a) -> universe_minus([])
universe_minus() - set([a, b, c]) -> universe_minus([a, b, c])
universe_minus() & set([a, b, c]) -> set([a, b, c])
universe_minus() | set([a, b, c]) -> universe_minus()
universe_minus() - universe_minus([a, b]) -> set([a, b])
universe_minus() & universe_minus([a, b]) -> universe_minus([a, b])
universe_minus() | universe_minus([a, b]) -> universe_minus()
universe_minus([a, b, c]) - set([a, b]) -> universe_minus([a, b, c])
universe_minus([a, b, c]) | set([a, b]) -> universe_minus([c])
universe_minus([a, b]) ^ set([a, b, c]) -> universe_minus([c])
etc.
Jan Kaliszewski wrote:
Mike Meyer mwm-keyword-python.b4bdba@mired.org:
On the other hand, adding negative sets to the language provides an easy way to express that constant, and certainly have more uses in general. If we had to add one of the two, it'd be these. But I'm not convinced that these have enough uses to justify adding them, either.
Yeah, I feel, that if a group of people need them, they should have a separate type ("negative" or "infitity-minus..." or "universe without..."-) sets, in a separate module (not necessarily in std library).
It should definitely start on PyPI. Since most of the work of methods would be done by the underlying system set (or frozenset) type, a Python implementation should be fast enough.
It would be a type which can interact in some ways with sets (like e.g. dict views do...).
As it was noted -- some sets' features don't make sense here... (len, iter).
But many do:
a in universe_minus() -> True
a in universe_minus([a, b, c]) -> False
set([a, b]) < universe_minus() -> True
set([a, b]) < universe_minus([b, c]) -> False
universe_minus().remove(a) -> universe_minus([a])
universe_minus([a, b, c]).remove(a) -> KeyError
universe_minus([a]).add(a) -> universe_minus([])
universe_minus().add(a) -> universe_minus([])
universe_minus() - set([a, b, c]) -> universe_minus([a, b, c])
universe_minus() & set([a, b, c]) -> set([a, b, c])
universe_minus() | set([a, b, c]) -> universe_minus()
universe_minus() - universe_minus([a, b]) -> set([a, b])
universe_minus() & universe_minus([a, b]) -> universe_minus([a, b])
universe_minus() | universe_minus([a, b]) -> universe_minus()
universe_minus([a, b, c]) - set([a, b]) -> universe_minus([a, b, c])
universe_minus([a, b, c]) | set([a, b]) -> universe_minus([c])
universe_minus([a, b]) ^ set([a, b, c]) -> universe_minus([c])
etc.
Greg Ewing schrieb:
Andy Kish wrote:
It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection.
What would
for x in set.universal(): ...
Since it has to yield all possible Python objects, it could start with the integers. It's not likely to run out of them any time soon
<duck> Georg
Greg Ewing wrote:
Andy Kish wrote:
It's really annoying to having appropriate identity element for union built in while missing the *correct* identity element for intersection.
What would
for x in set.universal(): ...
do?
Raise an exception, just like a signalling NaN in decimal. An expanded set concept with support for complementary set definitions is definitely something that should cook on PyPI for a while though.
Cheers, Nick.
Going back to the original problem...
On Thu, Jul 23, 2009 at 09:12, Andrew Bennettsandrew@bemusement.org wrote:
Andy Kish wrote: [...]
The above intersection case would end up looking something like:
set_intersection = set.universal() for s in sets: set_intersection &= s
Or even:
set_intersection = reduce(operator.and_, sets, set.universal())
Although, you can already pass multiple (or zero) sets to set.intersection(). So your special case version can be a little simpler...
sets = list(sets) if len(sets) == 0: return set() return sets[0].intersection(sets[1:])
Which isn't as elegant, but it's also not so bad.
But set.intersection doesn't need to be called using dot notation, the class attribute call works just as well:
try: return set.intersection(*sets) except TypeError: return set()
I'd say this is as elegant as the OPs solution with the universal set.
Universal sets and other such cofinite sets could be nice, but I don't think they should be in the python standard library.
On Thu, Jul 23, 2009 at 7:28 AM, Andy Kishagkish@gmail.com wrote:
Hi all,
I think the addition of a universal set object would be a nice touch for python sets. Manipulation of collections of sets becomes much easier with it. [...]
What would set.universal() - {1, 2, 3} return? Would this be a ValueError, or would you want to also implement all cofinite sets?
How would 'for x in set.universal()' behave?
What's len(set.universal())?
It seems to me that set.universal() can't be a full-fledged Python set, so it would have to be something else. And then
x = set.universal().intersection(*sets)
will sometimes be returning a true set (i.e., when sets is nonempty), and sometimes the reduced-functionality set.universal(). In many cases you're still going to need to distinguish before doing anything with x.
Not-all-binary-operations-have-to-have-an-identity-ly yours,
-- Mark
On Thu, 23 Jul 2009 04:28:33 pm Andy Kish wrote:
Hi all,
I think the addition of a universal set object would be a nice touch for python sets.
But you can't have a universal set until you know what the problem domain is. If the problem domain is citrus fruits, then the universal set is: {lemons, limes, grapefruits, oranges, mandarins, tangelos, kumquats, ... }
If the problem domain is Presidents of the USA, then the universal set is: {Washington, Lincoln, Bush, Clinton, Obama, Wilson, ... }
If the problem domain is integers, then it is {0, 1, -1, 2, -2, 3, -3, ...}
The first two are finite, the third is infinite.
Manipulation of collections of sets becomes much easier with it. Right now, certain operations are easy because empty sets are easy to create while the dual operation requires special cases:
set_union = set() for s in sets: set_union |= s
Better written as:
reduce(operator.or_, [set([1, 2, 3]), set([2, 4, 5])])
set([1, 2, 3, 4, 5])
# this doesn't work set_intersection = set(???) for s in sets: set_intersection &= s
But this does:
reduce(operator.and_, [set([1, 2, 3]), set([2, 4, 5])])
set([2])
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems.
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
(2) What is len(set.universal())?
(3) What does set.universal() print?
(4) What does set.universal().clear() do?
(5) For that matter, union(), copy(), difference(), issubset(), etc?
(The universal set for strings of length one is a subset of the universal set for all strings.)
I don't think there is a suitable solution for all of these issues. That would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface.
Terry Jones briefly mentioned support for very large sets via complements, which I think would solve many of the issues you bring up.
On Thu, Jul 23, 2009 at 08:26, Steven D'Aprano steve@pearwood.info wrote:
On Thu, 23 Jul 2009 04:28:33 pm Andy Kish wrote:
Hi all,
I think the addition of a universal set object would be a nice touch for python sets.
But you can't have a universal set until you know what the problem domain is. If the problem domain is citrus fruits, then the universal set is: {lemons, limes, grapefruits, oranges, mandarins, tangelos, kumquats, ... }
If the problem domain is Presidents of the USA, then the universal set is: {Washington, Lincoln, Bush, Clinton, Obama, Wilson, ... }
If the problem domain is integers, then it is {0, 1, -1, 2, -2, 3, -3, ...}
The first two are finite, the third is infinite.
The domain is inherently all objects that can be put into a Python set, so the universal set is all Python objects. (I think for the purposes of Python we could gloss over the set-theoretical paradoxiness of "set.universal() in set.universal()" returning True :)
Manipulation of collections of sets becomes much
easier with it. Right now, certain operations are easy because empty sets are easy to create while the dual operation requires special cases:
set_union = set() for s in sets: set_union |= s
Better written as:
reduce(operator.or_, [set([1, 2, 3]), set([2, 4, 5])])
set([1, 2, 3, 4, 5])
# this doesn't work set_intersection = set(???) for s in sets: set_intersection &= s
But this does:
reduce(operator.and_, [set([1, 2, 3]), set([2, 4, 5])])
set([2])
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems.
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
There are two ways of defining what is "in" a set. One is what iter(set.universal()) iterates through, which there is no good answer for. But the other (arguably just as common) way to define what is "in" a set s is "the set of Python objects p such that "p in s" returns True." This latter definition is trivial to implement for the universal set. If we keep track of the complement of a nearly-universal set, that also gives us an easy way to implement in.
(2) What is len(set.universal())?
inf. Perhaps there is a complication since inf is technically a float type? I don't see it being a huge deal. ("Real" universal sets have uncountably many elements, whereas AFAICT floating-point inf is countable; since we have finite-memory systems I can't imagine it makes a difference.)
(3) What does set.universal() print?
Some special-case string; maybe "set(U)" for the universal set, or "set(U-[elements,in,complement])" for general nearly-infinite sets.
(4) What does set.universal().clear() do?
set.universal() doesn't have to be a singleton; it could return a brand new, mutable universal set. So set.universal() returns a new set, and .clear() clears a set regardless of what it contains.
(5) For that matter, union(), copy(), difference(), issubset(), etc?
I think these all fall pretty easily out of the definition of a universal set and a complement representation of nearly infinite sets.
(The universal set for strings of length one is a subset of the
universal set for all strings.)
The universal set is a superset of all of these: it is the set of all Python objects. (Yes, that's paradoxical, but again, I don't think that matters for most use cases in Python.)
I don't think there is a suitable solution for all of these issues. That
would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface.
Storing the complement of an infinite set is pretty straightforward way of implementing sets that are universal except for a few (maybe zero) elements, and can satisfy the entire set interface except for iteration. As you suggest, it would be far trickier to provide general infinite sets, like the set of all integers, but that's not what's being asked for here.
I'm not saying it won't be tricky to implement; it requires changing pretty much all set methods, I've never touched CPython code. I'm also roughly +0 on the issue (not that my vote counts since I lurk far more than I contribute). I'm just pointing out it's possible.
-- Steven D'Aprano _______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas
Steven D'Aprano schrieb:
Implementation of a universal set would be pretty trivial.
You think so? I can think of a number of problems.
First, I don't think this is important enough to become a standard Python type. But regardless, these problems are quite easy to answer.
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
Everything. Every possible (i.e. hashable) object is a (virtual) member of the universal set. You're thinking too mathematically here. Python has no notion of a category of elements for ordinary sets as well -- you can put a president in a set together with integers without problems.
(2) What is len(set.universal())?
Since it was apparently decided that len() can "lie", sys.maxsize would be a nice value. Otherwise, an exception.
(3) What does set.universal() print?
'set.universal()'.
(4) What does set.universal().clear() do?
It clears the set. Afterwards, the set is empty.
(5) For that matter, union(), copy(), difference(), issubset(), etc?
union() is a no-op, as well as any operation that adds elements. intersection() just returns the "other" set. copy() just returns another instance of the universal set. issubset() - the universal set is the subset of no other set except the universal set.
To make difference() or remove() and pop() possible, one would have to expand the notion to a "universal-except" set which again has a finite number of exceptions.
(The universal set for strings of length one is a subset of the universal set for all strings.)
Again, see the comment for (1).
I don't think there is a suitable solution for all of these issues. That would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface.
Of course, it could not be an ordinary set object (i.e. one actually containing references to its members) -- that would require an infinite amount of memory. However, it *can* implement much of the set interface without a problem.
Georg
On Fri, 24 Jul 2009 02:22:47 am Georg Brandl wrote:
(1) What elements should go into the universal set? Strings, ints, floats, presidents... ?
Everything. Every possible (i.e. hashable) object is a (virtual) member of the universal set. You're thinking too mathematically here. Python has no notion of a category of elements for ordinary sets as well -- you can put a president in a set together with integers without problems.
No, I'm not thinking mathematically, I'm thinking concretely. A set that contains "everything" is an abstract concept. For concrete problem solving, the universe is (nearly always) smaller than "everything". It's everything within the problem domain, not everything imaginable.
(2) What is len(set.universal())?
Since it was apparently decided that len() can "lie", sys.maxsize would be a nice value. Otherwise, an exception.
Both are problematic.
Firstly, if it raises an exception, then it means all set handling code needs to be written (e.g.):
try: len(s) except Exception: # handle universal set print "What do I do here?"
instead of the simple:
len(s)
The same for anything which iterates over a set, or calls pop() or remove(). That's bad -- we're complicating all set-handling code, just to make *one* special case a tiny bit not-really-simpler (actually more complicated -- the solution with reduce is simpler than the suggested idiom).
Secondly, if it returns sys.maxint, that's not as bad, but it's still wrong -- sys.maxint is being used as a sentinel, a "magic value", rather than actually being the length. If you count the elements that are in the set, you get more than the length:
n = 0 U = set.universal() for i in xrange(-sys.maxint, sys.maxint): # this may take a while... if i in U: n += 1 assert n > len(U)
To me, having len() be accurate is *far* more important than being able to supposedly simplify (complexify?) a one-liner using reduce into a three-liner using set.universal(). Having a built-in type lie about its length gives me the willies.
[...]
Of course, it could not be an ordinary set object (i.e. one actually containing references to its members) -- that would require an infinite amount of memory. However, it *can* implement much of the set interface without a problem.
Fair enough, but it's the parts that it can't implement which are critical. It's not a set, it's something which is nearly a set but is supposed to be used in combination with real sets. Any time you write code that expects sets, you have to allow for the risk that you might be given a not-really-a-set universal set instead.
Steven D'Aprano wrote:
Firstly, if it raises an exception, then it means all set handling code needs to be written (e.g.):
try: len(s) except Exception: # handle universal set print "What do I do here?"
instead of the simple:
len(s)
There is a lot of floating point code in the world that will blow up if fed a Nan or an Inf value as input. Given that it is even harder for a set.universal() instance to show up by accident, I don't see that it would be a major problem if most set handling code remained "naive" about the universal set.
That said, I'm still with Georg in not being convinced yet that the mathematical elegance is worth the additional complexity.
Cheers, Nick.
Steven D'Aprano wrote:
I don't think there is a suitable solution for all of these issues. That would mean that set.universal() can't be a real set object, it has to be some sort of not-quite-a-set object that doesn't provide the entire set interface.
That doesn't strike me as all that different from handling NaN and Inf values in floating point mathematics though. Would set.universal() need to be a special object that required special handling in some set algorithms? Yes, it would, but that doesn't make it an invalid idea.
In fact, treating it like a NaN from the decimal module is probably the answer to most of those questions: if a valid answer isn't defined, raise an exception.
The semantics would need to be spelled out clearly however, as would a position on whether or not to add support for complementary sets (i.e. sets that are defined as "set.universal() ^ items_not_in_set").
Also the "universal" or "complementary" character of the set would need to be embodied in mutable state on the set object so that in-place operations such as "s |= set.universal()" can work correctly.
Cheers, Nick.
P.S. I'm not necessarily +1, or even +0, on the idea until it is fleshed out further. I just wanted to point out that there is precedent for including the numerical equivalents of the universal set.