Re: [Python-ideas] Add orderedset as set(iterable, *, ordered=False) and similarly for frozenset.
On Fri Feb 06 2015 at 01:17:03 Nathaniel Smith
Surely a.union(b) should produce the same result as a.update(b) except out-of-place, and .update() already has well-defined semantics for OrderedDicts.
I don't think they are equivalent. update() makes sense because it's described as a procedure mutating one operand - among other implications, it fundamentally doesn't commute. On the other hand, set union is commutative, and changing that seems strange (and wrong) to me. I think it'd be more sensible to add .extend() and + operations to ordered sets if that's a useful thing to do to them. It mirrors the fact that they are (more or less) mutable sequences, and it's more obvious what the order of the result is (and + on sequences is already non-commutative). In any case, I'm evidently wrong about this being a trivial decision. :) Ed Kellett
On Sun, Feb 8, 2015 at 9:22 AM, Ed Kellett
On the other hand, set union is commutative, and changing that seems strange (and wrong) to me.
Set union is, but sets lack order. Once you add order to a set, it becomes more like a list, so the union of x and y might not be identical to the union of y and x. They'd be the same modulo order, so it's not breaking the concept of set union. ChrisA
On Sat Feb 07 2015 at 10:37:39 PM Chris Angelico
Set union is, but sets lack order. Once you add order to a set, it becomes more like a list
Then wouldn't adding .extend() and + make more sense anyway? List-like operations should look like the list-like operations we already have.
so the union of x and y might not be identical to the union of y and x
Well, not if you define union to be non-commutative, no—but that's what we're discussing.
They'd be the same modulo order, so it's not breaking the concept of set union.
Sure it is: operations are supposed to yield things that make sense. Set union yields the set of elements contained in any of its operands; even if its operands have orderings that make sense, the ordering between sets might be partial or inconsistent, so the result cannot. So the existing definition of set union can never produce a correct ordering. I can't see why it would be preferable to redefine union as concatenation rather than just supporting explicit concatenation. Apart from feeling more mathematically sound to me, it's better at documenting intention: which method (or operator) you choose reflects whether you're interested in the order of the result or just its contents. Ed Kellett
On Sat, Feb 7, 2015 at 6:02 PM, Ed Kellett
On Sat Feb 07 2015 at 10:37:39 PM Chris Angelico
wrote: Set union is, but sets lack order. Once you add order to a set, it becomes more like a list
Then wouldn't adding .extend() and + make more sense anyway? List-like operations should look like the list-like operations we already have.
so the union of x and y might not be identical to the union of y and x
Well, not if you define union to be non-commutative, no—but that's what we're discussing.
They'd be the same modulo order, so it's not breaking the concept of set union.
Sure it is: operations are supposed to yield things that make sense. Set union yields the set of elements contained in any of its operands; even if its operands have orderings that make sense, the ordering between sets might be partial or inconsistent, so the result cannot. So the existing definition of set union can never produce a correct ordering.
I can't see why it would be preferable to redefine union as concatenation rather than just supporting explicit concatenation. Apart from feeling more mathematically sound to me, it's better at documenting intention: which method (or operator) you choose reflects whether you're interested in the order of the result or just its contents.
The reason is that that's already what update does, which is the most common way to use OrderedSet. Typically, you have a bunch of things and you want a set of unique elements in the order they were added. So you create your accumulator and then update it with iterables of elements, after which it contains your desired ordered set of unique elements. Maybe your code looks like this: a = OrderedSet() a.update(b) a.update(c) a.update(d) Why shouldn't that be the same as a | b | c | d ? I think it should and that in general union should be equivalent in effect to copy and extend. Best, Neil
Ed Kellett
--
--- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/4WZbMC2pNe0/unsubscribe. To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
--
--- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/4WZbMC2pNe0/unsubscribe. To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
On Sat Feb 07 2015 at 11:12:02 PM Neil Girdhar
Why shouldn't that be the same as
a | b | c | d
?
I think it should and that in general union should be equivalent in effect to copy and extend.
Because yielding an ordering doesn't have any basis in the definition of union, and it's not any easier to write than a + b + c + d. "copy and extend" is just concatenation - is there any reason *not* to use the concatenation operator for it, rather than the union? Ed Kellett
On Feb 7, 2015, at 15:24, Ed Kellett
On Sat Feb 07 2015 at 11:12:02 PM Neil Girdhar
wrote: Why shouldn't that be the same as
a | b | c | d
?
I think it should and that in general union should be equivalent in effect to copy and extend.
Because yielding an ordering doesn't have any basis in the definition of union, and it's not any easier to write than a + b + c + d. "copy and extend" is just concatenation - is there any reason *not* to use the concatenation operator for it, rather than the union?
In a+b, I'd expect that all of the elements of b will appear in the result in the same order they appear in b. That's how it works for sequences. But it can't work that way for ordered sets if there are any elements of b also in a. More importantly, that's the entire point of ordered sets. If you wanted to preserve (or didn't care about) duplicates, you'd just use a list. What you actually want here is an "unsorted union"--all the elements of a in order, then all the elements of b except those also in a in order, and so on. (That's "unsorted" as opposed to a sorted union, where the sequences are interleaved in such a way as to preserve a shared ordering rule.) It seems a lot more natural to call that "union" than concatenate. But it might be better to explicitly call it "unsorted union", or even to just not provide it. (IIRC, in Mathematica, you do it by concatenating the sets into a list, then calling DeleteDuplicates to turn the result back into an ordered set, or by calling Tally on the concatenated list and selecting the ones whose tally is 1, or various other equivalents.) The fact that unordered union isn't commutative doesn't mean it's not a reasonable extension of union. The best-known extension of union is probably the disjoint union, which also isn't commutative: {1, 2} U* {0, 1} = {(1,0), (2,0), (0,1), (1,2)} but {0, 1} U* {1, 2} = {(0, 0), (1, 0), (1, 1), (2, 1)}.
You make a good point, but I'm still unfortunately unconvinced. If it were
called UniqueList, I would agree with you. Since it's called OrderedSet, I
expect it to use the same operations as set. Also, typically I implement
something using set and then after the fact realize that I need ordering.
I then change the declaration to OrderedSet. With your solution, I would
have to also change | and |= to + and +=. I feel that update should remain
equivalent to |= rather than switching to +=,
Best,
Neil
On Sat, Feb 7, 2015 at 6:24 PM, Ed Kellett
On Sat Feb 07 2015 at 11:12:02 PM Neil Girdhar
wrote: Why shouldn't that be the same as
a | b | c | d
?
I think it should and that in general union should be equivalent in effect to copy and extend.
Because yielding an ordering doesn't have any basis in the definition of union, and it's not any easier to write than a + b + c + d. "copy and extend" is just concatenation - is there any reason *not* to use the concatenation operator for it, rather than the union?
Ed Kellett
On Sun Feb 08 2015 at 12:18:41 AM Andrew Barnert
In a+b, I'd expect that all of the elements of b will appear in the result in the same order they appear in b. That's how it works for sequences. But it can't work that way for ordered sets if there are any elements of b also in a.
Well, a + b would add elements of b to a in the order they appear in b—they just wouldn't appear there.
More importantly, that's the entire point of ordered sets. If you wanted to preserve (or didn't care about) duplicates, you'd just use a list.
And that's why a + b behaving that way makes sense. To me.
What you actually want here is an "unsorted union"--all the elements of a in order, then all the elements of b except those also in a in order, and so on. (That's "unsorted" as opposed to a sorted union, where the sequences are interleaved in such a way as to preserve a shared ordering rule.) It seems a lot more natural to call that "union" than concatenate. But it might be better to explicitly call it "unsorted union", or even to just not provide it. (IIRC, in Mathematica, you do it by concatenating the sets into a list, then calling DeleteDuplicates to turn the result back into an ordered set, or by calling Tally on the concatenated list and selecting the ones whose tally is 1, or various other equivalents.)
Well, I don't really understand why it seems natural to call it union rather than concatenate - it is appending each element in order, which is exactly what concatenation is. I think this might just be a fundamental difference between our mental models of sets and union, though, so there's probably little point arguing about it. I think "unsorted union" is reasonable.
The fact that unordered union isn't commutative doesn't mean it's not a reasonable extension of union. The best-known extension of union is probably the disjoint union, which also isn't commutative: {1, 2} U* {0, 1} = {(1,0), (2,0), (0,1), (1,2)} but {0, 1} U* {1, 2} = {(0, 0), (1, 0), (1, 1), (2, 1)}.
But disjoint union has even less in common with union than unsorted union, and I don't think anybody drops the "disjoint" and conflates it with regular union. Ed Kellett
On Sun, Feb 8, 2015 at 11:34 AM, Neil Girdhar
You make a good point, but I'm still unfortunately unconvinced. If it were called UniqueList, I would agree with you. Since it's called OrderedSet, I expect it to use the same operations as set.
To be quite honest, I would be very happy if sets had more of the operations that dictionaries have. Imagine if "set" were "dictionary where all the values are True", and "OrderedSet" were "OrderedDict where all the values are True"; combining sets is like combining dictionaries, but you don't have to worry about merging the values. ISTM ordered set addition/union should be like list concatenation with duplicates dropped. But then, I'm not a mathematician; which, come to think of it, is why I use "Practicality beats purity" Python rather than one of the more mathematically-pure functional languages. ChrisA
On Sun Feb 08 2015 at 12:34:48 AM Neil Girdhar
If it were called UniqueList, I would agree with you. Since it's called OrderedSet, I expect it to use the same operations as set.
I agree, and with that in mind, it seems wrong for it to redefine what "union" means.
Also, typically I implement something using set and then after the fact realize that I need ordering. I then change the declaration to OrderedSet. With your solution, I would have to also change | and |= to + and +=. I feel that update should remain equivalent to |= rather than switching to +=,
That's a convenient case if you happened to |= other sets (which you also had to change to OrderedSet) in in the correct order already, but if you didn't or you wanted some particular interesting ordering or you were getting your sets to add from somewhere that didn't deal in OrderedSets, you'd have to change that part of your program to support ordering anyway. On the other hand, if you were .update()ing directly from some other ordered iterable, you wouldn't have to change anything. :) Ed Kellett
On Sun Feb 08 2015 at 12:52:56 AM Chris Angelico
To be quite honest, I would be very happy if sets had more of the operations that dictionaries have. Imagine if "set" were "dictionary where all the values are True", and "OrderedSet" were "OrderedDict where all the values are True"; combining sets is like combining dictionaries, but you don't have to worry about merging the values.
I think they already have most of the useful ones. Adding True values seems unnecessary, to me—that entries have no value seems more appropriate in terms of merging them making sense.
ISTM ordered set addition/union should be like list concatenation with duplicates dropped.
I agree about addition - that'd mirror list concatenation nicely. But with that in mind, making union do the same thing seems wrong - partly because it'd be an exact duplicate of other functionality, and partly because it's a special case: - `set | set` has no meaningful ordering - `orderedset | set` has no meaningful ordering - `set | orderedset` has no meaningful ordering - `orderedset | orderedset` has a meaningful ordering if you expect union to act like update(). Maybe that's more common than I thought; I always considered them fundamentally different concepts. Ed Kellett
On Sun, Feb 8, 2015 at 12:07 PM, Ed Kellett
On Sun Feb 08 2015 at 12:52:56 AM Chris Angelico
wrote: To be quite honest, I would be very happy if sets had more of the operations that dictionaries have. Imagine if "set" were "dictionary where all the values are True", and "OrderedSet" were "OrderedDict where all the values are True"; combining sets is like combining dictionaries, but you don't have to worry about merging the values.
I think they already have most of the useful ones. Adding True values seems unnecessary, to me—that entries have no value seems more appropriate in terms of merging them making sense.
I don't know that *actually* having a True value for everything would necessarily help, but in terms of defining operations like union and intersection, it should be broadly equivalent to solving the same problem with a dict/OrderedDict. ChrisA
Chris Angelico writes:
ISTM ordered set addition/union should be like list concatenation with duplicates dropped.
You can't have that, because you lose the order of common elements that are ordered differently in some of the operands. I have no idea why you would want a more or less "symmetric" combination of ordered sets to ignore the order of elements in any of the operands. (I write "symmetric" because I expect the collections in addition or union, especially union, to be treated "similarly" even if the operations aren't strictly commutative.) I could see union defined as "forget the orders and return the union as sets" but that would surely surprise many users of ordered sets. Set addition has a mathematical definition, as the "disjoint union" (coproduct, sort of a multiset that remembers that duplicates came from different places). "Practicality before purity" suggests that most Python programmers won't know what a disjoint union is (and probably have never heard of "coproduct"), so I'm only -0.5 on using the "+" operator for this operation, but I still don't see a "canonical" use case. I don't have a problem with defining update any way that a reasonably large plurality of use cases find useful (Andrew's "unsorted but ordered union" seems reasonable), but it should be called "update" or "merge" (by analogy with dictionaries or sequences, respectively). ... but don't ask me to come up with the canonical use case, because I can't think of any at all.
participants (5)
-
Andrew Barnert
-
Chris Angelico
-
Ed Kellett
-
Neil Girdhar
-
Stephen J. Turnbull