Here some ideas that have been proposed for sets: * New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early-out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away. * Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update(s2); s.update(s3); s.update(s4). * Make sets listenable for changes (proposed by Jason Wells): s = set(mydata) def callback(s): print 'Set %d now has %d items' % (id(s), len(s)) s.listeners.append(callback) s.add(existing_element) # no callback s.add(new_element) # callback Raymond
On 18-May-07, at 6:34 PM, Raymond Hettinger wrote:
Here some ideas that have been proposed for sets:
* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early- out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away.
+1. Disjointness verification is one of my main uses for set(), and though I don't think that the early-out condition would trigger often in my code, it would increase readability.
* Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update (s2); s.update(s3); s.update(s4).
It's too bad that this couldn't work with the binary operator spelling: s = s1 | s2 | s3 | s4
* Make sets listenable for changes (proposed by Jason Wells):
s = set(mydata) def callback(s): print 'Set %d now has %d items' % (id(s), len(s)) s.listeners.append(callback) s.add(existing_element) # no callback s.add(new_element) # callback
-1 on the base set type: it seems too complex for a base set type. Also, there are various possible semantics that might be desirable, such as receiving the added element, or returning False to prevent addition. The proper place is perhaps a subclass of set with a magic method (analogous to defaultdict/__missing__). -Mike
>> * New method (proposed by Shane Holloway): s1.isdisjoint(s2). Mike> +1. Disjointness verification is one of my main uses for set(), Mike> and though I don't think that the early-out condition would Mike> trigger often in my code, it would increase readability. I think the readbility argument is marginal at best. I use sets frequently and to the greatest extent possible use the builtin operator support because I find that more readable. So for me, I'd be going from if not s1 & s2: to if s1.isdisjoint(s2): I'm not sure that's an improvement. Maybe it's just me, but given two sets I frequently want to operate on s1-s2, s2-s1 and s1&s2 in different ways. I wouldn't find a disjoint operation all that useful. Skip
On Fri, May 18, 2007, Raymond Hettinger wrote:
Here some ideas that have been proposed for sets:
* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early-out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away.
+1
* Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update(s2); s.update(s3); s.update(s4).
+1
* Make sets listenable for changes (proposed by Jason Wells):
s = set(mydata) def callback(s): print 'Set %d now has %d items' % (id(s), len(s)) s.listeners.append(callback) s.add(existing_element) # no callback s.add(new_element) # callback
-3 -- that's an ugly wart on what is currently a nice, clean object. This seems like a good opportunity for a Cookbook Recipe for a generic listenable proxy class for container objects. Note that I could argue for the callback to be called even when the set doesn't actually change, only when the set is attempted to be changed, which to my mind pushes strongly for a recipe instead of extending sets. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Look, it's your affair if you want to play with five people, but don't go calling it doubles." --John Cleese anticipates Usenet
-----Original Message----- From: python-dev-bounces+castironpi=comcast.net@python.org [mailto:python- dev-bounces+castironpi=comcast.net@python.org] On Behalf Of Raymond Hettinger Sent: Friday, May 18, 2007 8:35 PM To: python-dev@python.org Subject: [Python-Dev] Py2.6 buildouts to the set API
Here some ideas that have been proposed for sets:
* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early-out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away.
It sounds -really- good.
* Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update(s2); s.update(s3); s.update(s4).
This pleads for elsewhere adding operation in chains. Sort on multiple keys is addressed by itemgetter (IMO also should be built-in). But dict.update, a list append, a deque pop could use these. When-do-you-ever is out of stock and ships in a week.
* Make sets listenable for changes (proposed by Jason Wells):
s = set(mydata) def callback(s): print 'Set %d now has %d items' % (id(s), len(s)) s.listeners.append(callback) s.add(existing_element) # no callback s.add(new_element) # callback
This one calls for subclassing, a la observer pattern. In that vein, some subclassing operation could use a list of pattern-matching / semantic membership. E.g. def every_add_op( self, op, ***data ): call_a_hook( ***data ) op( ***data ) Rings of ML. Support could be provided with def __init__... for x in ( add, update, intersection_update ): def my_x( self, ***data ): call_a_hook( ***data ) x( ***data ) setattr( self, x, my_x ) But you need to know which operations are destructive/constructive, but we can't go back and annotate the whole stdlib. Though I agree that some level of programmatic interference could be useful. Academic concern which shoots 50-50 in the real world. I may be tempered with too much beauty (Beautiful is better than ugly.), not enough market. You're all in it for money.
* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early-out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away.
I'd rather see iterator versions of the set operations. Then you could do def isempty(i): try: i.next() except StopIteration: return True else: return False if isempty(s1.iter_intersection(s2)): ...
* Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update(s2); s.update(s3); s.update(s4).
I'd rather see this as collections.bigunion.
* Make sets listenable for changes (proposed by Jason Wells):
-1, IAGNI. Martin
* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to "not s1.intersection(s2)" but has an early-out if a common member is found.
[MvL]
I'd rather see iterator versions of the set operations.
Interesting idea. I'm not sure I see how to make it work. If s|t returned an iterator, then how would s|t|u work? Are you proposing lazy evaluation of unions, intersections, and differences? Raymond
I'd rather see iterator versions of the set operations.
Interesting idea. I'm not sure I see how to make it work. If s|t returned an iterator, then how would s|t|u work?
I don't think s.union(t) should return an iterator, if for no other reason than compatibility. Instead, there might be s.iunion(t) (or s.unioni(t), or s.iterating_union(t)). Then, you could spell s.iunion(t.iunion(u)). iunion is implemented as def iunion(self, other): for o in self: yield o for o in other: if o not in self: yield o So rather than writing x = a.union(b,c,d,e) you could write x = set(a.iunion(b.iunion(c.iunion(d.iunion(e))))) or x = a.union(b.iunion(c.iunion(d.iunion(e)))) Likewise def iintersection(self, other): for o in other: if o in self: yield o
Are you proposing lazy evaluation of unions, intersections, and differences?
I'm not so sure about differences: union and intersections are commutative, so one should typically be able to reorder the evaluation so that the left operand is a true set, and the other is the iterator. For difference, it would be necessary that the right operand is the true set; it couldn't be an iterator, as you need to test whether an element of self is in the other operand. Regards, Martin
On 19/05/2007 3.34, Raymond Hettinger wrote:
* Make sets listenable for changes (proposed by Jason Wells):
s = set(mydata) def callback(s): print 'Set %d now has %d items' % (id(s), len(s)) s.listeners.append(callback) s.add(existing_element) # no callback s.add(new_element) # callback
-1 because I can't see why sets are so specials (compared to other containers or objects) to provide a builtin implementation of the observer pattern. In fact, in my experience, real-world use cases of this pattern often require more attention to details (eg: does the set keep a strong or weak reference to the callback? What if I need to do several *transactional* modifications in a row, and thus would like my callback to be called only once at the end?). -- Giovanni Bajo
participants (7)
-
"Martin v. Löwis"
-
Aahz
-
Aaron Brady
-
Giovanni Bajo
-
Mike Klaas
-
Raymond Hettinger
-
skip@pobox.com