API for binary operations on Sets
I would like to solicit this group's thoughts on how to reconcile the Set abstract base class with the API for built-in set objects (see http://bugs.python.org/issue8743 ). I've been thinking about this issue for a good while and the RightThingToDo(tm) isn't clear.

Here's the situation:

Binary operators for the built-in set object restrict their "other" argument to instances of set, frozenset, or one of their subclasses. Otherwise, they return NotImplemented. This design was intentional (i.e. part of the original pure python version, it is unittested behavior, and it is a documented restriction). It allows other classes to "see" the NotImplemented and have a chance to take over using __ror__, __rand__, etc. Also, by not accepting any iterable, it prevents little coding atrocities or possible mistakes like "s | 'abc'". This is a break with what is done for lists (Guido has previously lamented that list.__add__ accepting any iterable is one of his "regrets").

This design has been in place for several years and so far everyone has been happy with it (no bug reports, feature requests, or discussions on the newsgroup, etc). If someone needed to process a non-set iterable, the named set methods (like intersection, update, etc) all accept any iterable value and this provides an immediate, usable alternative.

In contrast, the Set and MutableSet abstract base classes in Lib/_abcoll.py take a different approach. They specify that something claiming to be set-like will accept any iterable for a binary operator (IOW, the builtin set object does not comply). The provided mixins (such as __or__, __and__, etc) are implemented that way and it works fine. Also, the Set and MutableSet API do not provide named methods such as update, intersection, difference, etc. They aren't really needed because the operator methods already provide the functionality and because it keeps the Set API to a reasonable minimum.

All of this is well and good, but the two don't interoperate. You can't get an instance of the Set ABC to work with a regular set, nor do regular sets comply with the ABC. These are problems because they defeat some of the design goals for ABCs.

We have a few options:

1. Liberalize setobject.c binary operator methods to accept anything registered to the Set ABC and add a backwards incompatible restriction to the Set ABC binary operator methods to only accept Set ABC instances (they currently accept any iterable).

This approach has a backwards incompatible tightening of the Set ABC, but that will probably affect very few people. It also has the disadvantage of not providing a straightforward way to handle general iterable arguments (either the implementer needs to write named binary methods like update, difference, etc for that purpose or the user will need to cast the iterable to a set before operating on it). The positive side of this option is that it keeps the current advantages of the setobject API and its NotImplemented return value.

1a. Liberalize setobject.c binary operator methods, restrict SetABC methods, and add named methods (like difference, update, etc) that accept any iterable.

2. We could liberalize builtin set objects to accept any iterable as an "other" argument to a binary set operator. This choice is not entirely backwards compatible because it would break code depending on being able to run __ror__, __rand__, etc after a NotImplemented value is returned. That being said, I think it unlikely that such code exists. The real disadvantage is that it replicates the problems with list.__add__ and Guido has said before that he doesn't want to do that again.

I was leaning towards #1 or #1a and the guys on IRC thought #2 would be better. Now I'm not sure and would like additional input so I can get this bug closed for 3.2. Any thoughts on the subject would be appreciated.

Thanks,
Raymond

P.S. I also encountered a small difficulty in implementing #2 that would still need to be resolved if that option is chosen.
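For readers who want to see the built-in behaviour described in this message, the following is a minimal sketch run against a current CPython 3, not code from the issue itself. The `Tagged` class is hypothetical; it exists only to show a reflected method taking over once set has returned NotImplemented.

```python
s = {1, 2, 3}

# The operator method refuses a non-set operand by returning NotImplemented.
assert s.__or__('abc') is NotImplemented

# With no reflected method available, the expression raises TypeError.
try:
    s | 'abc'
except TypeError as exc:
    print('TypeError:', exc)

# The named method accepts any iterable.
union_result = s.union('abc')
print(sorted(union_result, key=str))


class Tagged:
    """Hypothetical non-set class that handles `set | Tagged` itself."""

    def __init__(self, items):
        self.items = list(items)

    def __ror__(self, other):
        # Reached only because set.__or__ returned NotImplemented.
        return Tagged(list(other) + self.items)


combined = s | Tagged(['x'])
print(type(combined).__name__, combined.items)
```

Option 2 would remove the last step: if set.__or__ accepted any iterable, it would never return NotImplemented for an iterable operand, so a method like `Tagged.__ror__` would no longer get its turn.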
On Wed, Sep 29, 2010 at 8:50 PM, Raymond Hettinger wrote:
2. We could liberalize builtin set objects to accept any iterable as an "other" argument to a binary set operator. This choice is not entirely backwards compatible because it would break code depending on being able to run __ror__, __rand__, etc after a NotImplemented value is returned. That being said, I think it unlikely that such code exists. The real disadvantage is that it replicates the problems with list.__add__ and Guido has said before that he doesn't want to do that again.
I'm not clear on what the issues with list.__add__ were, but my first impression is to lean towards #2. What am I missing?
P.S. I also encountered a small difficulty in implementing #2 that would still need to be resolved if that option is chosen.
What's the issue, if you don't mind me asking?

Geremy Condra
On Sep 29, 2010, at 11:11 PM, geremy condra wrote:
P.S. I also encountered a small difficulty in implementing #2 that would still need to be resolved if that option is chosen.
What's the issue, if you don't mind me asking?
IIRC, just commenting out the PyAnySet_Check tests in set_or, set_xor, etc led to a segfault in TestOnlySetsInBinaryOps in Lib/test/test_set.py. There was something subtle going on and I haven't had time to trace through it yet. Somewhere a guard is probably missing.

Raymond
On 9/29/2010 11:50 PM, Raymond Hettinger wrote:
I would like to solicit this group's thoughts on how to reconcile the Set abstract base class with the API for built-in set objects (see http://bugs.python.org/issue8743 ). I've been thinking about this issue for a good while and the RightThingToDo(tm) isn't clear.
Here's the situation:
Binary operators for the built-in set object restrict their "other" argument to instances of set, frozenset, or one of their subclasses. Otherwise, they return NotImplemented. This design was intentional (i.e. part of the original pure python version, it is unittested behavior, and it is a documented restriction). It allows other classes to "see" the NotImplemented and have a chance to take-over using __ror__, __rand__, etc. Also, by not accepting any iterable, it prevents little coding atrocities or possible mistakes like "s | 'abc'". This is a break with what is done for lists (Guido has previously lamented that list.__add__ accepting any iterable is one of his "regrets").
I do not understand this. List.__add__ is currently *more* restrictive than set ops in that it will not even accept a 'frozenlist' (aka tuple).
>>> ll + (4,5,6)
Traceback (most recent call last):
  File " ", line 1, in <module>
    ll + (4,5,6)
TypeError: can only concatenate list (not "tuple") to list
>>> ll.__add__((5,6,7))
Traceback (most recent call last):
  File " ", line 1, in <module>
    ll.__add__((5,6,7))
TypeError: can only concatenate list (not "tuple") to list
Does this violate the Sequence ABC (assuming there is one)?

--
Terry Jan Reedy
On Sep 29, 2010, at 11:29 PM, Terry Reedy wrote:
I do not understand this. List.__add__ is currently *more* restrictive than set ops in that it will not even accept a 'frozenlist' (aka tuple).
Sorry, that should have been __iadd__().
>>> s = range(5)
>>> s += 'abc'
>>> s
[0, 1, 2, 3, 4, 'a', 'b', 'c']
Raymond
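The two list behaviours contrasted in this exchange can be put side by side. This is a small sketch for Python 3, where `range()` is lazy and has to be wrapped in `list()` first; the session above is from Python 2.

```python
s = list(range(5))  # list() is needed on Python 3, where range() is lazy

# `+` (list.__add__) only concatenates another list.
try:
    s + 'abc'
except TypeError as exc:
    print('TypeError:', exc)

# `+=` (list.__iadd__) behaves like extend() and takes any iterable.
s += 'abc'
print(s)
```

So the behaviour being regretted is the in-place operator silently iterating over a string, which is the same kind of surprise that "s | 'abc'" would produce for sets under option 2.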
On Wed, Sep 29, 2010 at 11:29 PM, Terry Reedy wrote:
Does this violate the Sequence ABC (assuming there is one)?
There is a Sequence ABC, but it does not define __add__. It only defines the following methods: __contains__, __getitem__, __iter__, __len__, __reversed__, count, and index.

tuple, range, and str types all register as following the Sequence ABC. list and bytearray types register as following the MutableSequence ABC, which is a subclass of the Sequence ABC.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
http://stutzbachenterprises.com/
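These statements about the Sequence ABC can be checked directly. The sketch below assumes a current Python 3, where the ABCs are importable from `collections.abc`; at the time of this thread they lived in Lib/_abcoll.py and were exposed through `collections`.

```python
from collections.abc import MutableSequence, Sequence

# The methods named above are all present on the ABC...
sequence_api = ['__contains__', '__getitem__', '__iter__', '__len__',
                '__reversed__', 'count', 'index']
present = [name for name in sequence_api if hasattr(Sequence, name)]

# ...but __add__ is not part of it.
has_add = hasattr(Sequence, '__add__')

# MutableSequence does supply __iadd__ as a mixin on top of extend().
has_iadd = hasattr(MutableSequence, '__iadd__')

# Registration of the built-in types.
immutable_ok = all(issubclass(t, Sequence) for t in (tuple, range, str))
mutable_ok = all(issubclass(t, MutableSequence) for t in (list, bytearray))

print(present, has_add, has_iadd, immutable_ok, mutable_ok)
```

Because `__add__` is outside the ABC, list rejecting a tuple in `ll + (4,5,6)` does not conflict with it.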
I will say something snarky now and (hopefully) something useful tomorrow. When ABCs went in I was +0 because, like annotations, I was told I wouldn't have to care about them. That said, I do actually care about the set interface and what "set-y-ness" means for regular duck typing reasons. What people expect sets to do is what set-alikes should do.

-Jack
On Thu, Sep 30, 2010 at 1:50 PM, Raymond Hettinger wrote:
1a. Liberalize setobject.c binary operator methods, restrict SetABC methods, and add named methods (like difference, update, etc) that accept any iterable.
2. We could liberalize builtin set objects to accept any iterable as an "other" argument to a binary set operator. This choice is not entirely backwards compatible because it would break code depending on being able to run __ror__, __rand__, etc after a NotImplemented value is returned. That being said, I think it unlikely that such code exists. The real disadvantage is that it replicates the problems with list.__add__ and Guido has said before that he doesn't want to do that again.
I was leaning towards #1 or #1a and the guys on IRC thought #2 would be better. Now I'm not sure and would like additional input so I can get this bug closed for 3.2. Any thoughts on the subject would be appreciated. Thanks,
My own inclination would be to go with #1a, *unless* Guido chimes in to say he's OK with having the set operators accept arbitrary iterators.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 09/29/2010 08:50 PM, Raymond Hettinger wrote:
1. Liberalize setobject.c binary operator methods to accept anything registered to the Set ABC and add a backwards incompatible restriction to the Set ABC binary operator methods to only accept Set ABC instances (they currently accept any iterable).
This approach has a backwards incompatible tightening of the Set ABC, but that will probably affect very few people. It also has the disadvantage of not providing a straightforward way to handle general iterable arguments (either the implementer needs to write named binary methods like update, difference, etc for that purpose or the user will need to cast the iterable to a set before operating on it). The positive side of this option is that it keeps the current advantages of the setobject API and its NotImplemented return value.
1a. Liberalize setobject.c binary operator methods, restrict SetABC methods, and add named methods (like difference, update, etc) that accept any iterable.
I prefer 1 to 1a, but either is acceptable. 1 just forces you to call set() on your iterable before operating on it, which I think helps ("explicit is better than implicit").

In 1a, by "add named methods that accept any iterable", you mean add those to the SetABC? If so, then I'm more strongly for 1 and against 1a. I'm not convinced all classes implementing the Set ABC should have to implement all those methods.

/larry/
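The trade-off between implicit and explicit conversion can be sketched with a small Set ABC subclass. `ListBasedSet` is a hypothetical class written for this illustration, and the import path assumes a current Python 3 (`collections.abc`). It defines only the three abstract methods and inherits the operator mixins, which accept any iterable.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Hypothetical set-like class that defines only the abstract methods."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


a = ListBasedSet([1, 2])

# The inherited mixin operator takes a plain list as it stands.
implicit = a | [2, 3]

# Under option 1 the caller would convert the iterable first.
explicit = a | ListBasedSet([2, 3])

print(list(implicit), list(explicit))

# The ABC supplies no named union method, so nothing extra was implemented.
print(hasattr(a, 'union'))
```

Under option 1a, a class like this would presumably also need named methods such as union or difference, which is the extra burden the message above objects to.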
participants (7)
- Daniel Stutzbach
- geremy condra
- Jack Diederich
- Larry Hastings
- Nick Coghlan
- Raymond Hettinger
- Terry Reedy