Augment abc.Set API (support named set methods for dictionary view objects) - Python-ideas


ava@yert.pink

June 1, 2020

3:32 a.m.

Currently the dictionary view objects behave exactly like sets when set operations defined on them are used, such as the pipe or ampersand operators:

...

...
...
>>> d.keys() | [1]
{1, 2}

dictionary views are even sets by the ABC standard, having support for all the required abstract methods:

...

...
...
>>> isinstance(d.keys(), collections.abc.Set)
True

However, the dict views do not have the equivalent named set methods, such as `intersection` or `issuperset`. It seems that the collections.abc.Set API was purposely crafted so that isinstance returns True when given a dict view, as currently the `isdisjoint` method is the only named method that the API requires. I propose that the `Set` ABC API should be augmented to contain all of the named methods. This would provide consistency in the collections, and enhance the duck typing capabilities of the `Set` ABC.
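A short sketch of the behaviour described above. The definition of `d` in the original session is not shown, so the dict here is an assumed example:

```python
import collections.abc

d = {2: "two"}  # assumed example dict; the original definition is elided

# Set operators work on the keys view and return a plain set.
union_result = d.keys() | [1]
print(union_result)  # {1, 2}

# The keys view is recognised as a Set by the ABC machinery.
print(isinstance(d.keys(), collections.abc.Set))  # True

# isdisjoint is the only named set method the view provides.
print(hasattr(d.keys(), "isdisjoint"))    # True
print(hasattr(d.keys(), "intersection"))  # False
print(hasattr(d.keys(), "issuperset"))    # False
```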



Raymond Hettinger

June 2020

8:28 a.m.

...

On Jun 1, 2020, at 3:32 AM, ava@yert.pink ava@yert.pink <ava@yert.pink> wrote:

I propose that the `Set` ABC API should be augmented to contain all of the named methods. This would provide consistency in the collections, and enhance the duck typing capabilities of the `Set` abc.

Two thoughts. First, I believe Guido intentionally omitted the named set methods from the ABC — perhaps the reasons are documented in the ABC PEP. Second, most APIs are easily expanded by adding new methods, but ABCs define a minimum for other classes to implement. So if we added new methods, it would likely break code that was only meeting the existing minimum. Raymond
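To make the "minimum" concern concrete, here is a hypothetical sketch. `MinimalSet` implements exactly what `Set` requires today (`__contains__`, `__iter__`, `__len__`). If a new *abstract* method were added to the ABC (simulated by the made-up `StricterSet`), the same class would stop being instantiable. Mixin methods, as discussed below, would not have this effect.

```python
from abc import abstractmethod
from collections.abc import Set


class StricterSet(Set):
    """Hypothetical ABC: Set plus one new *abstract* named method."""

    @abstractmethod
    def union(self, other):
        ...


class MinimalSet:
    """Implements exactly the abstract methods Set requires today."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class OldStyle(MinimalSet, Set):
    pass


class NewStyle(MinimalSet, StricterSet):
    pass


ok = OldStyle([1, 2])  # fine: every abstract method is provided

try:
    NewStyle([1, 2])
except TypeError as exc:
    # Instantiation fails because `union` is still abstract.
    print(exc)
```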

Serhiy Storchaka

9:19 a.m.

On 01.06.20 18:28, Raymond Hettinger wrote:

...

Concur with Raymond. Also I want to add that duck typing does not rely on ABCs. Actually they are opposite approaches.

Alex Hall

11:36 a.m.

On Mon, Jun 1, 2020 at 5:31 PM Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

...

I just took a look at the PEP: https://www.python.org/dev/peps/pep-3119/#sets All I can see is:

...

1. I don't know why it specifically says Python 2; the same methods exist in Python 3.
2. I assume this reasoning also applies to other named methods like intersection and union, which are similar but not mentioned.
3. That's a pretty minimal justification. Is there more somewhere? Is that all you were referring to?
4. Even just aliases are useful:
   1. They're arguably sometimes more readable, particularly to readers who don't know the operators.
   2. When I want to use a set method on dict views, I have no easy way of remembering whether it's named methods or operators which work, and my first guess is often wrong. This is mildly inconvenient for me, but for less proficient users it may lead them to think that dict views don't implement the set methods at all and that they misunderstood what 'set-like' meant.
   3. The builtin set operators don't accept arbitrary iterables on the RHS; only the named methods do. But several of the Set ABC operators *do* accept arbitrary iterables (why this difference?). I just learned this last bit, I don't think I knew it before, so I imagine there are others that don't know it either. This makes it seem like you have to write `my_dict.keys() & set(other)` instead of just `my_dict.keys() & other`, which is both shorter and faster.
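The asymmetry in point 4.3 is easy to check. A minimal sketch, with `my_dict` and `other` as arbitrary example values:

```python
my_dict = {1: "a", 2: "b"}  # example dict
other = [2, 3]              # a plain list, not a set

# The builtin set operator rejects a non-set right-hand side...
try:
    {1, 2} & other
except TypeError as exc:
    print("set & list:", exc)

# ...while the named method accepts any iterable.
named = {1, 2}.intersection(other)

# The dict view's operator also accepts any iterable, no set() needed.
view_result = my_dict.keys() & other

print(named, view_result)  # {2} {2}
```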

...

These would be mixin methods, not abstract. Set already implements the various operators based on the abstract methods, it could easily add more mixin methods which would delegate to the operators. Classes which override the operators with more efficient versions would automatically get efficient aliases for free.
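A minimal sketch of what such mixin methods could look like, assuming a hypothetical `NamedSetMethods` class layered on `collections.abc.Set` (not an existing stdlib class). Each named method delegates to the corresponding operator, so a subclass that overrides an operator automatically gets the faster alias. One wrinkle: the inherited `&`, `|`, `-`, `^` already accept any iterable, but `<=` and `>=` only accept Sets, so the subset/superset aliases wrap other iterables in a `frozenset` first.

```python
from collections.abc import Set


class NamedSetMethods(Set):
    """Hypothetical mixin: named aliases delegating to the Set operators."""

    def union(self, other):
        return self | other

    def intersection(self, other):
        return self & other

    def difference(self, other):
        return self - other

    def symmetric_difference(self, other):
        return self ^ other

    def issubset(self, other):
        # Set.__le__ only handles Sets, so coerce other iterables first.
        if not isinstance(other, Set):
            other = frozenset(other)
        return self <= other

    def issuperset(self, other):
        if not isinstance(other, Set):
            other = frozenset(other)
        return self >= other


class ListBackedSet(NamedSetMethods):
    """Example concrete Set: only the three abstract methods are written."""

    def __init__(self, iterable=()):
        self._items = []
        for value in iterable:
            if value not in self._items:
                self._items.append(value)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = ListBackedSet([1, 2, 3])
print(sorted(s.union([3, 4])))            # [1, 2, 3, 4]
print(sorted(s.intersection((2, 3, 9))))  # [2, 3]
print(s.issubset(range(10)))              # True
print(s.issuperset([1, 1, 2]))            # True
```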

Brandt Bucher

11:49 a.m.

...

These would be mixin methods, not abstract. Set already implements the various operators based on the abstract methods, it could easily add more mixin methods which would delegate to the operators. Classes which override the operators with more efficient versions would automatically get efficient aliases for free.

This same idea came up when we added `|` / `|=` to `dict` in 3.9. The reason we couldn't safely add those to `Mapping`/`MutableMapping` is that it could break compatibility for virtual subclasses that have been `register`-ed, and don't actually inherit from the ABC. The problem is the same here.

Alex Hall

2:17 p.m.

On Mon, Jun 1, 2020 at 8:50 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

...

OK, I wasn't on this list then, and I haven't found much while searching. I found a brief mention of the problem [here](https://mail.python.org/archives/list/python-ideas@python.org/message/WX7PNK...) without much explanation of why it might be a problem, and a reply saying that it's not. So I may ask some questions that have been resolved already. What do you mean by 'break compatibility'? Registering a virtual subclass doesn't enforce that any methods are implemented, not even the abstract ones. Are virtual subclasses required to implement all the mixin methods? I thought the mixin methods were just a nice convenience, not a contract.
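The mechanics behind both messages can be sketched with two hypothetical classes. `register()` checks nothing, and a registered class does not inherit the ABC's mixin methods. So a named method added to `Set` later would appear on real subclasses but be silently absent on registered ones, even though `isinstance` reports both as Sets.

```python
from collections.abc import Set


class RegisteredOnly:
    """Hypothetical set-like class that is registered, not subclassed."""

    def __init__(self, items):
        self._items = frozenset(items)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


class Empty:
    """Hypothetical class with no set-like methods at all."""


Set.register(RegisteredOnly)
Set.register(Empty)

r = RegisteredOnly([1, 2])
print(isinstance(r, Set))        # True
print(isinstance(Empty(), Set))  # True: registration is trusted, not checked
print(hasattr(r, "isdisjoint"))  # False: mixin methods are not inherited
print(hasattr(r, "__or__"))      # False: neither are the operator mixins
```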

Christopher Barker

3:06 p.m.

TL;DR: In this particular case, I don't see much of a backward compatibility issue, so yes, let's add them.

I've been thinking about the concept of adding methods (mixin or virtual) to ABCs in the context of the "views on sequences" conversation recently on this list. In that context, we (or I, anyway :-) ) decided that it's potentially very disruptive to add a method (as opposed to a new protocol, via a dunder) -- because only the dunder "namespace" is reserved. In that case, there was a concrete example: a `.view` method added as a mixin to the Sequence ABC would be an obvious way to support the use case at hand, but it would then conflict with any Sequence-like object (either registered with the ABC, or simply duck typed) that had a `.view` method with a different meaning -- and numpy arrays DO have a .view method with a different meaning -- so a lot of code could break. We could come up with a more obscure name, but then it would be less intuitive, and any name *could* conflict with something somewhere. What all that means is that it's a very big deal to add a new non-dunder name to an existing ABC, and we will probably rarely, or never, do it.

BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences. And while the proposed methods are not part of the Set ABC at this point, they ARE part of the built-in set object. And historically at least, people informally duck typed as often as, or more often than, they subclassed from or registered with ABCs anyway. All that is to say that there are no set-like objects in the standard library that have these names with a different meaning, and probably VERY few, if any, third-party set-like classes that have these names with different meanings either. So it would not likely break much at all if they were added. (And I'm still confused why they aren't there in the first place; the PEP doesn't seem very clear about it.)

And even if they are not added to the ABC, they could still be added to the other set-like objects in the standard library -- are there any other than the dict views?

-CHB

On Mon, Jun 1, 2020 at 2:22 PM Alex Hall <alex.mojaki@gmail.com> wrote:

...

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Stephen J. Turnbull

7:45 p.m.

Christopher Barker writes:

...

TL;DR: In this particular case, I don't see much of a backward compatibility issue, so yes, let's add them.

...

So it would not likely break much at all if they were added. (And I'm still confused why they aren't there in the first place; the PEP doesn't seem very clear about it.)

Why do you want to add them? The point of an ABC is that it is Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."

...

BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences.

I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing. Presumably if self is a set, .union (and .intersection) works on any Sized: all you need mathematically is the ability to iterate the argument, and practically you'd like to have len() so you can avoid infloops. The other thing you need mathematically to have a set is the 'in' operation. If mutable, you need an idempotent .add, and .remove. Of course convenience counts. It's not obvious to me whether the convenience of having those methods outweighs the parsimony of only implementing the dunders. In fact, it's not obvious to me whether there's *any* convenience to having those methods. Shouldn't we want to encourage the use of the more concise and less cluttered operators, which also have the advantage that either operand can provide the implementation? OK, sometimes that may not be an advantage. But even if you want to specify which operand will provide the implementation, you can use the explicit dunder. In that case I agree the named operations are more readable, but how often are you going to do that (for that purpose)?

...

And while the proposed methods are not part of the Set ABC at this point, they ARE part of the built-in set object. And historically at least, people informally duck typed as often as, or more often than, they subclassed from or registered with ABCs anyway.

If you want all of the methods of the built-in set, use it or derive from it, or construct one.

...

And even if they are not added to the ABC, they could still be added to the other set-like objects in the standard library -- are there any other than the dict views?

What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

Christopher Barker

11:43 p.m.

This is not my idea in the first place, and I don't have much use for it. Mostly, I was pointing out that adding a couple of the set methods to the Set ABC wouldn't be as disruptive as most additions of methods to ABCs. That's it. But to a couple of points:

...

Why do you want to add them? The point of an ABC is that it is Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."

Well, Python's history makes all this a bit backward -- most (all?) of the ABCs existed in concrete form before there were abstract versions. So it's a bit weird. Maybe if we'd started with ABCs, there would be no "regular" methods at all -- who knows? But anyway, as I understand it, "abstract" means "not implemented", which is completely separate from "only the defining features" -- ABCs define the API, not the feature set. And the fact is that Python has a somewhat arbitrary mix of operators implemented by dunders, protocols implemented by dunders (like the len() function) and regular old methods -- all of these are in the ABCs. So I don't think there's any clear principle here -- do we want .union() and friends to be part of the standard set API or not? It's simply a choice. Judging from some of the arguments in other threads, I suspect some people think that ALL the regular methods are only there for legacy reasons, which might explain why these aren't in the Set ABC -- but I don't know that there's any real consensus about that.

...

BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences.

I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

no it doesn't -- the set.union method takes any iterable -- that's documented: "Note, the non-operator versions of union() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.union>, intersection() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.intersection>, difference() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.difference>, and symmetric_difference() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.symmetric_differ...>, issubset() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.issubset>, and issuperset() <https://docs.python.org/3.8/library/stdtypes.html#frozenset.issuperset> methods will accept any iterable as an argument." no duck-typing here.

...

Presumably if self is a set, .union (and .intersection) works on any Sized:

Well, no, it doesn't.

...

all you need mathematically is the ability to iterate the argument,

which is why it works with any iterable.

...

and practically you'd like to have len() so you can avoid infloops.

I wonder if it does, in fact, use __len__. I haven't tested it, but it certainly does work with any iterable that isn't Sized.
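A quick check of that last point, using a generator expression as an example of an iterable that is not `Sized` (this does not show whether `__len__` is consulted internally when it is available):

```python
from collections.abc import Sized

gen = (n * n for n in range(4))  # a generator: iterable, but has no len()
print(isinstance(gen, Sized))    # False

result = {1}.union(gen)
print(sorted(result))            # [0, 1, 4, 9]
```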

...

The other thing you need mathematically to have a set is the 'in' operation. If mutable, you need an idempotent .add, and .remove.

no one is arguing that the ABC doesn't provide the minimum API for a mathematical set here.

...

Of course convenience counts. It's not obvious to me whether the convenience of having those methods outweighs the parsimony of only implementing the dunders.

I had to go look up "parsimony" to make sure -- and I still don't know what your point is. Maybe that's not the word you meant?

...

In fact, it's not obvious to me whether there's *any* convenience to having those methods. Shouldn't we want to encourage the use of the more concise and less cluttered operators, which also have the advantage that either operand can provide the implementation?

Well, two points:

1) Why does the built-in set object have them? Maybe it shouldn't, but it does, so there is something to be said for making other "Sets" in the standard library the same.

2) The advantage, maybe small, and made by the OP, is that the methods work on any iterable, rather than only on sets. Is that a big deal? Not huge, but it is nice sometimes.

...

OK, sometimes that may not be an advantage. But even if you want to specify which operand will provide the implementation, you can use the explicit dunder.

No, you can't -- the dunder doesn't support arbitrary iterables (`it` is an iterable that isn't much else):

In [20]: s
Out[20]: {1, 2}

In [21]: s.union(it)
Out[21]: {0, 1, 2, 3, 4}

In [22]: s.__or__(it)
Out[22]: NotImplemented

In [23]: s.__or__(set(it))
Out[23]: {1, 2}
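A plain-script version of that session. The definition of `it` is not shown; the `Out[23]` result only makes sense if `it` is a one-shot iterator that `s.union(it)` already consumed, so this sketch assumes `it = iter(range(5))`:

```python
s = {1, 2}
it = iter(range(5))  # assumed stand-in for the session's `it`

union_result = s.union(it)      # consumes the iterator
print(union_result)             # {0, 1, 2, 3, 4}

dunder_result = s.__or__(it)    # the dunder only handles sets
print(dunder_result)            # NotImplemented

try:
    s | it                      # the operator turns NotImplemented into TypeError
except TypeError as exc:
    print("TypeError:", exc)

# `it` is already exhausted, so set(it) is empty here.
final = s.__or__(set(it))
print(final)                    # {1, 2}
```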

...

And while the proposed methods are not part of the Set ABC at this point, they ARE part of the built-in set object. And historically at least, people informally duck typed as often as, or more often than, they subclassed from or registered with ABCs anyway.

If you want all of the methods of the built-in set, use it or derive from it, or construct one.

The OP didn't want to make their own, they wanted built in set like objects to behave more like sets (dict views, in particular). But anyway, if we're going to say "derive from it, or construct one." then why have ABCs at all ?!?

...

...
And even if they are not added to the ABC, they could still be added to the other set-like objects in the standard library -- are there any other than the dict views?

What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

Tell that to the author of the built-in set object -- that decision was made long ago. And if we're playing the Zen game -- isn't "derive from it, or construct one." two ways? Plus the ABC -- so that's three!

Anyway, this is not a big deal to not add -- but it's not a big deal to add, either.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Stephen J. Turnbull

9:53 a.m.

Christopher Barker writes:

...

But to a couple points:

...
Why do you want to add them? The point of an ABC is that it is Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."

Well, Python's history makes all this a bit backward -- most (all?) of the ABCs existed in concrete form before there were abstract versions.

"History" -- that aspect of human recollection that makes everything a bit backward. ;-)

...

But anyway, as I understand it, "abstract" means "not implemented", which is completely separate from "only the defining features" -- ABCs define the API, not the feature set.

This is also historical, as you point out:

...

And the fact is that Python has a somewhat arbitrary mix of operators implemented by dunders, protocols implemented by dunders (like the len() function) and regular old methods -- all of these are in the ABCs.

So I don't think there's any clear principle here -- do we want .union() and friends to be part of the standard set API or not? It's simply a choice.

But it's not that simple, as you have made plain with your references to history. Such a choice creates a precedent aka principle.

...

Judging from some of the arguments in other threads, I suspect some people think that ALL the regular methods are only there for legacy reasons,

I tend to agree that they're legacy, but I think it's irrelevant to the discussion. You are correct to focus on whether adding them is likely to cause backward compatibility issues, but I don't think an acceptable degree of backward compatibility is a positive reason to implement something. BTW, thank you for clarifying that you aren't (in these posts) advocating this change. I shouldn't assume that lack of a disclaimer means you're advocating rather than arguing a specific point.

...

...
BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences.

I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

no it doesn't -- the set.union method takes any iterable -- that's documented:

[list of documentation references omitted]

...

no duck-typing here.

What do you think the type "iterable" is? It's not a concrete type. The Iterable ABC is defined to include all classes that provide an __iter__ method (see Lib/_collections_abc.py). In other words, "iterable" is a named duck type. It is defined by "you can use it as X in 'for y in X'" (or equivalently, applying the builtin iter to it returns an iterator). For the purposes of set's named methods, an "iterable" is "sufficiently set-like" to use set operations on it. That's duck typing.
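A minimal sketch of "iterable" as a named duck type, using a hypothetical `Countdown` class that defines only `__iter__`, with no inheritance and no registration:

```python
from collections.abc import Iterable


class Countdown:
    """Hypothetical class: defines __iter__ and nothing else set-like."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


c = Countdown(3)
print(isinstance(c, Iterable))  # True, via Iterable.__subclasshook__
print({0}.union(c))             # {0, 1, 2, 3}
```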

...

...
all you need mathematically is the ability to iterate the argument,

which is why it works with any iterable.

That's imprecise. That's why it's *possible* to make it work with any iterable. But Python doesn't always implement things that are possible, and sometimes it implements possible things that turn out to be bad ideas, causing great pain in both the use and the fix (eg, Python 2's union of bytes and str).

...

...
The other thing you need mathematically to have a set is the 'in' operation. If mutable, you need an idempotent .add, and .remove.

no one is arguing that the ABC doesn't provide the minimum API for a mathematical set here.

Nor do I claim they did. My point is that the mathematical minimum is far less than what Python provides, and the "practicality" koan applies, so we provided a lot more than the minimum. Do we need to provide near duplicate methods that coerce iterables to set? I don't know, somebody needs to argue the practicalities.

...

...
Of course convenience counts. It's not obvious to me whether the convenience of having those methods outweighs the parsimony of only implementing the dunders.

I had to go look up "parsimony" to make sure -- and I still don't know what your point is. Maybe that's not the word you meant?

"Parsimony" here is just a formal term for "YAGNI, so don't waste time typing it".

...

...
In fact, it's not obvious to me whether there's *any* convenience to having those methods. Shouldn't we want to encourage the use of the more concise and less cluttered operators, which also have the advantage that either operand can provide the implementation?

Well: two points:

1) Why does the built-in set object have them? Maybe it shouldn't, but it does, so there is something to be said for making other "Sets" in the standard library the same.

There may be something to say *for* doing it, but "we did it over there" is more of a "it's possible" than a reason for.

...

2) the advantage, maybe small, and made by the OP, is that the methods work on any iterable, rather than only on sets. Is that a big deal? not huge, but it is nice sometimes.

I would tend to go with "Explicit is better than implicit" unless the coercion of iterable to set is frequently useful. Treating the results as abstract collections, where l = [1, 2, 1] and s = {1}, l.extend(s) and s.union(l) are quite different things. The point is not that I think the use of set.union on arbitrary iterables *is* frequently buggy; it's that without frequent use, there's little experience on which to base a judgment. (I'm not making a claim about frequency. I personally have never used those methods on non-sets, YMMV.)
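The `l.extend(s)` versus `s.union(l)` contrast, run on the exact values from the paragraabove (working on copies so both operations start from the same data):

```python
l = [1, 2, 1]
s = {1}

extended = list(l)   # copy, so the original list is untouched
extended.extend(s)
print(extended)      # [1, 2, 1, 1]: order and duplicates kept

unioned = s.union(l)
print(unioned)       # {1, 2}: order and duplicates discarded
```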

...

The OP didn't want to make their own, they wanted built in set like objects to behave more like sets (dict views, in particular).

I'm not criticizing the OP. I'm pushing back on the idea, on the usual grounds of "why should all users of Python bear the burden of a slightly larger standard programming environment, and the maintainers the burden of providing and maintaining it?" The status quo wins ties.

...

But anyway, if we're going to say "derive from it, or construct one." then why have ABCs at all ?!?

Nobody is arguing that "derive from it, or construct the duck type" is *always* the answer. The reason for having ABCs is that names are powerful. You can describe a duck type as "all objects that have __iter__ and __next__", but it's also useful to have the name "Iterator" for the collection of all such objects. Also, many ABCs provide useful generic implementations.
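Both reasons can be sketched with the standard `collections.abc.Iterator` ABC (the two classes are hypothetical examples). The name covers a pure duck type, and inheriting from the ABC supplies a generic `__iter__`:

```python
from collections.abc import Iterator


class Ticks:
    """Duck-typed iterator: has __iter__ and __next__, no ABC involved."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n


class TicksFromABC(Iterator):
    """Inherits from the ABC: only __next__ is written, __iter__ is a mixin."""

    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n


print(isinstance(Ticks(3), Iterator))  # True: the name covers the duck type
print(list(TicksFromABC(3)))           # [2, 1, 0]: __iter__ supplied by the ABC
```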

...

...
What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

...

tell that to the author of the built in set object -- that decision was made long ago.

Which means that backward compatibility with existing code is important, and therefore those methods weren't removed in Python 3.0, and probably won't be removed. "Although practicality beats purity" overrides any of the other koans in the right situations, and backward compatibility is one of the most important factors. "Practicality" also is the exception that justifies adding new ways to do old behaviors.

...

Anyway, this is not a big deal to not add -- but it's not a big deal to add, either.

An excellent summary of your position, which I now understand. ;-)

Christopher Barker

5:09 p.m.

On Wed, Jun 3, 2020 at 9:53 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

...
people think that ALL the regular methods are only there for legacy reasons,

I tend to agree that they're legacy, but I think it's irrelevant to the discussion.

I don't think so -- if you think that the ONLY reason for "regular" methods on ABCs is legacy, then the argument is over -- we're not going to add any new ones. I don't think that -- I think there is a place for regular methods in ABCs, but that in most cases, backward compatibility will make it a bad idea to add any more. But not in this case.

...

You are correct to focus on whether adding them is likely to cause backward compatibility issues, but I don't think an acceptable degree of backward compatibility is a positive reason to implement something.

No, of course not -- there has to be a positive reason to do so -- and I think there is. But not enough to push for it beyond this post. If the OP, or anyone else, wants to continue to make the case, then I'll chime in with a +1 here and there, but that's about it.

...

BTW, thank you for clarifying that you aren't (in these posts) advocating this change. I shouldn't assume that lack of a disclaimer means you're advocating rather than arguing a specific point.

well, I am, but pretty mildly ....

...

...
I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

no it doesn't -- the set.union method takes any iterable -- that's documented:

[list of documentation references omitted]

...
no duck-typing here.

What do you think the type "iterable" is?

Sorry I didn't mean no duck typing, I meant no duck typing of set(). My point is that those methods take any iterable, not any set-like object -- and my point was that I don't think set-like objects are very often duck typed.

...

For the purposes of set's named methods, an "iterable" is "sufficiently set-like" to use set operations on it. That's duck typing.

I'm not sure this is just semantics, but I disagree: an iterable is not "sufficiently set-like" -- it is an iterable. Those methods don't need anything that is in a set that isn't in an iterable. If they required something set-like, they would require something in set that was not in all iterables.

...

Do we need to provide near duplicate methods that coerce iterables to set?

I don't think anyone is asking for that -- the OP talked about the dict_keys and dict_items objects, which, yes, are iterables, but they are also Sets:

In [8]: dk = d.keys()

In [9]: isinstance(dk, Set)
Out[9]: True

In [16]: di = d.items()

In [17]: isinstance(di, Set)
Out[17]: True

So no one is asking for anything that will coerce iterables to Sets; we are asking to make (at least some of) the Set objects in the standard library have the same API as the built-in set object. Which seems pretty darn reasonable to me.
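For completeness, a small sketch (arbitrary dict contents) of which dict views the ABCs treat as Sets. Only the keys and items views qualify. The values view does not, presumably because values can repeat:

```python
from collections.abc import Set

d = {"a": 1, "b": 1}  # note the repeated value

print(isinstance(d.keys(), Set))    # True
print(isinstance(d.items(), Set))   # True
print(isinstance(d.values(), Set))  # False
print(list(d.values()))             # [1, 1]
```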

...

"Parsimony" here is just a formal term for "YAGNI, so don't waste time typing it".

Fair enough -- personally, I don't know that I've ever used the dict view objects as sets (yes, I use `in`, but that works with any container), so no, I don't need it. But here's an (abstract) use case:

def some_fun(a_set, some_args):
    ...
    a_set.union(something_iterable)
    ...

So I have a function that expects a set as input -- I could even use a type hint to say that. I do all my testing, and use cases with the built-in set() object -- seems perfectly reasonable. But then one of my users passes in a dict_keys object -- and bam! it fails. So then I need to either wrap something_iterable in set(), and use the | operator, or I need to wrap a_set in the set() constructor. Not a huge deal, but it does seem unnecessary.

I guess I don't like the inconsistency between the "prototype" builtin and the ABC -- seems strange to me. And there is nothing in the docs that discourages use of the methods in favor of the operators. In fact, there is a section that describes why the methods are there, and how they can be useful.

OK, maybe now I am advocating :-) But again, not worth any more of my time than I have already spent.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
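A runnable sketch of the `some_fun` scenario. The function body and its second parameter are hypothetical stand-ins for the elided parts, and the two variants show the workarounds mentioned above:

```python
def some_fun(a_set, something_iterable):
    """Hypothetical function written and tested against builtin sets."""
    return a_set.union(something_iterable)


print(some_fun({1, 2}, [3]))  # {1, 2, 3}

d = {1: "a", 2: "b"}
try:
    some_fun(d.keys(), [3])
except AttributeError as exc:
    print("AttributeError:", exc)  # dict_keys has no .union


# Workaround 1: coerce the argument and use the operator.
def some_fun_operator(a_set, something_iterable):
    return a_set | set(something_iterable)


# Workaround 2: coerce the receiver and keep the named method.
def some_fun_wrapped(a_set, something_iterable):
    return set(a_set).union(something_iterable)


print(some_fun_operator(d.keys(), [3]))  # {1, 2, 3}
print(some_fun_wrapped(d.keys(), [3]))   # {1, 2, 3}
```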

Stephen J. Turnbull

11:46 p.m.

Christopher Barker writes:

...

But again, not worth any more of my time than I have already spent.

Feel free not to read or respond. (I mean that literally, I don't want to take up more of your time just because I responded. I'd be interested in your response if you care to make one. I do think that there are some general principles of software design here, and also some issues of how they should be applied to the Python language.)

...

if they required something set like, they would require something in set that was not in all iterables.

That's the opposite of duck-typing, though. Duck-typing is about *sufficient* conditions for treating an object as an instance of the *argument's* intended type. Whether an object is sufficiently set-like depends on what you're going to do with the object, not on all the irrelevant attributes that actual sets have.

...

...
Do we need to provide near duplicate methods that coerce iterables to set?

...

I don't think anyone is asking for that

I think I've failed to convey my meaning, because the whole point is that the named methods are the same operations as the dunders except that they also allow general iterables, which is possible by forgetting that iterables have an order (and then reimposing an implementation-dependent order) and forgetting that they may contain duplicate values. It's the loss of the two latter characteristics that I mean by "coerce to set".

...

But here's a (n abstract) use case:

def some_fun(a_set, some_args):
    ...
    a_set.union(something_iterable)
    ...

Of course it's easy to construct hypotheticals which fail if people decide to apply them to views instead of builtin sets. My question is whether the convenience of these functions is genuine, or whether it's an attractive nuisance. My argument here is like the claim that the loose behavior of builtin zip with respect to unequal-length iterables is an attractive nuisance.

As I see it, the language design questions are:
(1) "Does consenting adults apply here?"
(2) "Is the convenience that great?"

I had a couple more that have to do with issues of implementing the non-dunder methods on all classes that derive from Set, but I'm pretty sure those are solved by providing generic concrete methods in the Set ABC like

def union(self, it):
    return self.__or__(set(it))

...

So then I need to either wrap something_Iterable in set(), and use the | operator,

Which is what I would do:

x.union(y)
x | set(y)

I agree that ".union(y)" may be easier for many people to type (you don't have to do a mental gear change to an operator notation which is different from conventional set theory notation), but as you can see the number of characters is the same. I don't see a huge cost to this as people can get used to the explicit conversion, and for other set operations the operator versions are shorter.

The counterargument is that as long as builtin set supports the named versions, people will use them and expect them on all Sets. Now that I realize the Set ABC can provide all the non-dunder methods in reasonably efficient generic implementations, that argument is a lot stronger than I thought. But I'm still enough of a curmudgeon to think it would be a better idea to use explicit conversion and the operator forms. ;-) (Don't argue back, I know that's not going to convince anyone. :-)

...

I guess I don't like the inconsistency between the "prototype" builtin and the ABC -- seems strange to me.

I don't either. I'm just willing to bear it in the face of "Explicit is better than implicit" and "In the face of ambiguity, refuse the temptation to guess" (and for backward compatibility's sake on the builtin set). In this case, is someSet.union(someList) what I really want, or is it someList.extend(someSet)? If you want to argue "look, .extend also takes any Iterable including sets that are arbitrarily ordered, I think they're symmetrical", then I disagree but I'm not going to convince you.[1]

...

And there is nothing in the docs that discourages use of the methods in favor of the operators. In fact, there is a section that describes why the methods are there, and how they can be useful.

Are you referring to

Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset(), and issuperset() methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs').

I don't see anything there about why they're useful, only that if you want to intersect with something not a set (which is mathematically undefined! :-), they're more readable. And, of course "set('abc') & set('cbs')" is *shorter*. The tutorial doesn't mention the non-operator versions at all in the obvious place (section 5.4).

Footnotes:
[1] FWIW, the former destroys information about order and duplicity, while the latter preserves duplicity while introducing some arbitrariness into the order, so that IMHO the former is dangerous in a way the latter isn't. YMMV.
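The three spellings from the quoted documentation note, side by side as a short sketch:

```python
try:
    set("abc") & "cbs"
except TypeError as exc:
    print("TypeError:", exc)  # the operator requires a set on both sides

named = set("abc").intersection("cbs")  # any iterable is accepted
explicit = set("abc") & set("cbs")      # explicit conversion, operator form
print(sorted(named), sorted(explicit))  # ['b', 'c'] ['b', 'c']
```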

Christopher Barker

1:43 p.m.

On Sat, Jun 6, 2020 at 11:46 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

...
But again, not worth any more of my time that I have already spent.

Feel free not to read or respond. (I mean that literally, I don't want to take up more of your time just because I responded.)

Well, not worth my time to advocate for a change -- it's always worth my time to kibitz about Python and software design :-)

...

If they required something set-like, they would require something in set that was not in all iterables.

That's the opposite of duck-typing, though. Duck-typing is about *sufficient* conditions for treating an object as an instance of the *argument's* intended type. Whether an object is sufficiently set-like depends on what you're going to do with the object, not on all the irrelevant attributes that actual sets have.

Of course -- but my point is that the set methods (.union, etc.) do not require a duck-typed Set -- they require a duck-typed Iterable. (And honestly, I'm not exactly sure what they actually require -- would __contains__ be enough?) The operators, on the other hand, do actually require a Set object. So, for instance, to bring it back around to the OP's example, you can pass a dict_keys object to the methods :-) Anyway, I don't think we actually disagree about anything here -- we're just talking about it a bit differently.
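Christopher's question about what the named methods actually require can be probed with a quick sketch, assuming CPython 3. The classes `ContainsOnly` and `IterOnly` are hypothetical. An object that only implements `__contains__` is rejected, because `set.union` iterates its argument. An object that only implements `__iter__` is accepted, and so is a `dict_keys` view.

```python
class ContainsOnly:
    """Hypothetical: supports `in` but cannot be iterated."""

    def __contains__(self, item):
        return item == 2


class IterOnly:
    """Hypothetical: can be iterated but has no __len__ or __contains__."""

    def __iter__(self):
        yield 2
        yield 3


s = {1}

try:
    s.union(ContainsOnly())
except TypeError as exc:
    # union() must iterate its argument; `in` support alone is not enough.
    print("ContainsOnly rejected:", exc)

print(sorted(s.union(IterOnly())))         # [1, 2, 3]
print(sorted(s.union({2: "a"}.keys())))    # [1, 2]
```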

...

...
Do we need to provide near-duplicate methods that coerce iterables to set?

I don't think anyone is asking for that.

I think I've failed to convey my meaning, because the whole point is that the named methods are the same operations as the dunders except that they also allow general iterables, which is possible by forgetting that iterables have an order (and then reimposing an implementation-dependent order) and forgetting that they may contain duplicate values. It's the loss of the two latter characteristics that I mean by "coerce to set".

OK, I see what you mean -- the point is that the only "function" of the methods in question is that they essentially coerce iterables to sets. (Do they actually do that under the hood? I don't think so. The C code is pretty hard for me to read, but it doesn't look like it.) But yeah, functionally they are doing that.

...

Of course it's easy to construct hypotheticals which fail if people decide to apply them to views instead of builtin sets.

Well, the point of having an ABC is that folks can use anything that conforms to it when a Set is expected.

...

My question is whether the convenience of these functions is genuine, or whether it's an attractive nuisance.

Sure -- but that applies just as well, maybe more so, to the built-in set(). As I see it, the language design questions are:

...

(1) "Does consenting adults apply here?" (2) "Is the convenience that great?"

I had a couple more that have to do with issues of implementing the non-dunder methods on all classes that derive from Set, but I'm pretty sure those are solved by providing generic concrete methods in the Set ABC like

def union(self, it):
    return self.__or__(set(it))

exactly -- which is my point that it would not be particularly disruptive to add these to the ABC.

...

I agree that ".union(y)" may be easier for many people to type (you don't have to do a mental gear change to an operator notation which is different from conventional set theory notation), but as you can see the number of characters is the same. I don't see a huge cost to this as people can get used to the explicit conversion, and for other set operations the operator versions are shorter.

There is also performance: the methods are a touch faster.

In [27]: def use_operator(s, i):
    ...:     s |= set(i)
    ...:

In [28]: def use_method(s, i):
    ...:     s.union(i)
    ...:

In [29]: s1 = {random.randint(0,10000) for i in range(10000)}

In [30]: s2 = {random.randint(0,10000) for i in range(10000)}

In [31]: l1 = [random.randint(0, 10000) for i in range(10000)]

In [32]: %timeit use_operator(s1, l1)
453 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [33]: %timeit use_method(s2, l1)
307 µs ± 3.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Granted, that's not a performance difference that's likely to matter, but it is there. (And of course, it would not be an advantage for ABC mixins.)

The counterargument is that as long as builtin set supports the named versions, people will use them and expect them on all Sets.

That is the core point of my argument -- I'm ambivalent about whether the built-in set should have had them in the first place, but as it does, I'd rather see a consistent API among builtins.

...

But I'm still enough of a curmudgeon to think it would be a better idea to use explicit conversion and the operator forms. ;-) (Don't argue back, I know that's not going to convince anyone. :-)

Sure -- but I don't see evidence that there is consensus (or even majority opinion) about that in the community.

...

In this case, is someSet.union(someList) what I really want, or is it someList.extend(someSet)?

That already works, yes? But it does not give the same results (as you know). I don't get your point here: whoever is writing the code sure as heck should know whether they want a List or a Set when they are done "joining" the two objects. Though if there is potential confusion, it's a good thing that they don't use the same methods (or operators :-) ).

Finally, I haven't used sets much in production code, and when I have, it was usually for a quick "filter out the duplicates" use case. But when I have used them more extensively, I do find that I am most often adding stuff to them from iterables, not sets. Probably because it's easier to expect folks to pass in a list with what they want than to require a set object. So I like having methods that can take arbitrary iterables.

...

And there is nothing in the docs that discourages use of the methods in favor of the operators. In fact, there is a section that describes why the methods are there, and how they can be useful.

Are you referring to

Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset(), and issuperset() methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs').

I don't see anything there about why they're useful, only that if you want to intersect with something not a set (which is mathematically undefined! :-), they're more readable. And, of course "set('abc') & set('cbs')" is *shorter*.

The key point is that the docs don't discourage it -- at all.
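The behaviour described in the quoted documentation note is easy to confirm with a minimal sketch, assuming CPython 3. The `&` operator rejects a plain string, while the named method accepts it. An explicit `set()` conversion, as discussed in the thread, makes the operator form work too.

```python
letters = set("abc")

# The named method accepts any iterable, here a str of characters.
print(sorted(letters.intersection("cbs")))   # ['b', 'c']

# The operator form requires both operands to be sets (or frozensets).
try:
    letters & "cbs"
except TypeError as exc:
    print("TypeError:", exc)

# Explicit conversion lets the operator form accept the same input.
print(sorted(letters & set("cbs")))          # ['b', 'c']
```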

...

The tutorial doesn't mention the non-operator versions at all in the obvious place (section 5.4).

Interesting -- that does make your case that at least the person who wrote the tutorial thought the operators are the canonical way to use sets :-)

In short: I like the methods, both because I like methods (they are more mnemonic for me) and because working with arbitrary iterables is a common use case for me. But I do see the appeal of having only one way to do it :-)

-CHB

--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython


Alex Hall

11:36 a.m.

On Mon, Jun 1, 2020 at 5:31 PM Raymond Hettinger <raymond.hettinger@gmail.com> wrote:

...

I just took a look at the PEP: https://www.python.org/dev/peps/pep-3119/#sets

All I can see is:

...

Brandt Bucher

11:49 a.m.

...

These would be mixin methods, not abstract. Set already implements the various operators based on the abstract methods, it could easily add more mixin methods which would delegate to the operators. Classes which override the operators with more efficient versions would automatically get efficient aliases for free.
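Brandt's suggestion can be sketched as a mixin. This is a hypothetical illustration, not a proposed patch: `NamedSetMethods` and `SimpleSet` are made-up names. Each method converts its argument with `frozenset()` and then delegates to the corresponding operator, in the spirit of the `self.__or__(set(it))` example elsewhere in the thread. A class that overrides the operators with faster versions would get faster named methods without further work.

```python
from collections.abc import Set


class NamedSetMethods:
    """Hypothetical mixin: named set methods that delegate to the operators.

    Meant to be combined with collections.abc.Set, whose mixin operators
    (or a subclass's faster overrides) do the real work.
    """

    def union(self, other):
        return self | frozenset(other)

    def intersection(self, other):
        return self & frozenset(other)

    def difference(self, other):
        return self - frozenset(other)

    def symmetric_difference(self, other):
        return self ^ frozenset(other)

    def issubset(self, other):
        return self <= frozenset(other)

    def issuperset(self, other):
        return self >= frozenset(other)


class SimpleSet(NamedSetMethods, Set):
    """Hypothetical immutable set that defines only the three abstract methods."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


s = SimpleSet([1, 2, 3])
print(sorted(s.union([3, 4])))                    # [1, 2, 3, 4]
print(sorted(s.intersection(range(2, 10))))       # [2, 3]
print(s.issubset(range(10)), s.issuperset([4]))   # True False
```

The `frozenset()` conversion is a deliberate choice in this sketch. The `collections.abc.Set` operators behind `union()`, `intersection()`, `difference()` and `symmetric_difference()` already accept plain iterables. However, the comparison operators behind `issubset()` and `issuperset()` return `NotImplemented` unless the other operand is a `Set`.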

Alex Hall

2:17 p.m.

On Mon, Jun 1, 2020 at 8:50 PM Brandt Bucher <brandtbucher@gmail.com> wrote:

...

Christopher Barker

3:06 p.m.

...

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

Stephen J. Turnbull

June 2020

7:45 p.m.

Christopher Barker writes:

...

TL; DR: In this particular case, I don't see much backward compatibility, so yes, let's add them.

...

So it would not likely break much at all if they were added. (And I'm still confused about why they aren't there in the first place; the PEP doesn't seem very clear about it.)

Why do you want to add them? The point of an ABC is that it is Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."
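Stephen's "only the defining features" matches how `collections.abc.Set` is currently built. A quick introspection sketch, assuming a recent CPython 3, shows three things. The only abstract methods are three dunders. The only public named method is `isdisjoint`. That is also the only one of these named methods that dict views provide.

```python
from collections.abc import Set

# Methods a concrete subclass must supply.
print(sorted(Set.__abstractmethods__))   # ['__contains__', '__iter__', '__len__']

# Public (non-underscore) names the ABC provides.
print([name for name in dir(Set) if not name.startswith("_")])   # ['isdisjoint']

keys = {"a": 1}.keys()
print(isinstance(keys, Set))                                  # True
print(hasattr(keys, "isdisjoint"), hasattr(keys, "union"))    # True False
print(hasattr(set(), "union"))                                # True
```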

...

BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences.

...

And while the proposed methods are not part of the Set ABC at this point, they ARE part of the built-in set object. And historically at least, people informally duck typed as often, or more often, than they subclassed from or registered with ABCs anyway.

If you want all of the methods of the built-in set, use it or derive from it, or construct one.

...

And even if they are not added to the ABC, they could still be added to the other set-like objects in the standard library -- are there any other than the dict views?

What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

Christopher Barker

11:43 p.m.

...

Why do you want to add them? The point of an ABC is that it is

...

Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."

...

BUT: This *may* be a different case -- the Set ABC (and set

...
duck-typing) is probably far less used than for Sequences.

I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

...

Presumably if self is a set, .union (and .intersection) works on any Sized:

Well, no, it doesn't.

...

all you need mathematically is the ability to iterate the argument,

which is why it works with any iterable.

...

and practically you'd like to have len() so you can avoid infloops.

I wonder if it does, in fact, use __len__. I haven't tested it, but it certainly does work with any iterable that isn't Sized.
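A quick check of that claim, assuming CPython 3: a generator has no `__len__`, yet `set.union` accepts it. The sketch does not show how the C implementation uses lengths internally. It only shows that `Sized` is not a requirement.

```python
from collections.abc import Sized

gen = (n * n for n in range(4))   # a generator: iterable but not Sized
print(isinstance(gen, Sized))     # False

result = {100}.union(gen)
print(sorted(result))             # [0, 1, 4, 9, 100]
```

The flip side Stephen raises still holds. With no length to check, something like `{1}.union(itertools.count())` would simply never finish, so don't run that.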

...

The other thing you need mathematically to have a set is the 'in' operation. If mutable, you need an idempotent .add, and .remove.

no one is arguing that the ABC doesn't provide the minimum API for a mathematical set here.
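For reference, `collections.abc.MutableSet` spells out a minimum close to the one Stephen describes, with one difference. Besides `__contains__`, `__iter__` and `__len__`, its abstract methods are `add` and `discard`, not `remove`. `remove`, `pop`, `clear` and the in-place operators are mixins built on top of them. The following is a minimal sketch; `ListSet` is a hypothetical class backed by a list and chosen for clarity, not speed.

```python
from collections.abc import MutableSet


class ListSet(MutableSet):
    """Hypothetical mutable set stored in a list (O(n) membership, for illustration)."""

    def __init__(self, items=()):
        self._items = []
        for item in items:
            self.add(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        # Idempotent: adding an element that is already present changes nothing.
        if item not in self._items:
            self._items.append(item)

    def discard(self, item):
        if item in self._items:
            self._items.remove(item)


s = ListSet([1, 1, 2])
print(len(s))      # 2
s.remove(2)        # mixin: raises KeyError if absent, otherwise calls discard()
s |= [3, 4]        # mixin __ior__ accepts any iterable and calls add()
print(list(s))     # [1, 3, 4]
```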

...

Of course convenience counts. It's not obvious to me whether the convenience of having those methods outweighs the parsimony of only implementing the dunders.

I had to go look up "parsimony" to make sure -- and I still don't know what your point is. Maybe that's not the word you meant?

In fact, it's not obvious to me whether there's *any* convenience to having those methods. Shouldn't we want to encourage the use of the more concise and less cluttered operators, which also have the advantage that either operand can provide the implementation?

...

OK, sometimes that may not be an advantage. But even if you want to specify which operand will provide the implementation, you can use the explicit dunder.
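Both points here can be illustrated with a hypothetical `Set` subclass that defines only the abstract methods: either operand can provide the implementation, and an explicit dunder call pins down which one does. CPython 3 is assumed. With a built-in set on the left, `set.__or__` returns `NotImplemented` for the unfamiliar operand. Python then falls back to the right operand's reflected `__or__` (`__ror__`, inherited from `collections.abc.Set`), so the result has the right operand's type. Calling the dunder directly pins the implementation to the left operand.

```python
from collections.abc import Set


class TaggedSet(Set):
    """Hypothetical Set that only defines the three abstract methods."""

    def __init__(self, items=()):
        self._items = frozenset(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


left = {1, 2}
right = TaggedSet({3})

combined = left | right            # set.__or__ declines, TaggedSet.__ror__ runs
print(type(combined).__name__)     # TaggedSet

print(left.__or__(right))          # NotImplemented: the set's own implementation refuses
print(sorted(left.union(right)))   # [1, 2, 3]: the named method just iterates
```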

...

And while the proposed methods are not part of the Set ABC at this point, they ARE part of the built-in set object. And historically at least, people informally duck typed as often, or more often, than they subclassed from or registered with ABCs anyway.

If you want all of the methods of the built-in set, use it or derive from it, or construct one.

...

...
And even if they are not added to the ABC, they could still be added to the other set-like objects in the standard library -- are there any other than the dict views?

What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

Stephen J. Turnbull

9:53 a.m.

Christopher Barker writes:

...

But to a couple points:

...
Why do you want to add them? The point of an ABC is that it is Abstract. This doesn't just mean "can't instantiate", it also means "only the defining features."

Well, Python's history makes all this a bit backward -- most (all?) of the ABCs existed in concrete form before there were abstract versions.

"History" -- that aspect of human recollection that makes everything a bit backward. ;-)

...

But anyway, as I understand it, "abstract" means "not implemented", which is completely separate from "only the defining features" -- ABCs define the API, not the feature set.

This is also historical, as you point out:

...

And the fact is that Python has a somewhat arbitrary mix of operators implemented by dunders, protocols implemented by dunders (like the len() function) and regular old methods -- all of these are in the ABCs.

So I don't think there's any clear principle here -- do we want .union() and friends to be part of the standard set API or not? It's simply a choice.
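Christopher's observation that regular named methods already appear in the ABCs can be checked by introspection, assuming a recent CPython 3. `Sequence` provides `index` and `count`, and `Mapping` provides `get`, `keys`, `items` and `values`. All of these are concrete mixin methods rather than abstract ones. `public_names` and `abstract_names` are helper names made up for this sketch.

```python
from collections.abc import Mapping, Sequence


def public_names(abc_class):
    """Non-underscore attribute names an ABC makes available to subclasses."""
    return sorted(name for name in dir(abc_class) if not name.startswith("_"))


def abstract_names(abc_class):
    """Methods a concrete subclass must implement itself."""
    return sorted(abc_class.__abstractmethods__)


print(public_names(Sequence), abstract_names(Sequence))
# ['count', 'index'] ['__getitem__', '__len__']
print(public_names(Mapping), abstract_names(Mapping))
# ['get', 'items', 'keys', 'values'] ['__getitem__', '__iter__', '__len__']
```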

But it's not that simple, as you have made plain with your references to history. Such a choice creates a precedent aka principle.

...

Judging from some of the arguments in other threads, I suspect some people think that ALL the regular methods are only there for legacy reasons,

...

...
BUT: This *may* be a different case -- the Set ABC (and set duck-typing) is probably far less used than for Sequences.

I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

No, it doesn't -- the set.union method takes any iterable -- that's documented:

[list of documentation references omitted]

...

no duck-typing here.

...

...
all you need mathematically is the ability to iterate the argument,

which is why it works with any iterable.

...

...
The other thing you need mathematically to have a set is the 'in' operation. If mutable, you need an idempotent .add, and .remove.

no one is arguing that the ABC doesn't provide the minimum API for a mathematical set here.

...

...
Of course convenience counts. It's not obvious to me whether the convenience of having those methods outweighs the parsimony of only implementing the dunders.

I had to go look up "parsimony" to make sure -- and I still don't know what your point is. Maybe that's not the word you meant?

"Parsimony" here is just a formal term for "YAGNI, so don't waste time typing it".

...

...
In fact, it's not obvious to me whether there's *any* convenience to having those methods. Shouldn't we want to encourage the use of the more concise and less cluttered operators, which also have the advantage that either operand can provide the implementation?

Well, two points:

1) Why does the built-in set object have them? Maybe it shouldn't, but it does, so there is something to be said for making other "Sets" in the standard library the same.

There may be something to say *for* doing it, but "we did it over there" is more of a "it's possible" than a reason for.

...

2) The advantage, maybe small, and the one made by the OP, is that the methods work on any iterable, rather than only on sets. Is that a big deal? Not huge, but it is nice sometimes.

...

The OP didn't want to make their own, they wanted built in set like objects to behave more like sets (dict views, in particular).

...

But anyway, if we're going to say "derive from it, or construct one," then why have ABCs at all?!?

...

...
What's the point? You want to add "There should be two-- and preferably exactly two --obvious ways to do it" to the Zen?

...

Tell that to the author of the built-in set object -- that decision was made long ago.

...

Anyway, this is not a big deal to not add -- but it's not a big deal to add, either.

An excellent summary of your position, which I now understand. ;-)

Christopher Barker

5:09 p.m.

On Wed, Jun 3, 2020 at 9:53 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

...

...
people think that ALL the regular methods are only there for legacy reasons,

I tend to agree that they're legacy, but I think it's irrelevant to the discussion.

...

likely to cause backward compatibility issues, but I don't think an acceptable degree of backward compatibility is a positive reason to implement something.

...

BTW, thank you for clarifying that you aren't (in these posts) advocating this change. I shouldn't assume that lack of a disclaimer means you're advocating rather than arguing a specific point.

well, I am, but pretty mildly ....

...

...
I don't know about the Set ABC, but duck-typing is presumably quite common. {1}.union([2]) uses set duck-typing.

No, it doesn't -- the set.union method takes any iterable -- that's documented:

[list of documentation references omitted]

...
no duck-typing here.

What do you think the type "iterable" is?

...

For the purposes of set's named methods, an "iterable" is "sufficiently set-like" to use set operations on it. That's duck typing.
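Stephen's point that "iterable" is itself a duck type shows up in how the ABCs answer `isinstance`. The sketch below assumes CPython 3, and `SetLike` is a hypothetical class. `Iterable` and `Collection` recognise any class with the right dunders structurally, but `Set` does not: a class must inherit from it or be registered. Registration changes only the `isinstance` answer. It does not add the ABC's mixin methods.

```python
from collections.abc import Collection, Iterable, Set


class SetLike:
    """Hypothetical class with set-like dunders but no ABC involvement."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


b = SetLike([2, 3])
print(isinstance(b, Iterable), isinstance(b, Collection))   # True True
print(isinstance(b, Set))                                   # False

Set.register(SetLike)               # virtual subclass: isinstance changes...
print(isinstance(b, Set))           # True
print(hasattr(b, "isdisjoint"))     # False: ...but no mixin methods are added

print(sorted({1}.union(b)))         # [1, 2, 3]: the named method only needs iteration
```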

...

Do we need to provide near duplicate methods that coerce iterables to set?

...

"Parsimony" here is just a formal term for "YAGNI, so don't waste time typing it".

Stephen J. Turnbull

11:46 p.m.

Christopher Barker writes:

...

But again, not worth any more of my time that I have already spent.

...

if they required something set like, they would require something in set that was not in all iterables.

...

...
Do we need to provide near duplicate methods that coerce iterables to set?

...

I don't think anyone is asking for that.

...

But here's an (abstract) use case:

def some_fun(a_set, some_args):
    ...
    a_set.union(something_iterable)
    ...

...

So then I need to either wrap something_iterable in set() and use the | operator,

...

I guess I don't like the inconsistency between the "prototype" builtin and the ABC -- seems strange to me.
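The abstract `some_fun` use case can be turned into a runnable sketch. The names `merge_with_method` and `merge_with_operator` and the sample data are hypothetical, and CPython 3 is assumed. A built-in set and a dict view both pass `isinstance(x, collections.abc.Set)`, but only the set has `.union()`. So the method-based version raises `AttributeError` for the view, while the explicit-conversion version with `|` handles both.

```python
from collections.abc import Set


def merge_with_method(a_set, something_iterable):
    # Relies on the named method, which the built-in set/frozenset provide
    # but dict views do not.
    return a_set.union(something_iterable)


def merge_with_operator(a_set, something_iterable):
    # Wraps the iterable in set() so that the operator accepts it.
    return a_set | set(something_iterable)


prices = {"apple": 1, "pear": 2}
extra = ["plum"]

for candidate in ({"apple", "pear"}, prices.keys()):
    assert isinstance(candidate, Set)
    print(sorted(merge_with_operator(candidate, extra)))   # ['apple', 'pear', 'plum']
    try:
        print(sorted(merge_with_method(candidate, extra)))
    except AttributeError as exc:
        print("AttributeError:", exc)
```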

...

And there is nothing in the docs that discourages use of the methods in favor of the operators. In fact, there is a section that describes why the methods are there, and how they can be useful.



participants (7)

Alex Hall
ava＠yert.pink ava＠yert.pink
Brandt Bucher
Christopher Barker
Raymond Hettinger
Serhiy Storchaka
Stephen J. Turnbull
