On Sat, Jun 6, 2020 at 11:46 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

> But again, not worth any more of my time than I have already spent.

Feel free not to read or respond. (I mean that literally, I don't
want to take up more of your time just because I responded.)

Well, not worth my time to advocate for a change -- it's always worth my time to kibitz about Python and software design :-)

> if they required something set like, they would require something
> in set that was not in all iterables.

That's the opposite of duck-typing, though. Duck-typing is about
*sufficient* conditions for treating an object as an instance of the
*argument's* intended type. Whether an object is sufficiently set-like
depends on what you're going to do with the object, not on all the
irrelevant attributes that actual sets have.

of course -- but my point is that the set methods (.union, etc.) do not require a duck-typed Set -- they require a duck-typed Iterable. And honestly, I'm not exactly sure what they actually require -- would __contains__ be enough?

whereas the operators do actually require a Set object.

So, for instance, to bring it back around to the OP's example, you can pass a dict_keys object in :-)
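
A quick sketch of that distinction, using only the built-in set. The OnlyContains class is a made-up illustration, not anything from the stdlib. As far as I can tell, the named methods need something iterable, so __contains__ alone is not enough, while the operator rejects a plain list:

```python
class OnlyContains:
    """Hypothetical class: supports `in` but is not iterable."""

    def __contains__(self, item):
        return item in (1, 2)


s = {1, 2, 3}

# The named method accepts any iterable: a list, a dict_keys view, ...
from_list = s.union([3, 4])
from_keys = s.union({5: "a"}.keys())

# ... but an object with only __contains__ is rejected, because
# union() has to iterate over its argument.
try:
    s.union(OnlyContains())
    contains_is_enough = True
except TypeError:
    contains_is_enough = False

# The operator form refuses a plain list.
try:
    s | [3, 4]
    operator_takes_list = True
except TypeError:
    operator_takes_list = False
```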

Anyway, I don't think we actually disagree about anything here -- just talking about it a bit differently.

> > Do we need to provide near duplicate methods that coerce
> > iterables to set?

> I don't think anyone is asking for that

I think I've failed to convey my meaning, because the whole point is
that the named methods are the same operations as the dunders except
that they also allow general iterables, which is possible by
forgetting that iterables have an order (and then reimposing an
implementation-dependent order) and forgetting that they may contain
duplicate values. It's the loss of the two latter characteristics
that I mean by "coerce to set".

OK, I see what you mean -- the point is that the only "function" of the methods in question is that they essentially coerce iterables to sets (do they actually do that under the hood? I don't think so, the C code is pretty hard for me to read, but it doesn't look like it), but yeah, functionally they are doing that.

Of course it's easy to construct hypotheticals which fail if people
decide to apply them to views instead of builtin sets.

well, the point of having an ABC is that folks can use anything that conforms to it when a Set is expected.

My question is
whether the convenience of these functions is genuine, or whether it's
an attractive nuisance.

Sure -- but that applies just as well, maybe more so, to the built-in set()

As I see it, the language design questions are:

(1) "Does consenting adults apply here?"
(2) "Is the convenience that great?"

I had a couple more that have to do with issues of implementing the
non-dunder methods on all classes that derive from Set, but I'm pretty
sure those are solved by providing generic concrete methods in the Set
ABC like

    def union(self, it):
        return self.__or__(set(it))

exactly -- which is my point that it would not be particularly disruptive to add these to the ABC.
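
For what it's worth, here is a minimal sketch of how generic named methods could sit on top of the Set ABC. NamedSetMethods and ListBackedSet are hypothetical names I made up for illustration. This is not what collections.abc provides today, just the idea above written out so it runs:

```python
from collections.abc import Set


class NamedSetMethods:
    """Hypothetical mixin: named methods built on the Set ABC's operators."""

    def union(self, *others):
        result = self
        for it in others:
            result = result | set(it)   # explicit conversion, then the dunder
        return result

    def intersection(self, *others):
        result = self
        for it in others:
            result = result & set(it)
        return result


class ListBackedSet(NamedSetMethods, Set):
    """Toy Set implementation; only the three abstract methods are written."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


lbs = ListBackedSet([1, 2, 2, 3])
merged = lbs.union([3, 4], (5,))       # any iterables will do
common = lbs.intersection([2, 3, 9])
```

The results come back as ListBackedSet instances, because the ABC's operator mixins build their return value through the class constructor.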

I agree that ".union(y)" may be easier for many people to type (you
don't have to do a mental gear change to an operator notation which is
different from conventional set theory notation), but as you can see
the number of characters is the same. I don't see a huge cost to this
as people can get used to the explicit conversion, and for other set
operations the operator versions are shorter.

There is performance - the methods are a touch faster.

In [27]: def use_operator(s, i):
    ...:     s |= set(i)
    ...:

In [28]: def use_method(s, i):
    ...:     s.union(i)
    ...:

In [29]: s1 = {random.randint(0,10000) for i in range(10000)}

In [30]: s2 = {random.randint(0,10000) for i in range(10000)}

In [31]: l1 = [random.randint(0, 10000) for i in range(10000)]

In [32]: %timeit use_operator(s1, l1)
453 µs ± 2.94 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [33]: %timeit use_method(s2, l1)
307 µs ± 3.64 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

granted, that's not a performance difference that's likely to matter, but it is there. (and of course, would not be an advantage for ABC mixins)
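
One caveat on the session above: use_operator updates s in place, while use_method builds a new set and throws it away, so the two functions are not doing quite the same work. A rough like-for-like sketch with the stdlib timeit module might look like the following. The function names are mine, and I'm not quoting any numbers from it, since they will vary by machine and Python version:

```python
import random
import timeit

random.seed(0)  # only so repeated runs use the same data
base = {random.randint(0, 10000) for _ in range(10000)}
items = [random.randint(0, 10000) for _ in range(10000)]


def new_set_operator(s, it):
    return s | set(it)      # builds a new set


def new_set_method(s, it):
    return s.union(it)      # builds a new set


def in_place_operator(s, it):
    s |= set(it)            # mutates s
    return s


def in_place_method(s, it):
    s.update(it)            # mutates s
    return s


for func in (new_set_operator, new_set_method,
             in_place_operator, in_place_method):
    s = set(base)  # fresh copy so the in-place variants don't touch `base`
    seconds = timeit.timeit(lambda: func(s, items), number=200)
    print(f"{func.__name__:>18}: {seconds / 200 * 1e6:8.1f} µs per call")
```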

The counterargument is that as long as builtin set supports the named
versions, people will use them and expect them on all Sets.

That is the core point of my argument -- I'm ambivalent about whether the built-in set should have had them in the first place, but as it does, I'd rather see a consistent API among builtins.

But I'm still enough of a curmudgeon to
think it would be a better idea to use explicit conversion and the
operator forms. ;-) (Don't argue back, I know that's not going to
convince anyone. :-)

Sure -- but I don't see evidence that there is consensus (or even majority opinion) about that in the community.

> In this case, is someSet.union(someList) what I really
> want, or is it someList.extend(someSet)?

that already works, yes? but does not give the same results (as you know). I don't get your point here: whoever is writing the code sure as heck should know whether they want a List or a Set when they are done "joining" the two objects. Though if there is potential confusion, it's a good thing that they don't use the same methods (or operators :-) )
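
To make the "not the same results" part concrete, a tiny example (the variable names are just for illustration):

```python
some_set = {1, 2}
some_list = [2, 3, 3]

# Set result: duplicates and ordering are gone.
as_set = some_set.union(some_list)

# List result: everything is kept, duplicates included.
as_list = list(some_list)   # copy, so some_list itself is left alone
as_list.extend(some_set)
```

Here as_set has three elements, while as_list has five, with the set's elements appended after the original list contents.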

Finally, I haven't used sets much in production code, and when I have it was usually a quick "filter out the duplicates" use case. But when I have used them more extensively, I do find that I am most often adding stuff to them from iterables, not from sets. Probably because it's easier to expect folks to pass in a list of what they want than to require a set object.

So I like having methods that can take arbitrary iterables.

> And there is nothing in the docs that discourages use of the
> methods in favor of the operators. In fact, there is a section that
> describes why the methods are there, and how they can be useful.

Are you referring to

Note, the non-operator versions of union(), intersection(),
difference(), and symmetric_difference(), issubset(), and
issuperset() methods will accept any iterable as an argument. In
contrast, their operator based counterparts require their
arguments to be sets. This precludes error-prone constructions
like set('abc') & 'cbs' in favor of the more readable
set('abc').intersection('cbs').

I don't see anything there about why they're useful, only that if you
want to intersect with something not a set (which is mathematically
undefined! :-), they're more readable. And, of course "set('abc') &
set('cbs')" is *shorter*.

The key point is that the docs don't discourage it -- at all.

> The tutorial doesn't mention the non-operator versions at all in the obvious place (section 5.4).

interesting -- that does make your case that at least the person that wrote the tutorial thought the operators are the canonical way to use sets :-)

In short: I like the methods, both because I like methods, they are more mnemonic for me, and because working with arbitrary iterables is a common use case for me. But I do see the appeal of having only one way to do it :-)

-CHB

Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython