On Friday, June 26, 2020, at 23:25 -0500, Steven D'Aprano wrote:
On Fri, Jun 26, 2020 at 06:16:05AM -0500, Dan Sommers wrote:
    already_there = seen.add(element)
    if already_there:
        # handle the duplicate case
Who thinks like that? *wink*
Anyone who practices EAFP rather than LBYL? Or is that why you're winking?
That doesn't come naturally, and in single-threaded code it's also unnecessary.
Not unlike ChrisA, I grew up in a multiprocessor, multithreaded, asynchronous world, and I don't always assume that single-threaded code will stay that way.
By the way, since there's no try...except involved and no need to catch an exception, it's not EAFP.
Perhaps by your standard. The code I wrote performs an operation, and then asks whether or not some condition was met, as opposed to asking whether the condition is met first, and then conditionally performing the operation.
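To put the two styles side by side, here is a rough sketch. Note that set.add returns None today, so add_and_report below is a hypothetical helper (not an existing method) that emulates the proposed flag by comparing the set's size before and after the call. That emulation is itself only safe in single-threaded code.

```python
def handle_lbyl(seen, element):
    """Look before you leap: ask first, then conditionally add."""
    if element in seen:
        return True          # duplicate: nothing to add
    seen.add(element)
    return False


def add_and_report(seen, element):
    """Hypothetical helper: add first, then report whether it was a duplicate.

    Emulates the proposed return value; set.add itself returns None.
    """
    size_before = len(seen)
    seen.add(element)
    return len(seen) == size_before   # unchanged size means already there


seen = set()
results = [add_and_report(seen, x) for x in (1, 2, 1)]
```

With the inputs above, only the third call reports a duplicate. A real return value from the method would fold the operation and the question into one step, which is the point of doing it in that order.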
But either way, you also have to decide whether the `add` (or the new method) should *unconditionally* insert the element, or only do so if it wasn't present. This makes a big difference:
    seen = {2}
    already_there = seen.add(2.0)
At this point, is `seen` the set {2} or {2.0}? Justify why one answer is the best answer.
The actual best answer is left as an exercise for the interested reader, but whatever it is, it's justified by backwards compatibility, the existing definition of "present," and the principle of least surprise:
The documentation doesn't guarantee one behaviour or another:
https://docs.python.org/3/library/stdtypes.html#frozenset.add
Not explicitly, no, but the definition of a set (from that same section of documentation) is "an unordered collection of distinct hashable objects." Is 2 distinct from 2.0? Not according to Python:

    Python 3.8.3 (default, May 17 2020, 18:15:42)
    [GCC 10.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 2 == 2.0
    True
    Python 3.8.3 (default, May 17 2020, 18:15:42)
    [GCC 10.1.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> seen = {2}
    >>> 2.0 in seen
    True
    >>> seen.add(2.0)
    >>> seen
    {2}
Well I'm completely surprised, because I expected `add` to, you know, actually add the element replacing the one that was already there!
But 2 == 2.0, and I asked first, and the answer was that 2.0 was already in the set. (Unsurprisingly, I get the same results for 2+0j and 2.0+0.0j.) So how would you know whether or not set.add added the [not-new] element or just left the set as is? Do you write code that cares?
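For code that does care, one way to find out is to look at the type of whatever is stored afterwards. This is a small sketch of the behaviour observed in CPython; as noted earlier in the thread, the documentation doesn't spell it out, so treat it as an observation rather than a guarantee.

```python
seen = {2}
seen.add(2.0)       # 2.0 == 2 and hash(2.0) == hash(2)
seen.add(2 + 0j)    # the complex value also compares equal to 2

stored = next(iter(seen))            # the single element actually held
stored_type = type(stored).__name__  # which of the equal objects survived
```

On CPython the set still holds exactly one element and it is the original int, so equal elements offered later are simply dropped.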
Seriously, I genuinely thought that the existing behaviour was the opposite and that `add` unconditionally added the element. "Last seen wins". If I was designing sets, that's probably how I would design it. After all, it's called *add*, not *add if not already there*. I was so sure that this was the current behaviour that I didn't bother to check it before posting, which is rare for me.
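For comparison, a "last seen wins" add could be sketched as a subclass. LastSeenWinsSet is a made-up name for illustration only; nothing like it exists in the standard library, and this is not a proposal for how the built-in should work.

```python
class LastSeenWinsSet(set):
    """Hypothetical set variant whose add() always keeps the newest object."""

    def add(self, element):
        self.discard(element)   # drop any equal element already stored
        super().add(element)    # then store the object just offered


seen = LastSeenWinsSet({2})
seen.add(2.0)
stored = next(iter(seen))       # the float 2.0 replaced the int 2
```

Only add is overridden here; other mutators such as update or |= still go through the built-in behaviour, so a full design would have more to decide than this sketch shows.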
So I think this counts as the principle of maximal surprise :-)
Then there's a bug in the documentation. Perhaps the word "distinct" or the description of set.add is insufficient. Relatedly, would your design include both remove and discard?
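For reference, the existing pair differs only in what happens when the element is absent: remove raises KeyError, while discard quietly does nothing. A minimal illustration:

```python
seen = {2}

seen.discard(3)          # absent element: silently does nothing

try:
    seen.remove(3)       # absent element: raises KeyError
    removed = True
except KeyError:
    removed = False
```

So sets already offer both a "tell me if it wasn't there" removal and a "don't bother me" removal, which is roughly the same split being discussed for add.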
Should the flag be "element was already present and nothing was added" or "element was not there, so something was added"?
Given that the name of the method in question is "add," IMO the flag would indicate that the element was added. Please don't read too much into my answer(s), though. I usually write code more like this:

    # process unique elements of iterable
    for element in set(iterable):
        process_element(element)

and I don't care about handling duplicates one at a time, so I don't care which of multiple non-distinct elements ends up in my set.