If the `set.add` feature can return a flag stating whether or not the element was previously present or not cheaply and atomically, without slowing down the single-threaded case with locks, then that's great (as I already stated...) and this entire subthread about threading is irrelevant and you can stop reading now.
If not, then we should consider carefully the consequences of adding a feature that will either slow down the common single-threaded case, or potentially be an attractive nuisance in the multi-threaded case.
On Sat, Jun 27, 2020 at 06:46:47PM +1000, Chris Angelico wrote:
I grew up with concurrency being a perfectly normal thing. To me, EVERYTHING is concurrent. (Back then, it was single-core CPUs, so technically most machine code instructions could be assumed to be atomic, but at any higher level, you could be preempted at any moment.)
Cool, but it ignores the point that when you access the file system, you cannot avoid concurrency because that's what the OS gives you whether you want it or not. That's not what happens with adding elements to a set, unless you are sharing it between threads.
(For the record, I too grew up with concurrency being a perfectly normal thing. Every desktop OS I've ever touched has had some form of multi- processing or another. Classic MacOS had cooperative multiprocessing and desk accessories, DOS had TSRs, and the first computer I used for serious coding was Unix.)
Unless you are in the [Quadrant of Hell](https://imgur.com/a/jjA52vI) you don't even need to think about this for sets. There are no Time Of Check to Time Of Use issues worth thinking about in the given example, and frankly Chris I doubt that you surround ever set access with locks in single-threaded code because TOCTOU is "stronger" (more important?) than writing clear and simple code. And if you do, you're probably wondering why Python is so slow *wink*
And that is a fundamental difference between our philosophies. You consider threads to be terrifyingly scary,
Oh I do, sometimes I just start screaming at the very thought. I've even been known to wet myself.
and shared mutable data to be "hell". I consider this to be the normal state of things, and that whenever you care, you take care. It actually doesn't come up all that often.
I'm sure you're very good at what you do, but deadlocks, resource starvation, non-deterministic execution order, race conditions, etc are genuine problems for people, and concurrent code is much harder to write correctly than sequential code, and even more difficult to debug. Stories like these are hardly rare:
Other people have suggested that concurrent code is often riddled with race conditions:
"Because concurrency is so incredibly difficult, many code bases I have worked on have these sorts of bugs….everywhere"
and the consequences of race conditions have sometimes been catastrophic:
which is why people much smarter than me have spent decades designing languages and execution models specifically to deal with concurrency and parallel computing safely:
or simply recommending that, when possible, we ought to avoid mutable shared state (in the same way we avoid GOTOs and global variables and other anti-patterns):
I'm not sure why this is so controversial. I'm making a moderate claim here, threading is much harder to get right than sequential code.
But I'll grant you that I wasn't thinking about threaded code. The example given was single-threaded, but maybe a better example would involve threading. But that supports my point: thinking concurrently doesn't come naturally.
Nor does thinking algebraically, nor thinking like a database, nor thinking in terms of diamond inheritance and an MRO. These things have to be learned. Is it a problem to have to learn things?
Maybe. It depends on the things you have to learn and why you need to. Personally I don't feel that learning category theory and monads just to be able to print "Hello World" would be a good use of my time.
Please let's not over-generalise here, I'm not opposed to learning in general, I'm pointing out that writing concurrent-style code for something which is trivially thought of as "if the element has not been seen, add it" is not the most natural way of thinking about the task.
So `add` would have to be atomic at the C level. If it is already atomic, great, that's one objection neutralised. But if not, then will making it atomic have a performance impact?
Question: If it ISN'T atomic at the C level, what happens?
Sometimes it will return False when it should have returned True, or True when it should have returned False, and because of that wrong answer, people will have mysterious bugs, crashes, failures, and code that silently does the wrong thing.
I would expect that set.add() is guaranteed to not add a duplicate, and that right there basically guarantees everything that I need.
Don't you also need the method to guarantee to return True only if the element was already in the set, and False only if the element was not in the set? That is the point of the proposed feature.
(Or maybe it was the other way around, I forget which way the flag is supposed to go.)
If you don't have that guarantee, then you either can't use this feature, or you have to use manual locks, in which case the benefit is significantly reduced.
This idiom works with sets, it works with lists, it works with trees, it works with anything that supports membership testing and inserting into a collection. It is the natural way to think about the task.
So would "ensure this is here, and let me know if you created it".
Are you proposing to add this to lists, deques, dicts, Mappings, Sequences and all other collections? I think you'll need a PEP :-)
No, I'm not. I'm only talking about this as a concept in things that reject duplicates. So at best, it would be sets and dicts. Why on earth would you assume I would want this operation on a list?
I commented that the "check, then insert" idiom works with lists etc as well as sets, but the "insert and return a flag" idiom doesn't work with lists etc. You countered that by saying that the "insert and return flag" would work equally well. It can't work at all if it doesn't exist. So for it to work just as well for lists as it works for sets, it has to be added to lists.
Are you THAT desperate for an argument?
Is that what we're having? I thought we were discussing the pros and cons of a proposed feature. That's what the purpose of Python-Ideas is, or at least was.
That's the second time you've made that comment. It would be more considerate, open and empathic to assume I'm having a good faith discussion instead of accusing me of trolling.
They have different semantics, and I can imagine cases where both would be useful. Although I suppose option 1 is moot if sets don't over-write the existing element.
They don't have different semantics, so the distinction is irrelevant.
In one the add method is unconditionally called. In the other, the add method is only sometimes called. The difference in semantics is clear if you consider that this will affect not only builtin sets, but also subclasses which may have side-effects or perform other computations.