
On Fri, Oct 18, 2019 at 01:32:55PM -0700, Ethan Furman wrote:
On 10/18/2019 10:25 AM, Steven D'Aprano wrote:
On Fri, Oct 18, 2019 at 09:17:54AM -0700, Ethan Furman wrote:
That result just doesn't match up with the '+' operator.
Why not?
Pretty sure I answered that question in my OP. Here it is again since you conveniently stripped it out:
I stripped it out because it doesn't answer the question. Restating it word for word doesn't help.

Given the large number of meanings that we give the + symbol, what is it specifically about merging two dicts (mappings) that doesn't match the plus symbol? Which of the many uses of plus are you referring to?

Just stating, as you did, that + doesn't match dict merging is begging the question: it doesn't match because we haven't defined the + operator to mean merge. If we define the + operator to mean merge, then it will match, by definition.

For example, in Groovy and Kotlin, we can say that dict merging matches + because that's the symbol they already use for dict merging. Likewise, we can say the same thing about Python: Counter already uses + to merge Counters.

If you want to argue against plus, you need a better argument than "it doesn't match" since plus has many different uses and it's not even clear what "doesn't match" means.

Of course there are differences between dict merging and numeric addition, just as there are differences between numeric addition and concatenation. But there are also similarities, which is why so many people immediately think of merging two dicts as an obvious kind of addition, just as they think of concatenating two strings/lists as a kind of addition.

[...]
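To make the Counter point concrete, here is a minimal sketch of what + already does for Counters in the standard library. The names `a`, `b` and `merged` are made up for illustration. Note that Counter sums the counts for shared keys instead of letting the right-hand value win, so it is one precedent for "+ means merge", not a model of the exact semantics being proposed for dicts.

```python
from collections import Counter

# Two illustrative Counters with one key ("apples") in common.
a = Counter(apples=2, pears=1)
b = Counter(apples=3, plums=4)

# Counter.__add__ merges the two mappings; counts for shared keys are summed.
merged = a + b

print(merged["apples"])  # 5
print(sorted(merged))    # ['apples', 'pears', 'plums']
```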
Before answering, please check the PEP to see if your objection has already been raised.
What do you know, it had been!
Gosh, anyone would think that I had spent many, many hours crawling through Python-Ideas threads gathering arguments for and against the proposal before writing the PEP. *wink*
PEP 584:
-------
Dict addition is lossy

Dict addition can lose data (values may disappear); no other form of addition is lossy.
Response:
It isn't clear why the first part of this argument is a problem. dict.update() may throw away values, but not keys; that is expected behavior, and will remain expected behavior regardless of whether it is spelled as update() or +.
It's a problem because the method is named "update", not "add", "join", or "combine".
Sorry, I still don't see why this is a *problem*. What badness do you see happening from it? In the context of dicts, "update" is an obvious synonym for "add, join, combine, merge" etc. If you update a set of records, you add the new records to the old records, with the new records taking priority. Depending on the implementation, an update might even be a pure concatenation, with the old records still there but inaccessible.
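A small sketch of the behaviour under discussion, using only dict.update() as it exists today (the dict names and contents are invented for the example). The new record takes priority for the shared key, and every key from either side survives:

```python
# Illustrative data: "eggs" appears in both dicts.
old = {"spam": 1, "eggs": 2}
new = {"eggs": 99, "ham": 3}

merged = dict(old)   # copy, so the original is left untouched
merged.update(new)   # values from `new` win for duplicate keys

print(merged)        # {'spam': 1, 'eggs': 99, 'ham': 3}

# No key is lost: the result holds exactly the union of both key sets.
print(set(merged) == set(old) | set(new))  # True
```

The only thing that goes away is the old value for "eggs"; whether that counts as a problem is exactly what is being argued here.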
The dict "+" operator is being turned into a deduplication operator - as if, for example:
--> [1, 2, 3] + [3, 4, 5]
[1, 2, 3, 4, 5]  # not what happens
Right, because list addition is concatenation, not set union. Just like this is not a problem:

    1234 + 5678 --> 12345678  # not what happens

Numeric addition is not concatenation (except in unary (base 1) number systems). Nor is it set union.
The second definition of "add" in Webster's dictionary is "to join or unite", which matches dict merging very well.
No, it doesn't -- not in the case of duplicate keys. If we had two people named "Steve" and joined them into a group, would you expect one of them to just disappear?
Not people, no, since people are compared by identity, not personal name. But we're not talking about adding *people*. If you add the bitsets 0b1001 and 0b101, the least significant bit just disappears, giving 0b1110. Did you expect that bits (or decimal digits) are conserved by addition? Of course you don't. So why do you expect dict addition to conserve values? (By the way, dict addition will conserve keys.)
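The bit example is easy to check at the interpreter. This is just a sketch with throwaway names (`a`, `b`, `total`); it compares ordinary integer addition with bitwise OR on the same two values:

```python
a, b = 0b1001, 0b101

total = a + b            # ordinary integer addition: 9 + 5 == 14
print(bin(total))        # 0b1110
print(bin(a | b))        # 0b1101, bitwise OR for comparison

# The inputs have four set bits between them; the sum has only three.
print(bin(a).count("1") + bin(b).count("1"))  # 4
print(bin(total).count("1"))                  # 3
```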
Even better, if we had two engineers (key) named Anita and Carolyn (values) and combined them into a group, do you expect one of them to vanish?
You're obviously very privileged to have never experienced a merger between two companies or two departments where a whole lot of people are made redundant due to their positions now being duplicated. If you use the position "engineer" as key, you are *requiring* that there is only a single engineer at a time. If you want two engineers, you need to use some other key. [...]
As I'm sure you are aware, natural language is inherently ambiguous. Python is not a natural language. It is my contention that the operation of combining two data structures together that results in data loss should not be called "+". As it happens, the Python set() agrees:
Set union isn't lossy. It never throws away an element. It might not be the same *object* as the original, but it will still be equal. The same occurs with dict merging: no key will be thrown away.

I wasn't paying attention back in 2.3 or so when sets were introduced, but I expect that the main driving force for using & and | for set intersection and union over * and + was familiarity with bitwise intersection and union from C.

In mathematics, the most common notation for sets is ∪ and ∩ but you do occasionally find people using + and ⋅ (the dot operator) due to the close connection between set operations and Boolean algebra operations, where union and intersection are frequently spelled as "addition" and "multiplication". Sometimes they use + to mean disjoint union, which is even further away from the naive "addition as plussing numbers" than regular old union.

But as far as it goes, using | instead of + for dicts is a viable choice, especially if you want to argue for dicts to offer the full set API of intersection, difference and symmetric difference as well.

-- 
Steven