On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to
dict.update() which returns a new dict. dict.update has existed since
Python 1.5 (something like a quarter of a century!) and never grown a
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a
unique keys version is useful, I don't expect it will be useful often.

I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; the main proposal (making it equivalent to left.copy() followed by .update(right)) would have the left-hand side win on ordering and the right-hand side win on values, and it wouldn't preserve the length invariant of concatenation. At least when repeated keys are rejected, most concatenation invariants are preserved: order is all of the left elements followed by all of the right, and no elements are lost.
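
A minimal sketch of that difference, assuming Python 3.7+ insertion-ordered dicts. The names d1, d2 and merge_unique are purely illustrative and not part of any proposal text:

```python
# Sequence concatenation keeps every element, so the length invariant holds.
seq1, seq2 = [1, 2], [2, 3]
assert len(seq1) + len(seq2) == len(seq1 + seq2)

# The main proposal: left.copy() followed by .update(right).
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 3}
merged = d1.copy()
merged.update(d2)

# "b" keeps its position from the left operand but takes the right's value,
# and one entry is lost, so the length invariant does not hold.
assert list(merged) == ["a", "b", "c"]
assert merged["b"] == 20
assert len(merged) != len(d1) + len(d2)


def merge_unique(left, right):
    """Hypothetical strict merge: raise KeyError if any key is shared."""
    conflicts = left.keys() & right.keys()
    if conflicts:
        raise KeyError(f"conflicting keys: {sorted(conflicts, key=repr)}")
    result = left.copy()
    result.update(right)
    return result
```

With the strict version, a successful merge always has len(left) + len(right) entries, ordered left then right.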

> This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.

One of the reasons for preferring + is that it is an obvious way to do
something very common, while {**d1, **d2} is as far from obvious as you
can get without becoming APL or Perl :-)

From the moment PEP 448 was published, I've been using unpacking as a more composable/efficient form of concatenation, merging, etc. I'm sorry you don't find it obvious, but a couple of e-mails back you said:

"The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first."

Learning to use the unpacking syntax in the case of function calls is necessary for tons of stuff (writing general function decorators, handling initialization in class hierarchies, etc.), and as PEP 448 is titled, this is just a generalization combining the features of unpacking arguments with collection literals.
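
As a rough illustration of the two uses side by side (the decorator logged and the function add are hypothetical names made up for this sketch):

```python
import functools


def logged(func):
    """Hypothetical general-purpose decorator: record each call's arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Collect whatever was passed, then forward it unchanged.
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@logged
def add(x, y=0):
    return x + y


# PEP 448 allows the same * and ** syntax inside collection displays.
combined = [*range(2), *"ab"]
defaults = {"y": 10}
options = {**defaults, "y": 5}
```

The wrapper works for any signature precisely because it relies on argument unpacking; the last two lines reuse that syntax inside a list and a dict display.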

> The second syntax makes it clear that a new dictionary is being
> constructed and that d2 overrides keys from d1.

Only because you have learned the rule that {**d, **e} means to
construct a new dict by merging, with the rule that in the event of
duplicate keys, the last key seen wins. If you hadn't learned that rule,
there is nothing in the syntax which would tell you the behaviour. We
could have chosen any rule we liked:

No, because we learned the general rule for dict literals that {'a': 1, 'a': 2} produces {'a': 2}; the unpacking generalizations were very good about adhering to the existing rules, so it was basically zero learning curve if you already knew dict literal rules and less general unpacking rules. The only part to "learn" is that when there is a conflict between dict literal rules and function call rules, dict literal rules win.
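
A short sketch of the two rule sets, using made-up dicts d and e and a throwaway function show:

```python
# Dict display: a repeated key is allowed and the last value wins.
literal = {"a": 1, "a": 2}
assert literal == {"a": 2}

# Unpacking inside a dict display follows the same rule.
d = {"a": 1, "b": 2}
e = {"b": 20, "c": 3}
assert {**d, **e} == {"a": 1, "b": 20, "c": 3}


def show(**kwargs):
    return kwargs


# Function calls are stricter: a repeated keyword argument is a TypeError.
try:
    show(**d, **e)
except TypeError as exc:
    call_error = exc
else:
    call_error = None
assert call_error is not None
```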

To be clear: I'm not supporting + as raising error on non-unique keys. Even if it makes dict + dict adhere to the rules of concatenation, I don't think it's a common or useful functionality. My order of preferences is roughly:

1. Do nothing (even if you don't like {**d1, **d2}, .copy() followed by .update() is obvious, and we don't need more than one way to do it)
2. Add a new method to dict, e.g. dict.merge (whether it's a class method or an instance method is irrelevant to me)
3. Use | (because dicts are *far* more like sets than they are like sequences, and the semi-lossy rules of unioning make more sense there); it would also make - make sense, since + is only matched by - in numeric contexts; on collections, | and - are paired. And I consider the - functionality the most useful part of this whole proposal (because I *have* wanted to drop a collection of known blacklisted keys from a dict and while it's obvious you can do it by looping, I always wanted to be able to do something like d1.keys() -= badkeys, and remain disappointed nothing like it is available)
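
For reference, a sketch of what is available today for the key-dropping case (d1 and badkeys are illustrative values). Keys views support set operators, but the result is a plain set rather than a dict, so the values still have to be carried over by hand:

```python
d1 = {"a": 1, "b": 2, "c": 3}
badkeys = {"b", "z"}

# Set difference on a keys view works, but yields a set of keys only.
remaining = d1.keys() - badkeys
assert remaining == {"a", "c"}

# Building a new dict without the unwanted keys needs a comprehension.
filtered = {k: v for k, v in d1.items() if k not in badkeys}
assert filtered == {"a": 1, "c": 3}

# In-place removal needs an explicit loop. The intersection is a new set,
# so deleting from d1 while iterating over it is safe.
for key in d1.keys() & badkeys:
    del d1[key]
assert d1 == {"a": 1, "c": 3}
```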

-Josh Rosenberg