I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).
I think the behavior proposed in the PEP makes sense whether you think of "+" as meaning "concatenation" or "merging".

If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict. But of course, you can't have duplicate keys in a Python dict. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.

This also makes explaining the behavior of "d1 + d2" slightly easier than explaining "d1 | d2". For the former, you can just say "d1 + d2 means we concat the two dicts together" and stop there. You almost don't need to explain the merging/right-most key wins behavior at all, since that behavior is the only one consistent with the existing language rules.

In contrast, you *would* need to explain this with "d1 | d2": I would mentally translate this expression to mean "take the union of these two dicts" and there's no real way to deduce which key-value pair ends up in the final dict given that framing. Why is it that key-value pairs in d2 win over pairs in d1 here? That choice seems pretty arbitrary when you think of this operation in terms of unions, rather than either concat or merge.

Using "|" would also violate an important existing property of unions: the invariant "d1 | d2 == d2 | d1" is no longer true. As far as I'm aware, the union operation is always taken to be commutative in math, and so I think it's important that we preserve that property in Python. At the very least, I think it's far more important to preserve commutativity of unions than it is to preserve some of the invariants I've seen proposed above, like "len(d1 + d2) == len(d1) + len(d2)".
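To make the "concatenate, then apply the duplicate-key rule" reading concrete, here is a small sketch. It assumes that `{**d1, **d2}` is a fair stand-in for the proposed "d1 + d2"; the operator itself is only a proposal, so this illustrates the semantics under discussion rather than the PEP's implementation.

```python
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "b": 4}

# A dict display with a repeated key keeps the right-most value for that key.
literal = {"a": 1, "b": 2, "c": 3, "b": 4}

# Stand-in for the proposed d1 + d2 (assumption: unpacking mirrors the PEP's rule).
merged = {**d1, **d2}
swapped = {**d2, **d1}

print(literal)            # {'a': 1, 'b': 4, 'c': 3}
print(merged == literal)  # True
print(merged == swapped)  # False: the operation is not commutative
```

The last line is the property at issue: whichever operator is chosen, swapping the operands changes the value stored under "b".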
Personally, I don't really have a strong opinion on this PEP, or the other one I've seen proposed where we add a "d1.merge(d2, d3, ...)". But I do know that I'm a strong -1 on adding set operations to dicts: it's not possible to preserve the existing semantics of union (and intersection) with dicts, and I think expressions like "d1 | d2" and "d1 & d2" would just be confusing and misleading to encounter in the wild.

-- Michael

On Wed, Mar 6, 2019 at 4:53 AM David Mertz <mertz@gnosis.cx> wrote:
I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).
I think growing the full collection of set operations would be a pleasant addition to dicts. I think shoe-horning in plus would always be jarring to me.
On Wed, Mar 6, 2019, 5:30 AM Ka-Ping Yee <zestyping@gmail.com> wrote:
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
len(dict1 + dict2) cannot even be computed by any expression involving +. Using len() to test the semantics of the operation is not arbitrary; the fact that the sizes do not add is a defining quality of a merge. This is a merge, not an addition. The proper analogy is to sets, not lists.
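The length argument can be checked directly. This is a minimal sketch that again uses `{**dict1, **dict2}` as an assumed stand-in for the proposed merge:

```python
list1, list2 = [1, 2], [2, 3]
set1, set2 = {1, 2}, {2, 3}
dict1, dict2 = {"a": 1, "b": 2}, {"b": 3, "c": 4}

# Concatenation: lengths add.
print(len(list1 + list2), len(list1) + len(list2))  # 4 4

# Set union: shared elements are counted once, so lengths do not add.
print(len(set1 | set2), len(set1) + len(set2))      # 3 4

# Dict merge (stand-in for the proposed operator): behaves like the set case.
merged = {**dict1, **dict2}
print(len(merged), len(dict1) + len(dict2))         # 3 4
```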
The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:
1. The keys of dict1 [op] dict2 are the elements of dict1.keys() [op] dict2.keys().
2. The values of dict2 take priority over the values of dict1.
3. When either operand is a set, it is treated as a dict whose values are None.
This yields many useful operations and, most importantly, is simple to explain. "sets and dicts can |, &, -" takes up less space in your brain than "sets can |, &, - but dicts can only + and -, where dict + is like set |".
merge and update some items:
{'a': 1, 'b': 2} | {'b': 3, 'c': 4} => {'a': 1, 'b': 3, 'c': 4}
pick some items:
{'a': 1, 'b': 2} & {'b': 3, 'c': 4} => {'b': 3}
remove some items:
{'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}
reset values of some keys:
{'a': 1, 'b': 2} | {'b', 'c'} => {'a': 1, 'b': None, 'c': None}
ensure certain keys are present:
{'b', 'c'} | {'a': 1, 'b': 2} => {'a': 1, 'b': 2, 'c': None}
pick some items:
{'b', 'c'} & {'a': 1, 'b': 2} => {'b': 2}
remove some items:
{'a': 1, 'b': 2} - {'b', 'c'} => {'a': 1}
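The three rules above can be sketched as an ordinary function. The names `as_dict` and `dict_op` are hypothetical helpers invented for this illustration; nothing here is existing dict API, and the operators are passed in from the `operator` module because dicts do not support them directly in this proposal's context.

```python
import operator


def as_dict(operand):
    # Rule 3: a set operand is treated as a dict whose values are all None.
    return operand if isinstance(operand, dict) else dict.fromkeys(operand)


def dict_op(left, op, right):
    left, right = as_dict(left), as_dict(right)
    # Rule 1: the resulting keys are left.keys() [op] right.keys().
    keys = op(left.keys(), right.keys())
    # Rule 2: values from the right operand take priority.
    return {k: right[k] if k in right else left[k] for k in keys}


d = {'a': 1, 'b': 2}
print(dict_op(d, operator.and_, {'b': 3, 'c': 4}))  # {'b': 3}
print(dict_op(d, operator.sub, {'b', 'c'}))         # {'a': 1}
print(dict_op({'b', 'c'}, operator.and_, d))        # {'b': 2}
```

Key order in the results is not meaningful here, since the keys come from a set operation.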
On Wed, Mar 6, 2019 at 1:51 AM Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:
On 6 March 2019 at 10:26:15, Brice Parent (contact@brice.xyz) wrote:
On 05/03/2019 at 23:40, Greg Ewing wrote:
Steven D'Aprano wrote:
The question is, is [recursive merge] behaviour useful enough and common enough to be built into dict itself?
I think not. It seems like just one possible way of merging values out of many. I think it would be better to provide a merge function or method that lets you specify a function for merging values.
That's what this conversation led me to. I'm not against the addition for the most general usage (and the current PEP describes the behaviour I would expect before reading the doc), but for all other more specific usages, where we intend any special or not-so-common behaviour, I'd go with modifying Dict.update like this:
foo.update(bar, on_collision=updator) # Although I'm not a fan of the keyword I used
This won't be possible: update() already takes keyword arguments, and treats each one as a key to insert:
>>> foo = {}
>>> bar = {'a': 1}
>>> foo.update(bar, on_collision=lambda e: e)
>>> foo
{'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}
`updator` being a simple function like this one:
from typing import Any

def updator(updated, updator, key) -> Any:
    if key == "related":
        # dict.update() returns None, so build the merged dict explicitly
        return {**updated[key], **updator[key]}
    if key == "tags":
        return updated[key] + updator[key]
    if key in ["a", "b", "c"]:  # Those
        return updated[key]
    return updator[key]
There's nothing here that couldn't be made today by using a custom update function, but leaving the burden of checking for values that are in both and actually inserting the new values to Python's language, and keeping on our side only the parts that are specific to our use case, makes in my opinion the code more readable, with fewer possible bugs and possibly better optimization.
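For comparison, here is roughly what the "custom update function" route looks like today. This is only a sketch: `update_with` and `resolve` are hypothetical names, the callback signature `(updated, incoming, key)` follows the example above, and no such parameter exists on dict.update().

```python
def update_with(target, other, on_collision):
    """Update target in place from other.

    For keys present in both, store on_collision(target, other, key);
    for keys only in other, copy the value across.
    """
    for key, value in other.items():
        if key in target:
            target[key] = on_collision(target, other, key)
        else:
            target[key] = value


def resolve(updated, incoming, key):
    if key == "tags":
        return updated[key] + incoming[key]  # concatenate lists
    if key in ("a", "b", "c"):
        return updated[key]                  # keep the existing value
    return incoming[key]                     # default: the new value wins


foo = {"tags": ["x"], "a": 1, "n": 0}
bar = {"tags": ["y"], "a": 2, "n": 5, "new": True}
update_with(foo, bar, resolve)
print(foo)  # {'tags': ['x', 'y'], 'a': 1, 'n': 5, 'new': True}
```

The membership check and the insertion loop are the parts that a built-in collision hook would take over, leaving only `resolve` on the caller's side.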
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/