PEP: Dict addition and subtraction
Attached is a draft PEP on adding + and - operators to dict for discussion. This should probably go here: https://github.com/python/peps but due to technical difficulties at my end, I'm very limited in what I can do on Github (at least for now). If there's anyone who would like to co-author and/or help with the process, that will be appreciated. -- Steven
If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
I don't think so. https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html are about kwargs. I think non string keys are allowed for {**d1, **d2} by language. -- INADA Naoki <songofacandy@gmail.com>
On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
I don't think so. https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html are about kwargs. I think non string keys are allowed for {**d1, **d2} by language.
Is this documented somewhere? Or is there a pronouncement somewhere that it is definitely expected to work in any language calling itself Python? Thanks, -- Steven
On Tue, Mar 5, 2019 at 7:26 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2].
I don't think so. https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html are about kwargs. I think non string keys are allowed for {**d1, **d2} by language.
Is this documented somewhere?
It is not explicitly documented. But unlike keyword arguments, dict displays have supported non-string keys for a very long time. I believe {3: 4} is supported by the Python language, and is not just CPython implementation behavior. https://docs.python.org/3/reference/expressions.html#grammar-token-dict-disp...
Or is there a pronouncement somewhere that it is definitely expected to work in any language calling itself Python?
Thanks,
-- Steven _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Inada Naoki <songofacandy@gmail.com>
On Tue, Mar 5, 2019 at 2:38 AM Inada Naoki <songofacandy@gmail.com> wrote:
On Tue, Mar 5, 2019 at 7:26 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Mar 02, 2019 at 01:47:37AM +0900, INADA Naoki wrote:
If the keys are not strings, it currently works in CPython, but it
may not work with other implementations, or future versions of CPython[2].
I don't think so. https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html are about kwargs. I think non string keys are allowed for {**d1, **d2} by language.
Is this documented somewhere?
It is not explicitly documented. But unlike keyword argument, dict display supported non-string keys from very old.
I believe {3: 4} is supported by Python language, not CPython implementation behavior.
https://docs.python.org/3/reference/expressions.html#grammar-token-dict-disp...
I'd like to remove all doubt: {**d1} needs to work regardless of the key type, as long as it's hashable (d1 could be some mapping implemented without hashing, e.g. using a balanced tree, so that it could support unhashable keys). If there's doubt about this anywhere, we could add an example to the docs and to the PEP. -- --Guido van Rossum (python.org/~guido)
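The kind of example that could go in the docs might look like the following minimal sketch. The names d1, d2 and takes_kwargs are made up for illustration; the behaviour shown is what CPython does today and is not quoted from any specification.

```python
# Dict displays accept any hashable key, so ** unpacking inside a display
# works for non-string keys.
d1 = {1: "one", (2, 3): "pair"}
d2 = {1: "uno", None: "nothing"}

merged = {**d1, **d2}  # for duplicate keys, the later value wins


def takes_kwargs(**kwargs):
    """Hypothetical helper: keyword unpacking in a call needs str keys."""
    return kwargs


try:
    takes_kwargs(**d1)
    kwargs_error = None
except TypeError as exc:
    # Non-string keys are rejected for keyword arguments, not for dict displays.
    kwargs_error = exc
```

The contrast between the two cases is the point: the restriction on keys belongs to function calls, not to {**d1, **d2}.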
On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it. / Anders
On Tue, Mar 5, 2019 at 9:02 AM Anders Hovmöller <boxed@killingar.net> wrote:
On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it.
The ice is much thinner there, but my position is that as long as they are *strings* such keys should be allowed. -- --Guido van Rossum (python.org/~guido)
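For reference, this is the behaviour Anders is describing, as a small sketch. The function name collect and the keys are invented for illustration, and the result reflects current CPython rather than a documented guarantee.

```python
def collect(**kwargs):
    """Hypothetical function that just returns its keyword arguments."""
    return kwargs


# Keys that are strings but not valid identifiers (or that are keywords)
# cannot be written as collect(name=value), yet they pass through ** unpacking.
odd = {"not-an-identifier": 1, "class": 2}
result = collect(**odd)
```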
There was a thread on this topic. https://mail.python.org/pipermail/python-dev/2018-October/155435.html
On Wed, Mar 6, 2019 at 2:02, Anders Hovmöller <boxed@killingar.net> wrote:
On a related note: **kwargs, should they support arbitrary strings as keys? I depend on this behavior in production code and all python implementations handle it.
/ Anders
2019-03-05 0:34 UTC+01:00, Brandt Bucher <brandtbucher@gmail.com>:
Are there other built-in types which act differently depending on whether they are called with the operator or the augmented assignment version?
list.__iadd__ and list.extend
2019-03-05 0:57 UTC+01:00, Guido van Rossum <guido@python.org>:
Yes. The same happens for lists. [1] + 'a' is a TypeError, but a += 'a' works:
Oh, I can't believe I'm only learning that today, after using Python for years. Thanks for the clarification. It makes perfect sense for += to behave like .update() then.
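The list behaviour Guido refers to can be reproduced with a short sketch (variable names are illustrative only):

```python
a = [1]

try:
    a + "a"  # list + str: concatenation requires another list
    concat_error = None
except TypeError as exc:
    concat_error = exc

# Augmented assignment accepts any iterable, like list.extend does.
a += "ab"
```

After this runs, a holds the original element followed by the individual characters of the string.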
I’ve never been part of this process before, but I’m interested in learning and helping any way I can. My addition implementation is attached to the bpo, and I’m working today on bringing it in line with the PEP in its current form (specifically, subtraction operations). https://github.com/python/cpython/pull/12088 Brandt
On Mar 1, 2019, at 08:26, Steven D'Aprano <steve@pearwood.info> wrote:
[...]
On Fri, Mar 1, 2019 at 8:50 AM Brandt Bucher <brandtbucher@gmail.com> wrote:
I’ve never been part of this process before, but I’m interested in learning and helping any way I can.
Thanks!
My addition implementation is attached to the bpo, and I’m working today on bringing it in line with the PEP in its current form (specifically, subtraction operations).
When your proposed patch is complete, Brandt, just ask Steven to update the PEP to mention that there's a proposed implementation attached to the issue tracking the idea. -Brett
While working through my implementation, I've come across a couple of inconsistencies with the current proposal:
The merge operator will have the same relationship to the dict.update method as the list concatenation operator has to list.extend, with dict difference being defined analogously.
I like this premise. += for lists *behaves* like extend, and += for dicts *behaves* like update. However, later in the PEP it says:
Augmented assignment will just call the update method. This is analogous to the way list += calls the extend method, which accepts any iterable, not just lists.
In your Python implementation samples from the PEP, dict subclasses will behave differently from how list subclasses do. List subclasses, without overrides, return *list* objects for bare "+" operations (and "+=" won't call an overridden "extend" method). So a more analogous pseudo-implementation (if that's what we seek) would look like:

    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)
            dict.update(new, other)
            return new
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, dict):
            new = dict.copy(other)
            dict.update(new, self)
            return new
        return NotImplemented

    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)
            return self
        return NotImplemented

This is what my C looks like right now. We can choose to update these semantics to be "nicer" to subclasses, but I don't see any precedent for it (lists, sets, strings, etc.).

Brandt

On Fri, Mar 1, 2019 at 11:41 AM Brett Cannon <brett@python.org> wrote:
[...]
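Brandt's observation about list subclasses can be checked with a small sketch. TracingList is a hypothetical class written for this illustration, and the results describe what CPython does, not a documented guarantee.

```python
class TracingList(list):
    """Hypothetical list subclass that counts calls to its extend override."""

    extend_calls = 0

    def extend(self, iterable):
        type(self).extend_calls += 1
        super().extend(iterable)


t = TracingList([1, 2])

combined = t + [3]  # bare "+" goes through list.__add__
t += [4]            # "+=" goes through list.__iadd__, not the override
```

In CPython, combined comes back as a plain list, t stays a TracingList, and the overridden extend is never invoked.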
I think that sequence should be fixed.
On Fri., Mar. 1, 2019, 7:12 p.m. Brandt Bucher, <brandtbucher@gmail.com> wrote:
[...]
Executive summary:
- I'm going to argue for subclass-preserving behaviour;
- I'm not wedded to the idea that dict += should actually call the update method, so long as it has the same behaviour;
- __iadd__ has no need to return NotImplemented or type-check its argument.
Details below.
On Fri, Mar 01, 2019 at 04:10:44PM -0800, Brandt Bucher wrote:
[...]
In your Python implementation samples from the PEP, dict subclasses will behave differently from how list subclasses do. List subclasses, without overrides, return *list* objects for bare "+" operations
Right -- and I think they are wrong to do so, for reasons I explained here:
https://mail.python.org/pipermail/python-ideas/2019-March/055547.html
I think the standard handling of subclasses in Python builtins is wrong, and I don't wish to emulate that wrong behaviour without a really good reason. Or at least a better reason than "other methods break subclassing unless explicitly overloaded, so this should do so too".
Or at least not without a fight :-)
(and "+=" won't call an overridden "extend" method).
I'm slightly less opinionated about that. Looking more closely into the docs, I see that they don't actually say that += calls list.extend:

    s.extend(t) or s += t
        extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)

https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types
only that they have the same effect. So the wording re lists calling extend certainly needs to be changed. But that doesn't mean that we must change the implementation. We have a choice:
- regardless of what lists do, we define += for dicts as literally calling dict.update; the more I think about it, the less I like this.
- Or we say that += behaves similarly to update, without actually calling the method. I think I prefer this.
(The second implies either that += contains a duplicate of the update logic, or that += and update both delegate to a private, C-level function that does most of the work.)
I think that the second approach (define += as having the equivalent semantics of update but without actually calling the update method) is probably better. That decouples the two methods and allows subclasses to change one without necessarily changing the other.
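A rough pure-Python sketch of the second approach follows. MergeDict, CountingDict and the _merge helper are hypothetical names invented here; the helper merely stands in for the private C-level function mentioned above, and none of this is the PEP's or the patch's actual code.

```python
class MergeDict(dict):
    """Hypothetical dict subclass where += and update share a private helper."""

    def _merge(self, other):
        # Stand-in for a shared private function; it deliberately bypasses
        # any overridden update() by calling the base class directly.
        dict.update(self, other)

    def update(self, other=(), **kwargs):
        self._merge(other)
        self._merge(kwargs)

    def __iadd__(self, other):
        self._merge(other)
        return self


class CountingDict(MergeDict):
    """Overrides update only; += keeps its own behaviour."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.update_calls = 0

    def update(self, other=(), **kwargs):
        self.update_calls += 1
        super().update(other, **kwargs)


d = CountingDict(a=1)
d += {"b": 2}        # mapping operand
d += [("c", 3)]      # iterable of pairs, as update() accepts
d.update({"e": 5})   # only this call reaches the override
```

Because += never looks up self.update, a subclass can change one method without silently changing the other, which is the decoupling described above.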
So a more analogous pseudo-implementation (if that's what we seek) would look like:
    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)
            dict.update(new, other)
            return new
        return NotImplemented
We should not require the copy method. The PEP should be more explicit that the approximate implementation does not imply the copy() and update() methods are actually called.
    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)
            return self
        return NotImplemented

I don't agree with that implementation.
According to PEP 203, which introduced augmented assignment, the sequence of calls in ``d += e`` is:
1. Try to call ``d.__iadd__(e)``.
2. If __iadd__ is not present, try ``d.__add__(e)``.
3. If __add__ is missing too, try ``e.__radd__(d)``.
but my tests suggest this is inaccurate. I think the correct behaviour is this:
1. Try to call ``d.__iadd__(e)``.
2. If __iadd__ is not present, or if it returns NotImplemented, try ``d.__add__(e)``.
3. If __add__ is missing too, or if it returns NotImplemented, fail with TypeError.
In other words, e.__radd__ is not used.
We don't want dict.__iadd__ to try calling __add__, since the latter is more restrictive and less efficient than the in-place merge. So there is no need for __iadd__ to return NotImplemented. It should either succeed on its own, or fail hard:

    def __iadd__(self, other):
        self.update(other)
        return self

Except that the actual C implementation won't call the update method itself, but will follow the same semantics.
See the docstring for dict.update for details of what is accepted by update.
-- Steven
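The fallback from __iadd__ to __add__ described in this message can be checked with a small sketch. The class names are hypothetical. The sketch only exercises the NotImplemented fallback and the final TypeError; it does not test whether the right-hand operand's __radd__ is consulted, which the language reference describes as following the same rules as evaluating d + e.

```python
class Declines:
    """Hypothetical class whose in-place add always declines."""

    def __iadd__(self, other):
        return NotImplemented

    def __add__(self, other):
        return "used __add__"


class DeclinesBoth:
    """Hypothetical class that declines both in-place and binary add."""

    def __iadd__(self, other):
        return NotImplemented

    def __add__(self, other):
        return NotImplemented


x = Declines()
x += 1  # __iadd__ declines, so the interpreter falls back to __add__

y = DeclinesBoth()
try:
    y += 1  # nothing handles the operation (int does not either)
    iadd_error = None
except TypeError as exc:
    iadd_error = exc
```

This is why an __iadd__ that returns NotImplemented for non-dict operands would end up in the more restrictive __add__ path, which is the outcome the message argues against.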
Hi Steven. Thanks for the clarifications. I've pushed a complete working patch (with tests) to GitHub. It's linked to the bpo issue.
Branch: https://github.com/brandtbucher/cpython/tree/addiction
PR: https://github.com/python/cpython/pull/12088
Right now, it's pretty much a straight reimplementation of your Python examples. I plan to update it periodically to keep it in sync with any changes, and to make a few optimizations (for example, when operands are identical or empty).
Let me know if you have any questions/suggestions. Stoked to learn and help out with this process! :)
Brandt
On Fri, Mar 1, 2019 at 7:57 PM Steven D'Aprano <steve@pearwood.info> wrote:
[...]
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}. The second syntax makes it clear that a new dictionary is being constructed and that d2 overrides keys from d1.
One can reasonably expect or imagine a situation where a section of code that expects to merge two dictionaries with non-conflicting keys commits a semantic error if it merges two dictionaries with conflicting keys.
To better explain, imagine a program where options is a global variable storing parsed values from the command line.

    def verbose_options():
        if options.quiet:
            return {'verbose': True}

    def quiet_options():
        if options.quiet:
            return {'verbose': False}

If we were to define an options() function, return {**quiet_options(), **verbose_options()} implies that verbose overrules quiet; whereas return quiet_options() + verbose_options() implies that verbose and quiet cannot be used simultaneously. I am not aware of another easy way in Python to merge dictionaries while checking for non-conflicting keys.
Compare:

    def settings():
        return {**quiet_options(), **verbose_options()}

    def settings():
        try:
            return quiet_options() + verbose_options()
        except KeyError:
            print('conflicting options used', file=sys.stderr)
            sys.exit(1)

***
This is a simple scenario, but you can imagine more complex ones as well. Does --quiet-stage-1 loosen --verbose? Does --quiet-stage-1 conflict with --verbose-stage-1? Does --verbosity=5 override --verbosity=4 or cause an error?
Having {**, **} and + do different things provides a convenient and Pythonic way to model such relationships in code. Indeed, you can even combine the two syntaxes in the same expression to show a mix of overriding and exclusionary behavior.
Anyways, I think it’s a good idea to have this semantic difference in behavior so Python developers have a good way to communicate what is expected of the two dictionaries being merged inside the language.
This is like an assertion without
Again, I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown, because such “non-conflicting merge” behavior would be useful in Python. It gives clarifying power to the + sign. The + and the {**, **} should serve different roles.
In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys, and the KeyError is expressed suppressed by virtue of not using the {**, **} syntax. People expect the + operator to be commutative, while the {**, **} syntax prompts further examination by virtue of its “weird” syntax.
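For comparison, the non-conflicting merge being proposed can already be written as a plain function. This is only a sketch: merge_unique is a hypothetical name, and the choice of KeyError simply mirrors the proposal.

```python
def merge_unique(d1, d2):
    """Return a new dict merging d1 and d2; raise KeyError on shared keys."""
    conflicts = d1.keys() & d2.keys()  # key views support set intersection
    if conflicts:
        raise KeyError(f"conflicting keys: {sorted(conflicts, key=repr)}")
    return {**d1, **d2}


settings = merge_unique({"verbose": True}, {"colour": "auto"})

try:
    merge_unique({"verbose": True}, {"verbose": False})
    conflict_error = None
except KeyError as exc:
    conflict_error = exc
```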
James Lu schrieb am 04.03.19 um 03:28:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
Please, no. That would be really annoying. If you need that feature, it can become a new method on dicts. Stefan
On Mar 4, 2019, at 3:41 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
James Lu schrieb am 04.03.19 um 03:28:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
Please, no. That would be really annoying.
If you need that feature, it can become a new method on dicts.
Stefan
If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.
On Mon, Mar 04, 2019 at 10:01:23AM -0500, James Lu wrote:
If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.
In your previous email, you said the {**d ...} syntax was implicit:
    In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys, and the KeyError is expressed suppressed by virtue of not using the {**, **} syntax.
It is difficult to take your "explicit/implicit" argument seriously when you cannot even decide which is which.
-- Steven
On Mon, Mar 04, 2019 at 10:01:23AM -0500, James Lu wrote:
If you want to merge it without a KeyError, learn and use the more explicit {**d1, **d2} syntax.
On Mar 4, 2019, at 10:25 AM, Steven D'Aprano <steve@pearwood.info> wrote:
In your previous email, you said the {**d ...} syntax was implicit:
In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys, and the KeyError is expressed suppressed by virtue of not using the {**, **} syntax.
It is difficult to take your "explicit/implicit" argument seriously when you cannot even decide which is which.
I misspoke.
Yes, + is explicit. {**, **} is implicit.
My argument: We should set the standard that + is for non-conflicting merge and {**, **} is for overriding merge. That standard should be so that + explicitly asserts that the keys will not conflict whereas {**d1, **d2} is ambiguous on why d2 is overriding d1.^
^Presumably you’re making a copy of d1 so why should d3 have d2 take priority? The syntax deserves a comment, perhaps explaining that items from d2 are newer in time or that the items in d1 are always nonces.
The + acts as an implicit assertion and an opportunity to catch an invariant violation or data input error.
Give me an example of a situation where you need a third dictionary from two existing dictionaries and having conflict where a key has a different value in both is desirable behavior.
The situation where non-conflicting merge is what’s desired is more common and in that case throwing an exception in the case of a conflicting value is a good thing, a way to catch code smell.
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
This proposal is for a simple, operator-based equivalent to dict.update() which returns a new dict. dict.update has existed since Python 1.5 (something like a quarter of a century!) and never grown a "unique keys" version. I don't recall even seeing a request for such a feature. If such a unique keys version is useful, I don't expect it will be useful often.
This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.
One of the reasons for preferring + is that it is an obvious way to do something very common, while {**d1, **d2} is as far from obvious as you can get without becoming APL or Perl :-)
If I needed such a unique key version of update, I'd use a subclass:

    class StrictDict(dict):
        def __add__(self, other):
            if isinstance(other, dict) and (self.keys() & other.keys()):
                raise KeyError('non-unique keys')
            return super().__add__(other)

        # and similar for __radd__.

rather than burden the entire language, and every user of it, with having to learn the subtle difference between the obvious + operator and the error-prone and unobvious trick of {*d1, *d2}.
( Did you see what I did there? *wink* )
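The StrictDict idea relies on dict.__add__ from the draft PEP, which plain dicts do not have. A version that runs without the proposed operator might look like the sketch below; building the result with a dict display and returning the subclass type are assumptions made here, not something stated in the message.

```python
class StrictDict(dict):
    """Sketch of a dict whose + refuses to merge when keys overlap."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        if self.keys() & other.keys():
            raise KeyError("non-unique keys")
        # Assumption: build the merged dict by hand, since dict has no __add__.
        return type(self)({**self, **other})

    def __radd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        if self.keys() & other.keys():
            raise KeyError("non-unique keys")
        return type(self)({**other, **self})


merged = StrictDict(a=1) + {"b": 2}
reflected = {"c": 3} + StrictDict(d=4)  # handled by StrictDict.__radd__

try:
    StrictDict(a=1) + {"a": 99}
    strict_error = None
except KeyError as exc:
    strict_error = exc
```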
The second syntax makes it clear that a new dictionary is being constructed and that d2 overrides keys from d1.
Only because you have learned the rule that {**d, **e} means to construct a new dict by merging, with the rule that in the event of duplicate keys, the last key seen wins. If you hadn't learned that rule, there is nothing in the syntax which would tell you the behaviour. We could have chosen any rule we liked:
- raise an exception, like you get a TypeError if you pass the same keyword argument to a function twice: spam(foo=1, foo=2);
- first value seen wins;
- last value seen wins;
- random value wins;
- anything else we liked!
There is nothing "clear" about the syntax which makes it obvious which behaviour is implemented. We have to learn it.
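Several of the rules listed above can be seen side by side in existing Python. This is an illustrative sketch with made-up names (d, e, spam). Note that it uses ** unpacking to trigger the duplicate-keyword TypeError, because writing the same keyword twice literally in a call is rejected earlier, as a SyntaxError, by current CPython.

```python
from collections import ChainMap

d = {"foo": 1, "bar": 2}
e = {"foo": 10, "baz": 3}

last_wins = {**d, **e}                  # the rule dict displays implement
first_wins = {**e, **d}                 # same syntax, operands swapped
first_wins_chain = dict(ChainMap(d, e)) # ChainMap looks in d before e


def spam(**kwargs):
    """Hypothetical function used only to show the keyword-argument rule."""
    return kwargs


try:
    spam(**d, **e)  # "foo" arrives twice
    duplicate_error = None
except TypeError as exc:
    duplicate_error = exc
```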
One can reasonably expect or imagine a situation where a section of code that expects to merge two dictionaries with non-conflicting keys commits a semantic error if it merges two dictionaries with conflicting keys.
I can imagine it, but I don't think I've ever needed it, and I can't imagine wanting it often enough to wish it was not just a built-in function or method, but actual syntax. Do you have some real examples of wanting an error when trying to update a dict if keys match?
To better explain, imagine a program where options is a global variable storing parsed values from the command line.
    def verbose_options():
        if options.quiet:
            return {'verbose': True}

    def quiet_options():
        if options.quiet:
            return {'verbose': False}
That seems very artificial to me. Why not use a single function:

    def verbose_options():  # There's more than one?
        return {'verbose': not options.quiet}

The way you have written those functions seems weird to me. You already have a nice options object, with named fields like "options.quiet", why are you turning it into not one but *two* different dicts, both reporting the same field?

And it's buggy: if options.quiet is True, then the key 'quiet' should be True, not the 'verbose' key.

Do you have *two* functions for every preference setting that takes a true/false flag? What do you do for preference settings that take multiple values? Create a vast number of specialised functions, one for each possible value?

    def A4_page_options():
        if options.page_size == 'A4':
            return {'page_size': 'A4'}

    def US_Letter_page_options():
        if options.page_size == 'US Letter':
            return {'page_size': 'US Letter'}

    page_size = (
        A4_page_options() + A3_page_options() + A5_page_options()
        + Foolscape_page_options() + Tabloid_page_options()
        + US_Letter_page_options() + US_Legal_page_options()
        # and about a dozen more...
        )

The point is, although I might be wrong, I don't think that this example is a practical, realistic use-case for a unique keys version of update. To me, your approach seems so complicated and artificial that it seems like it was invented specifically to justify this "unique key" operator, not something that we would want to write in real life.

But even if it is real code, the question is not whether it is EVER useful for a dict update to raise an exception on matching keys. The question is whether this is so often useful that this is the behaviour we want to make the default for dicts.

[...]
Again, I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown, because such “non-conflicting merge” behavior would be useful in Python.
I don't think it would be, at least not often. If it were common enough to justify a built-in operator to do this, we would have had many requests for a dict.unique_update or similar by now, and I don't think we have.
It gives clarifying power to the + sign. The + and the {**, **} should serve different roles.
In other words, explicit + is better than implicit {**, **}, unless explicitly suppressed. Here + is explicit whereas {**, **} is implicitly allowing inclusive keys,
If I had a cent for every time people misused "explicit" to mean "the proposal that I like", I'd be rich. In what way is the "+" operator *explicit* about raising an exception on duplicate keys? These are both explicit:

    merge_but_raise_exception_if_any_duplicates(d1, d2)
    merge(d1, d2, raise_if_duplicates=True)

and these are both equally implicit:

    d1 + d2
    {**d1, **d2}

since the behaviour on duplicates is not explicitly stated in clear and obvious language, but implied by the rules of the language.

[...]
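For illustration, the "explicit" spelling could look something like the sketch below. The name merge and the raise_if_duplicates flag are taken from the message above; no such function exists in the standard library, and the body is only one possible reading of it.

```python
def merge(d1, d2, *, raise_if_duplicates=False):
    """Return a new dict with d2 merged over d1 (illustrative helper)."""
    if raise_if_duplicates:
        duplicates = d1.keys() & d2.keys()
        if duplicates:
            raise KeyError(f"duplicate keys: {sorted(duplicates, key=repr)}")
    result = dict(d1)   # shallow copy of the left operand
    result.update(d2)   # on a clash, the value from d2 wins
    return result


print(merge({"a": 1}, {"a": 2}))  # {'a': 2}
```

Here the behaviour on duplicates is stated at the call site, which is the sense of "explicit" the message is arguing for.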
People expect the + operator to be commutative
They are wrong to expect that, because the + operator is already not commutative for:

    str
    bytes
    bytearray
    list
    tuple
    array.array
    collections.deque
    collections.Counter

and possibly others.

-- Steven
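A quick check of the non-commutativity claim for several of the listed types. For Counter the two results compare equal, so the sketch assumes the difference meant here is the order of the keys in the result.

```python
from collections import Counter, deque

# Swapping the operands changes the result for every sequence type below.
assert "ab" + "cd" != "cd" + "ab"
assert b"ab" + b"cd" != b"cd" + b"ab"
assert [1] + [2] != [2] + [1]
assert (1,) + (2,) != (2,) + (1,)
assert deque([1]) + deque([2]) != deque([2]) + deque([1])

# Counter: the counts are equal either way, but the key order differs.
left_first = Counter(a=1) + Counter(b=1)
right_first = Counter(b=1) + Counter(a=1)
assert left_first == right_first
assert list(left_first) != list(right_first)
```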
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
This proposal is for a simple, operator-based equivalent to dict.update() which returns a new dict. dict.update has existed since Python 1.5 (something like a quarter of a century!) and never grown a "unique keys" version.
I don't recall even seeing a request for such a feature. If such a unique keys version is useful, I don't expect it will be useful often.
I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):

1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays)
2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))

dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; the main proposal (making it equivalent to left.copy(), followed by .update(right)) would have the left hand side win on ordering, the right hand side win on values, and wouldn't preserve the length invariant of concatenation. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.
This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.
One of the reasons for preferring + is that it is an obvious way to do something very common, while {**d1, **d2} is as far from obvious as you can get without becoming APL or Perl :-)
From the moment PEP 448 was published, I've been using unpacking as a more composable/efficient form of concatenation, merging, etc. I'm sorry you don't find it obvious, but a couple e-mails back you said: "The Zen's prohibition against guessing in the face of ambiguity does not mean that we must not add a feature to the language that requires the user to learn what it does first." Learning to use the unpacking syntax in the case of function calls is necessary for tons of stuff (writing general function decorators, handling initialization in class hierarchies, etc.), and as PEP 448 is titled, this is just a generalization combining the features of unpacking arguments with collection literals.
The second syntax makes it clear that a new dictionary is being constructed and that d2 overrides keys from d1.
Only because you have learned the rule that {**d, **e} means to construct a new dict by merging, with the rule that in the event of duplicate keys, the last key seen wins. If you hadn't learned that rule, there is nothing in the syntax which would tell you the behaviour. We could have chosen any rule we liked:
No, because we learned the general rule for dict literals that {'a': 1, 'a': 2} produces {'a': 2}; the unpacking generalizations were very good about adhering to the existing rules, so it was basically zero learning curve if you already knew dict literal rules and less general unpacking rules. The only part to "learn" is that when there is a conflict between dict literal rules and function call rules, dict literal rules win.

To be clear: I'm not supporting + as raising an error on non-unique keys. Even if it makes dict + dict adhere to the rules of concatenation, I don't think it's a common or useful functionality. My order of preferences is roughly:

1. Do nothing (even if you don't like {**d1, **d2}, .copy() followed by .update() is obvious, and we don't need more than one way to do it)
2. Add a new method to dict, e.g. dict.merge (whether it's a class method or an instance method is irrelevant to me)
3. Use | (because dicts are *far* more like sets than they are like sequences, and the semi-lossy rules of unioning make more sense there); it would also make - make sense, since + is only matched by - in numeric contexts; on collections, | and - are paired. And I consider the - functionality the most useful part of this whole proposal (because I *have* wanted to drop a collection of known blacklisted keys from a dict and while it's obvious you can do it by looping, I always wanted to be able to do something like d1.keys() -= badkeys, and remain disappointed nothing like it is available)

-Josh Rosenberg
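The invariants discussed in this message can be checked directly. The sketch below uses {**d1, **d2} as a stand-in for the proposed copy-and-update semantics, and shows the looping spelling of "drop these keys" that exists today; the variable names are made up for the example.

```python
# Concatenation keeps every element, so the lengths add up.
seq1, seq2 = [1, 2], [2, 3]
assert len(seq1 + seq2) == len(seq1) + len(seq2)

# Copy-and-update merging does not: the shared key "b" is stored once.
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 3}
merged = {**d1, **d2}
assert len(merged) < len(d1) + len(d2)
assert list(merged) == ["a", "b", "c"]  # left operand wins on ordering
assert merged["b"] == 20                # right operand wins on values

# Dropping a collection of unwanted keys, as a new dict or in place.
badkeys = {"b", "not-present"}
cleaned = {k: v for k, v in d1.items() if k not in badkeys}
for key in d1.keys() & badkeys:  # the intersection is a new set, safe to loop
    del d1[key]
assert cleaned == d1 == {"a": 1}
```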
On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg < shadowranger+pythonideas@gmail.com> wrote:
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
This proposal is for a simple, operator-based equivalent to dict.update() which returns a new dict. dict.update has existed since Python 1.5 (something like a quarter of a century!) and never grown a "unique keys" version.
I don't recall even seeing a request for such a feature. If such a unique keys version is useful, I don't expect it will be useful often.
I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):
1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays) 2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))
dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; the main proposal (making it equivalent to left.copy(), followed by .update(right)) would have the left hand side win on ordering, the right hand side win on values, and wouldn't preserve the length invariant of concatenation. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.
I must by now have seen dozens of posts complaining about this aspect of the proposal. I think this is just making up rules (e.g. "+ never loses information") to deal with an aspect of the design where a *choice* must be made. This may reflect the Zen of Python's "In the face of ambiguity, refuse the temptation to guess." But really, that's a pretty silly rule (truly, they aren't all winners). Good interface design constantly makes choices in ambiguous situations, because the alternative is constantly asking, and that's just annoying.

We have a plethora of examples (in fact, almost all alternatives considered) of situations related to dict merging where a choice is made between conflicting values for a key, and it's always the value further to the right that wins: from d[k] = v (which overrides the value when k is already in the dict) to d1.update(d2) (which lets the values in d2 win), including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has a well-defined meaning where the latter value wins.

As to why raising is worse: First, none of the other situations I listed above raises for conflicts. Second, there's the experience of str+unicode in Python 2, which raises if the str argument contains any non-ASCII bytes. In fact, we disliked it so much that we changed the language incompatibly to deal with it.

-- --Guido van Rossum (python.org/~guido)
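The four "right-most value wins" cases listed above can be verified in a few lines. This is only a demonstration of existing behaviour; the variable names are chosen for the example.

```python
# 1. Item assignment overrides the existing value.
assigned = {"a": 1}
assigned["a"] = 2

# 2. update() lets the argument's values win.
updated = {"a": 1}
updated.update({"a": 2})

# 3. Unpacking in a dict display: the later mapping wins.
unpacked = {**{"a": 1}, **{"a": 2}}

# 4. A plain dict display with a repeated key keeps the later value.
literal = {"a": 1, "a": 2}

assert assigned == updated == unpacked == literal == {"a": 2}
```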
On Tue, Mar 5, 2019 at 3:50 PM Josh Rosenberg < shadowranger+pythonideas@gmail.com> wrote:
On Tue, Mar 5, 2019 at 11:16 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:
I propose that the + sign merge two python dictionaries such that if there are conflicting keys, a KeyError is thrown.
This proposal is for a simple, operator-based equivalent to dict.update() which returns a new dict. dict.update has existed since Python 1.5 (something like a quarter of a century!) and never grown a "unique keys" version.
I don't recall even seeing a request for such a feature. If such a unique keys version is useful, I don't expect it will be useful often.
I have one argument in favor of such a feature: It preserves concatenation semantics. + means one of two things in all code I've ever seen (Python or otherwise):
1. Numeric addition (including element-wise numeric addition as in Counter and numpy arrays) 2. Concatenation (where the result preserves all elements, in order, including, among other guarantees, that len(seq1) + len(seq2) == len(seq1 + seq2))
dict addition that didn't reject non-unique keys wouldn't fit *either* pattern; the main proposal (making it equivalent to left.copy(), followed by .update(right)) would have the left hand side win on ordering, the right hand side win on values, and wouldn't preserve the length invariant of concatenation. At least when repeated keys are rejected, most concatenation invariants are preserved; order is all of the left elements followed by all of the right, and no elements are lost.
On Wed, Mar 6, 2019 at 12:08 AM Guido van Rossum <guido@python.org> wrote:
I must by now have seen dozens of posts complaining about this aspect of the proposal. I think this is just making up rules (e.g. "+ never loses information") to deal with an aspect of the design where a *choice* must be made. This may reflect the Zen of Python's "In the face of ambiguity, refuse the temptation to guess." But really, that's a pretty silly rule (truly, they aren't all winners). Good interface design constantly makes choices in ambiguous situations, because the alternative is constantly asking, and that's just annoying.
We have a plethora of examples (in fact, almost all alternatives considered) of situations related to dict merging where a choice is made between conflicting values for a key, and it's always the value further to the right that wins: from d[k] = v (which overrides the value when k is already in the dict) to d1.update(d2) (which lets the values in d2 win), including the much lauded {**d1, **d2} and even plain {'a': 1, 'a': 2} has a well-defined meaning where the latter value wins.
Yeah. And I'm fine with the behavior for update because the name itself is descriptive; we're spelling out, in English, that we're update-ing the thing it's called on, so it makes sense to have the thing we're sourcing for updates take precedence. Similarly, for dict literals (and by extension, unpacking), it's following an existing Python convention which doesn't contradict anything else.

Overloading + lacks the clear descriptive aspect of update that describes the goal of the operation, and contradicts conventions (in Python and elsewhere) about how + works (addition or concatenation, and a lot of people don't even like it doing the latter, though I'm not that pedantic).

A couple "rules" from C++ on overloading are "*Whenever the meaning of an operator is not obviously clear and undisputed, it should not be overloaded.* *Instead, provide a function with a well-chosen name.*" and "*Always stick to the operator’s well-known semantics".* (Source: https://stackoverflow.com/a/4421708/364696 , though the principle is restated in many other places). Obviously the C++ community isn't perfect on this (see iostream and <</>> operators), but they're otherwise pretty consistent. + means addition, and in many languages including C++ strings, concatenation, but I don't know of any languages outside the "esoteric" category that use it for things that are neither addition nor concatenation. You've said you don't want the whole plethora of set-like behaviors on dicts, but dicts are syntactically and semantically much more like sets than sequences, and if you add + (with semantics differing from both sets and sequences), the language becomes less consistent.

I'm not against making it easier to merge dictionaries. But people seem to be arguing that {**d1, **d2} is bad because of magic punctuation that obscures meaning, when IMO:

    d3 = d1 + d2

is obscuring meaning by adding yet a third rule for what + means, inconsistent with both existing rules (from both Python and the majority of languages I've had cause to use).
A named method (class or instance) or top-level function (a la sorted) is more explicit, easier to look up (after all, the major complaint about ** syntax is the difficulty of finding the documentation on it). It's also easier to make it do the right thing; d1 + d2 + d3 + ... dN is inefficient (makes many unnecessary temporaries), {**d1, **d2, **d3, ..., **dN} is efficient but obscure (and not subclass friendly), but a varargs method like dict.combine(d1, d2, d3, ..., dN) (or merge, or whatever; I'm not trying to bikeshed) is correct, efficient, and most importantly, easy to look up documentation for.

I occasionally find it frustrating that concatenation exists given the wealth of Schlemiel the Painter's algorithms it encourages, and the "correct" solution for combining sequences (itertools.chain for general cases, str.join/bytes.join for special cases) being less obvious means my students invariably use the "wrong" tool out of convenience (and it's not really wrong in 90% of code where the lengths are always short, but then they use it where lengths are often huge and suffer for it). If we're going to make dict merging more convenient, I'd prefer we make the obvious, convenient solution also the one that doesn't encourage non-scalable anti-patterns.

As to why raising is worse: First, none of the other situations I listed above raises for conflicts. Second, there's the experience of str+unicode in Python 2, which raises if the str argument contains any non-ASCII bytes. In fact, we disliked it so much that we changed the language incompatibly to deal with it.
Agreed, I don't like raising. It's consistent with + (the only argument in favor of it really), but it's a bad idea, for all the reasons you mention. - Josh Rosenberg
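A minimal sketch of the varargs idea mentioned in this message, next to a pairwise chain that stands in for d1 + d2 + ... + dN. The names merge_all and merge_pairwise are invented for the example; the dict.combine spelling in the message is itself only a suggestion and not an existing method.

```python
from functools import reduce


def merge_all(*dicts):
    """Merge any number of dicts into one new dict; later arguments win."""
    result = {}
    for d in dicts:
        result.update(d)  # one pass per input, no intermediate dicts
    return result


def merge_pairwise(*dicts):
    """Same result, but builds a fresh temporary dict at every step."""
    return reduce(lambda left, right: {**left, **right}, dicts, {})


parts = [{"a": 1}, {"b": 2}, {"a": 3, "c": 4}]
print(merge_all(*parts))  # {'a': 3, 'b': 2, 'c': 4}
```

Both functions return the same mapping; the difference is that the pairwise version copies everything accumulated so far at each step, which is the pattern the message warns about for large inputs.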
On Wed, 6 Mar 2019 00:46:57 +0000 Josh Rosenberg <shadowranger+pythonideas@gmail.com> wrote:
Overloading + lacks the clear descriptive aspect of update that describes the goal of the operation, and contradicts conventions (in Python and elsewhere) about how + works (addition or concatenation, and a lot of people don't even like it doing the latter, though I'm not that pedantic).
A couple "rules" from C++ on overloading are "*Whenever the meaning of an operator is not obviously clear and undisputed, it should not be overloaded.* *Instead, provide a function with a well-chosen name.*" and "*Always stick to the operator’s well-known semantics".* (Source: https://stackoverflow.com/a/4421708/364696 , though the principle is restated in many other places).
Agreed with this. What is so useful exactly in this new dict operator that it hasn't been implemented, say, 20 years ago? I rarely find myself merging dicts and, when I do, calling dict.update() is entirely acceptable (I think the "{**d}" notation was already a mistake, making a perfectly readable operation more cryptic simply for the sake of saving a few keystrokes). Built-in operations should be added with regard to actual user needs (such as: a first-class notation for matrix multiplication, making formulas easier to read and understand), not a mere "hmm this might sometimes be useful". Besides, if I have two dicts with e.g. lists as values, I *really* dislike the fact that the + operator will clobber the values rather than concatenate them. It's a recipe for confusion. Regards Antoine.
On Fri, Mar 15, 2019 at 12:20:21PM +0100, Antoine Pitrou wrote:
Agreed with this. What is so useful exactly in this new dict operator that it hasn't been implemented, say, 20 years ago?
One could say the same thing about every new feature. Since Python 1.5 was so perfect, why add Unicode, decorators, matrix multiplication, async, descriptors, Decimal, iterators, ... Matrix multiplication is a perfect example: adding the @ operator could have been done in Python 0.1 if anyone had thought of it, but it took 15 years of numerical folk "whinging" about the lack until it happened: https://mail.python.org/pipermail/python-ideas/2014-March/027053.html In some ways, it is often easier to get community buy-in for *big* changes, provided they are backwards compatible. With a big change, people often either want it, or don't care one way or another. (Sometimes because the big change is too big or complicated or difficult for them to understand -- I feel that way about async. Some day I'll deal with it, but right now it's so far down my list of priorities that I have no opinion on anything to do with async.) But *little* changes are easy enough for everyone to understand, and so they trigger the impulse to bike-shed. Everyone has an opinion on whether or not dicts should support an update operator, and whether to spell it + or | or <- or << or something else. Or the infamous := operator, which ultimately is a useful but minor syntactic and semantic change but generated a huge amount of debate, argument and negativity. A far smaller change to the language than adding type hinting, but it generated far more argument. I still remember being told in no uncertain terms by the core devs that adding a clear() method to lists was a waste of time because there was already a perfectly good way to spell it with slicing. And then ABCs came along and now lists have a clear method. So opinions change too. Things happen when they happen, because if they had happened earlier we wouldn't still be arguing about them.
I rarely find myself merging dicts and, when I do, calling dict.update() is entirely acceptable
The code we write is shaped by the operators and methods that exist. You use dict.update() because *it exists* so when you want a new dict merged with another, you write the code that is possible today:

    new = spam.copy()
    new.update(eggs)
    process(new)

and you are content because you "rarely find myself merging dicts". But perhaps those who *frequently* merge dicts have a different opinion, and would prefer to write one line rather than three and avoid naming something that doesn't need a name:

    process(spam + eggs)  # or spam|eggs if you prefer
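For comparison, here are the three-line form and the one-expression forms side by side. The dicts spam and eggs are placeholder data. The spam | eggs spelling is guarded by a version check because, to my knowledge, it only became available on dict in Python 3.9.

```python
import sys

spam = {"a": 1, "b": 2}
eggs = {"b": 20, "c": 3}

# What can be written without any new operator: copy, then update.
new = spam.copy()
new.update(eggs)

# One expression using the unpacking syntax.
one_liner = {**spam, **eggs}
assert new == one_liner

# One expression using the | operator, on versions where dict supports it.
if sys.version_info >= (3, 9):
    assert (spam | eggs) == new

# The originals are untouched in every case.
assert spam == {"a": 1, "b": 2}
```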
(I think the "{**d}" notation was already a mistake, making a perfectly readable operation more cryptic simply for the sake of saving a few keystrokes).
I don't know if it was a mistake, but dissatisfaction with its lack of readability and discoverability is one of the motivations of this PEP. [...]
Besides, if I have two dicts with e.g. lists as values, I *really* dislike the fact that the + operator will clobber the values rather than concatenate them. It's a recipe for confusion.
Are you confused that the update method clobbers list values rather than concatenate them? I doubt that you are. So why would it be confusing to say that + does a copy-and-update? (In any case, popular opinion may be shifting towards preferring the | operator over + so perhaps confusion over concatenation may not be an issue in the future.) -- Steven
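To illustrate the two behaviours being contrasted here: update() replaces a list value wholesale, while concatenating the values needs a separate helper. The function merge_concat below is a hypothetical name written for this example and assumes every value is a list.

```python
d1 = {"x": [1, 2], "y": [3]}
d2 = {"x": [9]}

# Copy-and-update: the list under "x" is replaced, not extended.
clobbered = d1.copy()
clobbered.update(d2)
print(clobbered)  # {'x': [9], 'y': [3]}


def merge_concat(left, right):
    """Merge two dicts of lists, concatenating the lists for shared keys."""
    result = {key: list(values) for key, values in left.items()}
    for key, values in right.items():
        result.setdefault(key, []).extend(values)
    return result


print(merge_concat(d1, d2))  # {'x': [1, 2, 9], 'y': [3]}
```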
On Fri, Mar 15, 2019 at 11:42 AM Steven D'Aprano <steve@pearwood.info> wrote:
[snip]
I still remember being told in no uncertain terms by the core devs that adding a clear() method to lists was a waste of time because there was already a perfectly good way to spell it with slicing. And then ABCs came along and now lists have a clear method. So opinions change too.
I agree with the opinions expressed in the (partially) quoted message but I don't think that this is how this particular change happened.
https://mail.python.org/pipermail/python-ideas/2009-April/003897.html ;-) ;-) André Roberge
On Fri, Mar 15, 2019 at 11:54:51AM -0300, Andre Roberge wrote:
On Fri, Mar 15, 2019 at 11:42 AM Steven D'Aprano <steve@pearwood.info> wrote:
[snip]
I still remember being told in no uncertain terms by the core devs that adding a clear() method to lists was a waste of time because there was already a perfectly good way to spell it with slicing. And then ABCs came along and now lists have a clear method. So opinions change too.
I agree with the opinions expressed in the (partially) quoted message but I don't think that this is how this particular change happened.
https://mail.python.org/pipermail/python-ideas/2009-April/003897.html
You proposed that in April 2009, but there was nothing added to the bug tracker for 18 months until it was finally added by Terry Reedy in November 2010, based on discussion in a completely different thread (one about sets!): https://mail.python.org/pipermail/python-ideas/2010-November/008722.html Contrary-wise, a few years earlier the same request had been roundly dismissed by core-devs and Python luminaries as "inane", "redundant" and "trivial". https://mail.python.org/pipermail/python-list/2006-April/356236.html People can change their mind -- something that is dismissed one year may be accepted some years later on. -- Steven
On 3/15/2019 11:21 AM, Steven D'Aprano wrote:
On Fri, Mar 15, 2019 at 11:54:51AM -0300, Andre Roberge wrote:
On Fri, Mar 15, 2019 at 11:42 AM Steven D'Aprano <steve@pearwood.info> wrote:
[snip]
I still remember being told in no uncertain terms by the core devs that adding a clear() method to lists was a waste of time because there was already a perfectly good way to spell it with slicing. And then ABCs came along and now lists have a clear method. So opinions change too.
I agree with the opinions expressed in the (partially) quoted message but I don't think that this is how this particular change happened.
https://mail.python.org/pipermail/python-ideas/2009-April/003897.html
You proposed that in April 2009, but there was nothing added to the bug tracker for 18 months until it was finally added by Terry Reedy in
Actually, I opened the tracker issue with a succinct message, after the discussion and Guido's approval changed my mind. https://bugs.python.org/issue10516 However, Eli Bendersky wrote the patch with help from others and then merged it.
November 2010, based on discussion in a completely different thread (one about sets!):
https://mail.python.org/pipermail/python-ideas/2010-November/008722.html
-- Terry Jan Reedy
On Sat, 16 Mar 2019 01:41:59 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Matrix multiplication is a perfect example: adding the @ operator could have been done in Python 0.1 if anyone had thought of it, but it took 15 years of numerical folk "whinging" about the lack until it happened:
Not so perfect, as the growing use of Python for scientific computing has made it much more useful to promote a dedicated matrix multiplication operator than, say, 15 or 20 years ago. This is precisely why I worded my question this way: what has changed in the last 20 years that makes a "+" dict operator more compelling today than it was? Do we merge dicts much more frequently than we did? I don't think so.
Or the infamous := operator, which ultimately is a useful but minor syntactic and semantic change but generated a huge amount of debate, argument and negativity.
... and is likely to be a mistake as well. Justifying future mistakes with past mistakes doesn't sound very reasonable ;-)
I still remember being told in no uncertain terms by the core devs that adding a clear() method to lists was a waste of time because there was already a perfectly good way to spell it with slicing. And then ABCs came along and now lists have a clear method. So opinions change too.
Not really the same problem. The "+" dict operator is not intuitively obvious in its meaning, while a "clear()" method on lists is. I wouldn't mind the new operator if its meaning was clear-cut. But here we have potential for confusion, both for writers and readers of code.
Besides, if I have two dicts with e.g. lists as values, I *really* dislike the fact that the + operator will clobber the values rather than concatenate them. It's a recipe for confusion.
Are you confused that the update method clobbers list values rather than concatenate them? I doubt that you are.
So why would it be confusing to say that + does a copy-and-update?
Because it's named "+" precisely. You know, names are important. ;-) Regards Antoine.
On Sat, Mar 16, 2019 at 12:39 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 16 Mar 2019 01:41:59 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Matrix multiplication is a perfect example: adding the @ operator could have been done in Python 0.1 if anyone had thought of it, but it took 15 years of numerical folk "whinging" about the lack until it happened:
Not so perfect, as the growing use of Python for scientific computing has made it much more useful to promote a dedicated matrix multiplication operator than, say, 15 or 20 years ago.
There's more to it than that, really, but not really relevant here...
This is precisely why I worded my question this way: what has changed in the last 20 years that makes a "+" dict operator more compelling today than it was? Do we merge dicts much more frequently than we did?
The analogy doesn't hold because @ was a new operator -- a MUCH bigger change than simply defining the use of + (or | ) for dicts.
I wouldn't mind the new operator if its meaning was clear-cut. But here we have potential for confusion, both for writers and readers of code.
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.

-CHB

-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Wed, 20 Mar 2019 15:46:24 -1000 Christopher Barker <pythonchb@gmail.com> wrote:
This is precisely why I worded my question this way: what has changed in the last 20 years that makes a "+" dict operator more compelling today than it was? Do we merge dicts much more frequently than we did?
The analogy doesn't hold because @ was a new operator -- a MUCH bigger change than simply defining the use of + (or | ) for dicts.
But it's less disruptive when reading code, because "x @ y" is unambiguous: it's a matrix multiplication. "x + y" can be many different things, and now it can be one more thing.
I wouldn't mind the new operator if its meaning was clear-cut. But here we have potential for confusion, both for writers and readers of code.
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts. Regards Antoine.
On Thu, Mar 21, 2019 at 10:35 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts.
The + operator adds two things together. I don't understand the issue here.

You can add integers: 1 + 2 == 3
You can add floats: 0.5 + 1.25 == 1.75
You can add lists: [1,2] + [3,4] == [1,2,3,4]
You can add strings: "a" + "b" == "ab"

And soon you'll be able to add dictionaries. The exact semantics need to be defined, but it's not fundamentally changing how you interpret the + operator. I don't understand the panic here - or rather, I don't understand why it's happening NOW, not back when lists got the ability to be added (if that wasn't in the very first release).

Conversely, if it's the | operator, it's a matter of merging, and the same is true. You can merge integers, treating them as bit sets. You can merge sets. And now you'll be able to merge dictionaries. Same same.

ChrisA
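The examples in this message, written out as checks that run on current Python, together with the two existing uses of | that it mentions. The binary literals are just one way to show an int being treated as a bit set.

```python
# Existing meanings of +
assert 1 + 2 == 3
assert 0.5 + 1.25 == 1.75
assert [1, 2] + [3, 4] == [1, 2, 3, 4]
assert "a" + "b" == "ab"

# Existing meanings of |
assert 0b0101 | 0b0011 == 0b0111  # ints treated as bit sets
assert {1, 2} | {2, 3} == {1, 2, 3}  # set union; the shared element is kept once
```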
On Thu, 21 Mar 2019 23:35:36 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 21, 2019 at 10:35 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts.
The + operator adds two things together. I don't understand the issue here.
I'm not expecting you to understand, either. Regards Antoine.
On Thu, Mar 21, 2019 at 11:45 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 21 Mar 2019 23:35:36 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 21, 2019 at 10:35 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts.
The + operator adds two things together. I don't understand the issue here.
I'm not expecting you to understand, either.
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition? ChrisA
On Thu, 21 Mar 2019 23:51:12 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 21, 2019 at 11:45 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 21 Mar 2019 23:35:36 +1100 Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 21, 2019 at 10:35 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts.
The + operator adds two things together. I don't understand the issue here.
I'm not expecting you to understand, either.
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
"Productive discussion" is something that requires mutual implication. Asking me to repeat exactly what I spelled out above (and that you even quoted) is not productive. Regards Antoine.
21.03.19 14:51, Chris Angelico wrote:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Currently the + operator has 2 meanings for builtin types (both are widely used), after adding it for dicts it will have 3 meanings. 3 > 2, is not?
On Fri, Mar 22, 2019 at 12:17 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
21.03.19 14:51, Chris Angelico wrote:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Currently the + operator has 2 meanings for builtin types (both are widely used), after adding it for dicts it will have 3 meanings.
3 > 2, is not?
I suppose you could call it two (numeric addition and sequence concatenation), but there are subtleties to the way that lists concatenate that don't apply to strings (esp since lists are mutable), so I'd call it at least three already. And what about non-builtin types? You can add two numpy arrays and it does pairwise addition, quite different from how lists add together. But in every case, the + operator means "add these things together". It will be the same with dicts: you add the dicts together. Antoine has stated that the problem is NOT understanding what dict.__add__ does, so I am at a loss as to what the problem IS. We *already* have many different definitions of "add", according to the data types involved. That is exactly what polymorphism is for. Why is it such a bad thing for a dict? Now, my own opinion is that | would be a better operator than +, but it's only a weak preference, and I'd be happy with either. Also, to my understanding, the concerns about "what does addition mean" apply identically to "what does Or mean", but as we've already seen, my understanding doesn't currently extend as far as comprehending this issue. Hence asking. ChrisA
21.03.19 15:24, Chris Angelico wrote:
On Fri, Mar 22, 2019 at 12:17 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
21.03.19 14:51, Chris Angelico wrote:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Currently the + operator has 2 meanings for builtin types (both are widely used), after adding it for dicts it will have 3 meanings.
3 > 2, is not?
I suppose you could call it two (numeric addition and sequence concatenation), but there are subtleties to the way that lists concatenate that don't apply to strings (esp since lists are mutable), so I'd call it at least three already.
I do not understand what are these subtleties that you treat list concatenation different from string concatenation. Could you please explain? In any case, it does not matter how you count meanings, n + 1 > n.
On Thu, Mar 21, 2019 at 9:17 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
21.03.19 14:51, Chris Angelico wrote:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Currently the + operator has 2 meanings for builtin types (both are widely used), after adding it for dicts it will have 3 meanings.
3 > 2, is not?
It depends how abstractly you define the "meanings". If you define + as "arithmetic addition" and "sequence concatenation", then yes, there are 2. But novices have to learn that the same concatenation operator applies to strings as well as lists/tuples. And when reading x + y, it is probably relevant whether x and y are numbers, strings, or sequence containers like lists. The proposal would generalize "sequence concatenation" to something like "asymmetric sequence/collection combination". (Asymmetric because d1 + d2 may not equal d2 + d1.) It seems a natural extension to me, though the | alternative is also reasonable (interpreted as taking the OR of keys in the two dicts; but unlike unioning two sets, the dict-merge operator would be asymmetric). The third proposed alternative, <<, has no "baggage" from an existing use as a combination operator, but at the same time it is a more obscure choice.
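A small sketch of the asymmetry mentioned above. Since dict + dict is only a proposal at this point, the existing {**d1, **d2} display is used as a stand-in, on the assumption that the operator would have the same "last value wins" behaviour.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

# Stand-in for the proposed merge: later values win on key clashes.
left_then_right = {**d1, **d2}
right_then_left = {**d2, **d1}

assert left_then_right == {"a": 1, "b": 20, "c": 30}
assert right_then_left == {"a": 1, "b": 2, "c": 30}
assert left_then_right != right_then_left   # the merge is not commutative

# Set union, by contrast, is symmetric.
assert set(d1) | set(d2) == set(d2) | set(d1)
```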
On Thu, Mar 21, 2019 at 03:16:44PM +0200, Serhiy Storchaka wrote:
21.03.19 14:51, Chris Angelico wrote:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Currently the + operator has 2 meanings for builtin types (both are widely used), after adding it for dicts it will have 3 meanings.
Just two meanings? I get at least eight among the builtins:

- int addition;
- float addition;
- complex addition;
- string concatenation;
- list concatenation;
- tuple concatenation;
- bytes concatenation;
- bytearray concatenation.

I suppose if you cover one eye and focus on the "big picture", ignoring vital factors like "you can't add a list to a string" and "float addition and int addition aren't precisely the same", we might pretend that this is just two operations:

- numeric addition;
- sequence concatenation.

But in practice, when reading code, it's usually not enough to know that some use of the + operator means "concatenation", you need to know *what* is being concatenated. There's no point trying to add a tuple if a bytearray is required.
3 > 2, is not?
Okay, but how does this make it harder to determine what a piece of code using + does?

Antoine insists that *if we allow dict addition*, then we won't be able to tell what

    spam + eggs  # for example

does unless we know what spam and eggs are. This is very true. But it is *equally true today*, right now, and it's been equally true going back to Python 1.0 or before. This proposed change doesn't add any uncertainty that doesn't already exist, nor will it make code that is clear today less clear tomorrow.^1

And don't forget that Python allows us to create non-builtin types that overload operators. If you don't know what spam and eggs are, you can't assume they are builtins. With operator overloading, any operator can mean literally anything at all. In practice though, this rarely becomes a serious problem.

Is there a significant increase in difficulty between the current situation:

    # Is this addition or concatenation or something else?
    spam + eggs

versus the proposed:

    # Is this addition or concatenation or merge or something else?
    spam + eggs

Obviously there's *one more builtin* to consider, but I don't think that changes the process of understanding the meaning of the operation.

I think that the problem you and Antoine fear ("dict.__add__ will make it harder to read code") requires a process that goes something like this:

1. Here's a mysterious "spam + eggs" operation we need to understand.
2. For each operation in ("numeric addition", "concatenation"):
3. assume + represents that operation;
4. if we understand the spam+eggs expression now, break

If that's how we read code, then adding one more operation would make it harder to understand. We'd have to loop three times, not twice:

2. For each operation in ("numeric addition", "concatenation", "dict merging"):

Three is greater than two, so we may have to do more work to understand the code.

But I don't think that's how people actually read code. I think they do this:

1. Here's a mysterious "spam + eggs" operation we need to understand.
2. Read the code to find out what spam and eggs are.
3. Knowing what they are (tuples, lists, floats, etc) immediately tells you what the plus operator does; at worst, a programmer unfamiliar with the type may need to read the docs.

Adding dict.__add__ doesn't make it any harder to work out what the operands spam and eggs are. The process we go through to determine what the operands are remains the same:

- if one of operands is a literal, that gives you a strong hint that the other is the same type;
- the names or context may make it clear ("header + text" probably isn't doing numeric addition);
- read back through the code looking for where the variables are defined;

etc. That last bit isn't always easy. People can write obfuscated, complex code using poor or misleading names. But allowing dict.__add__ doesn't make it more obfuscated or more complex.

Usually the naming and context will make it clear. Most code is not terrible.

At worst, there will be a transition period where people have a momentary surprise:

"Wait, what, these are dicts??? How can you add dicts???"

but then they will read the docs (or ask StackOverflow) and the second time they see it, it shouldn't be a surprise.

^1 That's my assertion, but if anyone has a concrete example of actual code which is self-evident today but will become ambiguous if this proposal goes ahead, please show me!

-- Steven
Chris Angelico writes:
... then, in the interests of productive discussion, could you please explain? What is it about dict addition that makes it harder to understand than other addition?
Antoine didn't say what dict addition does is harder to understand than other addition. He said he wants to understand it without knowing what it does. I can't say for sure what he means precisely, but I take it that he wants dict "+" to obey certain regularities that other instances of "+" do, possibly including outside of Python. As you'll see I find it hard to make this precise, but it's a pretty strong feeling for me, as well. To me, those regularities include associativity (satisfied in Python except for floats) and commutativity where possible (integers and I believe floats do satisfy it, while strings cannot and other sequences in Python in general do not, although they very often do in mathematics). For mappings, the mathematical meaning of "+" is usually pointwise. This wouldn't make sense for strings (interpreted as mappings from a prefix of the natural numbers) at all except that by accident in Python s1[n] + s2[n] does make sense, but not pointwise (because the length of the result is 2, not 1, for each n). For sequences in general pointwise doesn't make sense (there's no restriction to homogeneous sequences, and if there were, like strings it's not clear that the elements would be summable in an appropriate sense). But concatenation always makes sense, especially by analogy to the somehow (IMO) canonical case of strings. For sets, the only plausible interpretation of "addition" is union, but in fact Python used .add asymmetrically as "add to", not "add together" (self is a set, argument is a generic object), and the union operator is "|", not "+". For dictionaries, neither pointwise addition nor concatenation makes sense in general, and update is "too asymmetric" for my taste, and has no analog in the usual algebras of mappings. In some sense string concatenation, though noncommutative, doesn't lose information, and it does obey a sort of antisymmetry in that a + b == reversed(reversed(b) + reversed(a)). 
Dictionary update does lose the original settings.

If people really think it's so important to spell

    d = d0.copy()
    d.update(d1)

as "d0 + d1" despite the noncommutativity (and the availability of "{**d0, **d1}" for "true" dicts), and by extension the redundant "d |= d1" for "d.update(d1)", I won't get terribly upset, but I will be sad because it offends my sense of "beautiful code" (including TOOWTDI, where "+" for dicts would violate both the "obvious" and the parenthetical "only one" conditions IMO). I would consider it a wart in the same way that many people consider str.join a wart, as it breaks even more of the regularities I associate with "+" than string concatenation does.

Again, I don't know what Antoine meant, but I might say the same kind of thing in the same words, and the above is what I would mean.

Steve
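The two regularities contrasted above can be checked directly. This is only an illustrative sketch: slicing with [::-1] stands in for the informal reversed() in the text, because reversed() on a str returns an iterator rather than a string.

```python
a, b = "spam", "eggs"

# Concatenation is noncommutative but loses nothing: concatenating the
# reversed operands in swapped order and reversing again recovers a + b.
assert a + b != b + a
assert a + b == (b[::-1] + a[::-1])[::-1]

# Dict update, by contrast, discards the overwritten value.
d0 = {"k": "old", "x": 1}
d1 = {"k": "new"}
d = d0.copy()
d.update(d1)
assert d == {"k": "new", "x": 1}
assert "old" not in d.values()   # the original setting for "k" is gone
assert d == {**d0, **d1}         # same result as the display spelling
```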
On 21/03/2019 11:34, Antoine Pitrou wrote:
On Wed, 20 Mar 2019 15:46:24 -1000 Christopher Barker <pythonchb@gmail.com> wrote:
This is precisely why I worded my question this way: what has changed in the last 20 years that make a "+" dict operator more compelling today than it was? Do we merge dicts much more frequently than we did?
The analogy doesn't hold because @ was a new operator -- a MUCH bigger change than simply defining the use of + (or | ) for dicts.
But it's less disruptive when reading code, because "x @ y" is unambiguous: it's a matrix multiplication. "x + y" can be many different things, and now it can be one more thing.
"x @ y" is unambiguous once you know what it means. Until then, it's just mysterious.
I wouldn't mind the new operator if its meaning was clear-cut. But here we have potential for confusion, both for writers and readers of code.
but it's NOT a new operator, it is making use of an existing one, and sure you could guess at a couple meanings, but the merge one is probably one of the most obvious to guess, and one quick test and you know -- I really can't see it being an ongoing source of confusion.
Did you actually read what I said? The problem is not to understand what dict.__add__ does. It's to understand what code using the + operator does, without knowing upfront whether the inputs are dicts.
Welcome to polymorphism. -- Rhodri James *-* Kynesim Ltd
Hi, I'm not old on this list but every time there is a proposal, the answer is "what are you trying to solve?". Since z = {**x, **y} and z.update(y) exist, I can't find the answer.

On 02/03/2019 at 04:52, Steven D'Aprano wrote:
Executive summary:
- I'm going to argue for subclass-preserving behaviour;
- I'm not wedded to the idea that dict += should actually call the update method, so long as it has the same behaviour;
- __iadd__ has no need to return NotImplemented or type-check its argument.
Details below.
On Fri, Mar 01, 2019 at 04:10:44PM -0800, Brandt Bucher wrote:
[...]
In your Python implementation samples from the PEP, dict subclasses will behave differently from how list subclasses do. List subclasses, without overrides, return *list* objects for bare "+" operations

Right -- and I think they are wrong to do so, for reasons I explained here:
https://mail.python.org/pipermail/python-ideas/2019-March/055547.html
I think the standard handling of subclasses in Python builtins is wrong, and I don't wish to emulate that wrong behaviour without a really good reason. Or at least a better reason than "other methods break subclassing unless explicitly overloaded, so this should do so too".
Or at least not without a fight :-)
(and "+=" won't call an overridden "extend" method).

I'm slightly less opinionated about that. Looking more closely into the docs, I see that they don't actually say that += calls list.extend:
s.extend(t) or s += t: extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)
https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types
only that they have the same effect. So the wording re lists calling extend certainly needs to be changed. But that doesn't mean that we must change the implementation. We have a choice:
- regardless of what lists do, we define += for dicts as literally calling dict.update; the more I think about it, the less I like this.
- Or we say that += behaves similarly to update, without actually calling the method. I think I prefer this.
(The second implies either that += either contains a duplicate of the update logic, or that += and update both delegate to a private, C-level function that does most of the work.)
I think that the second approach (define += as having the equivalent semantics of update but without actually calling the update method) is probably better. That decouples the two methods, allows subclasses to change one without necessarily changing the other.
So a more analogous pseudo-implementation (if that's what we seek) would look like:
    def __add__(self, other):
        if isinstance(other, dict):
            new = dict.copy(self)
            dict.update(new, other)
            return new
        return NotImplemented

We should not require the copy method.
The PEP should be more explicit that the approximate implementation does not imply the copy() and update() methods are actually called.
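For illustration only, here is one way the subclass-preserving behaviour argued for above could look in pure Python. The class names MergeDict and Tagged are hypothetical, and the sketch assumes the subclass can be constructed with no arguments; it is not the PEP's implementation.

```python
class MergeDict(dict):
    """Hypothetical dict subclass illustrating the proposed semantics."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # Build the result with the type of the left operand, without
        # going through any overridden copy() or update() method.
        new = type(self)()
        dict.update(new, self)
        dict.update(new, other)
        return new


class Tagged(MergeDict):
    def update(self, *args, **kwargs):
        raise AssertionError("overridden update() should not be called")


result = Tagged(a=1) + {"a": 2, "b": 3}
assert type(result) is Tagged          # the subclass is preserved
assert result == {"a": 2, "b": 3}      # right operand wins on clashes
```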
    def __iadd__(self, other):
        if isinstance(other, dict):
            dict.update(self, other)
            return self
        return NotImplemented

I don't agree with that implementation.
According to PEP 203, which introduced augmented assignment, the sequence of calls in ``d += e`` is:
1. Try to call ``d.__iadd__(e)``.
2. If __iadd__ is not present, try ``d.__add__(e)``.
3. If __add__ is missing too, try ``e.__radd__(d)``.
but my tests suggest this is inaccurate. I think the correct behaviour is this:
1. Try to call ``d.__iadd__(e)``.
2. If __iadd__ is not present, or if it returns NotImplemented, try ``d.__add__(e)``.
3. If __add__ is missing too, or if it returns NotImplemented, fail with TypeError.
In other words, e.__radd__ is not used.
We don't want dict.__iadd__ to try calling __add__, since the latter is more restrictive and less efficient than the in-place merge. So there is no need for __iadd__ to return NotImplemented. It should either succeed on its own, or fail hard:
    def __iadd__(self, other):
        self.update(other)
        return self
Except that the actual C implementation won't call the update method itself, but will follow the same semantics.
See the docstring for dict.update for details of what is accepted by update.
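A runnable sketch of the "succeed or fail hard" in-place version described above. The class name InPlaceDict is made up; it delegates to dict.update directly, which is one assumption about how "same semantics as update, without calling an overridden update" might be spelled in pure Python.

```python
class InPlaceDict(dict):
    """Hypothetical subclass: += with the same semantics as update()."""

    def __iadd__(self, other):
        # No type check and no NotImplemented: succeed or raise.
        dict.update(self, other)
        return self


d = InPlaceDict(a=1)
alias = d
d += {"b": 2}                 # a mapping
d += [("c", 3), ("a", 10)]    # an iterable of key/value pairs
assert d == {"a": 10, "b": 2, "c": 3}
assert alias is d             # merged in place, no new object

try:
    d += 42                   # not something update() accepts
except TypeError:
    failed = True
else:
    failed = False
assert failed
```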
Jimmy Girardet wrote on 04.03.19 at 10:12:
I'm not old on this list but every time there is a proposal, the answer is "what are you trying to solve ?".
Since z = {**x, **y} and z.update(y) exist, I can't find the answer.
I think the main intention is to close a gap in the language. [1,2,3] + [4,5,6] works for lists and tuples, {1,2,3} | {4,5,6} works for sets, but joining two dicts isn't simply {1:2, 3:4} + {5:6} but requires either some obscure syntax or a statement instead of a simple expression. The proposal is to enable the obvious syntax for something that should be obvious. Stefan
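The gap described above, spelled out with the tools available at the time of this thread. The variable names are illustrative only.

```python
assert [1, 2, 3] + [4, 5, 6] == [1, 2, 3, 4, 5, 6]   # lists
assert (1, 2, 3) + (4, 5, 6) == (1, 2, 3, 4, 5, 6)   # tuples
assert {1, 2, 3} | {4, 5, 6} == {1, 2, 3, 4, 5, 6}   # sets

# Dicts: either the display syntax (an expression) ...
merged_expr = {**{1: 2, 3: 4}, **{5: 6}}

# ... or a copy followed by a statement.
merged_stmt = {1: 2, 3: 4}.copy()
merged_stmt.update({5: 6})

assert merged_expr == merged_stmt == {1: 2, 3: 4, 5: 6}
```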
On Mon, Mar 4, 2019 at 6:52 PM Stefan Behnel <stefan_ml@behnel.de> wrote:
I think the main intention is to close a gap in the language.
[1,2,3] + [4,5,6]
works for lists and tuples,
{1,2,3} | {4,5,6}
works for sets, but joining two dicts isn't simply
{1:2, 3:4} + {5:6}
Operators are syntax borrowed from math.

* Operators are used for concatenate and repeat (Kleene star) in regular language. https://en.wikipedia.org/wiki/Regular_language seq + seq and seq * N are very similar to it, although Python used + instead of middle dot (not in ASCII) for concatenate.
* set is directly relating to set in math. | is well known operator for union.
* In case of merging dict, I don't know obvious background in math or computer science.

So I feel it's very natural that dict don't have operator for merging. Isn't "for consistency with other types" a wrong consistency?
but requires either some obscure syntax or a statement instead of a simple expression.
The proposal is to enable the obvious syntax for something that should be obvious.
dict.update is obvious already. Why statement is not enough? Regards,
On Mar 4, 2019, at 10:02 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why statement is not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
Stefan

There is already an expression for key-overriding merge. Why do we need a new one?
On 04/03/2019 15:12, James Lu wrote:
On Mar 4, 2019, at 10:02 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why statement is not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
Stefan

There is already an expression for key-overriding merge. Why do we need a new one?
Because the existing one is inobvious, hard to discover and ugly. -- Rhodri James *-* Kynesim Ltd
Hi.
the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs
I am a little surprised by this choice. First, this means that "a += b" would not be equivalent to "a = a + b". Is there other built-in types which act differently if called with the operator or augmented assignment version? Secondly, that would imply I would no longer be able to infer the type of "a" while reading "a += [('foo', 'bar')]". Is it a list? A dict? Those two points make me uncomfortable with "+=" strictly behaving like ".update()". 2019-03-04 17:44 UTC+01:00, Rhodri James <rhodri@kynesim.co.uk>:
On 04/03/2019 15:12, James Lu wrote:
On Mar 4, 2019, at 10:02 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why statement is not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
Stefan

There is already an expression for key-overriding merge. Why do we need a new one?
Because the existing one is inobvious, hard to discover and ugly.
-- Rhodri James *-* Kynesim Ltd _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Mon, Mar 4, 2019 at 3:31 PM Del Gan <delgan.py@gmail.com> wrote:
the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs
I am a little surprised by this choice.
First, this means that "a += b" would not be equivalent to "a = a + b". Is there other built-in types which act differently if called with the operator or augmented assignment version?
Yes. The same happens for lists. [1] + 'a' is a TypeError, but a += 'a' works:
>>> a = [1]
>>> a + 'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "str") to list
>>> a += 'a'
>>> a
[1, 'a']
Secondly, that would imply I would no longer be able to infer the type of "a" while reading "a += [('foo', 'bar')]". Is it a list? A dict?
Real code more likely looks like "a += b" and there you already don't have much of a clue -- the author of the code should probably communicate this using naming conventions or type annotations.
Those two points make me uncomfortable with "+=" strictly behaving like ".update()".
And yet that's how it works for lists. (Note that dict.update() still has capabilities beyond +=, since you can also invoke it with keyword args.) -- --Guido van Rossum (python.org/~guido)
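As a quick illustration of the parenthetical above, dict.update() accepts three kinds of input, and only the first two could ever be passed through an operator, since an operator has nowhere to put keyword arguments.

```python
d = {"a": 1}
d.update({"b": 2})        # from a mapping
d.update([("c", 3)])      # from an iterable of key/value pairs
d.update(e=5, a=100)      # from keyword arguments
assert d == {"a": 100, "b": 2, "c": 3, "e": 5}
```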
On Mon, 4 Mar 2019 15:57:38 -0800 Guido van Rossum <guido@python.org> wrote:
Those two points make me uncomfortable with "+=" strictly behaving like ".update()".
And yet that's how it works for lists. (Note that dict.update() still has capabilities beyond +=, since you can also invoke it with keyword args.)
Yeah, well.... I do think "+=" for lists was a mistake. I *still* have trouble remembering the exact difference between "list +=" and "list.extend" (yes, there is one: one accepts more types than the other... which one it is, and why, I never remember; and, of course, there might be the obscure performance difference because of CPython's execution details). I should not have to remember whether I want to use "list +=" or "list.extend" every time I need to extend a list. There is a virtue to """There should be one-- and preferably only one --obvious way to do it""" and we shouldn't break it more than we already did. Regards Antoine.
On Fri, Mar 15, 2019 at 12:25:22PM +0100, Antoine Pitrou wrote:
Yeah, well.... I do think "+=" for lists was a mistake. I *still* have trouble remembering the exact difference between "list +=" and "list.extend" (yes, there is one: one accepts more types than the other... which one it is, and why, I never remember;
Both accept arbitrary iterables, and the documentation suggests that they are the same: https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types Perhaps you are thinking of the difference between list + list versus list += iterable? [...]
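A short check of the distinction being drawn here: list += and list.extend both take any iterable, while binary + insists on another list.

```python
a = [1]
b = [1]
a += (n * n for n in range(2, 4))      # any iterable works
b.extend(n * n for n in range(2, 4))
assert a == b == [1, 4, 9]

try:
    [1] + (4, 9)     # binary + requires another list
except TypeError:
    binary_plus_rejected = True
else:
    binary_plus_rejected = False
assert binary_plus_rejected
```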
There is a virtue to
"""There should be one-- and preferably only one --obvious way to do it"""
"It" here refers to two different things: "I want to update a dict in place": The Obvious Way is to use the update method; the fact that += works as well is just a side-effect of the way augmented assignments are defined. "I want a new dict that merges two existing dicts": The Obvious Way is to use the merge operator (possibly spelled + but that's not written in stone yet). -- Steven
On Sat, 16 Mar 2019 01:59:07 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 15, 2019 at 12:25:22PM +0100, Antoine Pitrou wrote:
Yeah, well.... I do think "+=" for lists was a mistake. I *still* have trouble remembering the exact difference between "list +=" and "list.extend" (yes, there is one: one accepts more types than the other... which one it is, and why, I never remember;
Both accept arbitrary iterables, and the documentation suggests that they are the same:
https://docs.python.org/3/library/stdtypes.html#mutable-sequence-types
Perhaps you are thinking of the difference between list + list versus list += iterable?
Hmm, it looks like I misremembered indeed. Thanks for correcting this. Regards Antoine.
By the way, my “no same keys with different values” proposal would not apply to +=.
On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why statement is not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
Stefan
It seems tautology and say nothing. What is "convenience of an expression"? Is it needed to make Python more readable language? Anyway, If "there is expression" is the main reason for this proposal, symbolic operator is not necessary. `new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python preferred name over symbol in general. Symbols are readable and understandable only when it has good math metaphor. Sets has symbol operator because it is well known in set in math, not because set is frequently used. In case of dict, there is no simple metaphor in math. It just cryptic and hard to Google. -- INADA Naoki <songofacandy@gmail.com>
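A sketch of the named alternative suggested above. Neither dict.merge nor dict.updated exists on the builtin dict, so the free function merged below is a hypothetical stand-in for either spelling.

```python
def merged(d1, d2):
    """Hypothetical helper: return a new dict, with d2's values winning."""
    new = dict(d1)
    new.update(d2)
    return new


defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000}
config = merged(defaults, overrides)
assert config == {"host": "localhost", "port": 9000}
assert defaults == {"host": "localhost", "port": 8000}   # inputs untouched
```

A named function like this is usable as an expression, which is the property being asked for, without adding a new meaning to any operator.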
This is mainly for Steve, as the author of PEP 584. I'm grateful to Steve for preparing the current draft. Thank you. It's strong on implementation, but I find it weak on motivation. I hope that when time is available you (and the other contributors) could transfer some motivating material into the PEP, from python-ideas. According to PEP 001, the PEP "should clearly explain why the existing language specification is inadequate to address the problem that the PEP solves". So it is important. -- Jonathan
INADA Naoki wrote on 05.03.19 at 08:03:
On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why statement is not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
It seems tautology and say nothing.
That's close to what I thought when I read your question. :)
What is "convenience of an expression"?
It's the convenience of being able to write an expression that generates the thing you need, rather than having to split code into statements that create it step by step before you can use it. Think of comprehensions versus for-loops. Comprehensions are expressions that don't add anything to the language that a for-loop cannot achieve. Still, everyone uses them because they are extremely convenient.
Is it needed to make Python more readable language?
No, just like comprehensions, it's not "needed". It's just convenient.
Anyway, If "there is expression" is the main reason for this proposal, symbolic operator is not necessary.
As said, "needed" is not the right word. Being able to use a decorator closes a gap in the language. Just like list comprehensions fit generator expressions and vice versa. There is no "need" for being able to write

    [x**2 for x in seq]
    {x**2 for x in seq}

when you can equally well write

    list(x**2 for x in seq)
    set(x**2 for x in seq)

But I certainly wouldn't complain about that redundancy in the language.
`new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python preferred name over symbol in general. Symbols are readable and understandable only when it has good math metaphor.
Sets has symbol operator because it is well known in set in math, not because set is frequently used.
In case of dict, there is no simple metaphor in math.
So then, if "list+list" and "tuple+tuple" weren't available through an operator, would you also reject the idea of adding it, arguing that we could use this:

    L = L1.extended(L2)

I honestly do not see the math relation in concatenation via "+". But, given that "+" and "|" already have the meaning of "merging two containers into one" in Python, I think it makes sense to allow that also for dicts.
It just cryptic and hard to Google.
I honestly doubt that it's something people would have to search for any more than they have to search for the list "+" operation. My guess is that it's pretty much what most people would try first when they have the need to merge two dicts, and only failing that, they would start a web search. In comparison, very few users would be able to come up with "{**d1, **d2}" on their own, or even "d1.updated(d2)". My point is, given the current language, "dict+dict" is a gap that is worth closing. Stefan
On Wed, Mar 6, 2019 at 5:34 PM Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 05.03.19 at 08:03:
On Tue, Mar 5, 2019 at 12:02 AM Stefan Behnel wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why is a statement not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression. It does not replace the convenience of an expression.
That seems like a tautology and says nothing.
That's close to what I thought when I read your question. :)
What is "convenience of an expression"?
It's the convenience of being able to write an expression that generates the thing you need, rather than having to split code into statements that create it step by step before you can use it.
I don't think that is a reasonable rationale for adding an operator.
First, Python sometimes intentionally forces people to use a statement. Strictly speaking, dict.update() is an expression, but it does not return `self`, so you must split the code into statements. That is a design decision. So "add an operator because I want an expression" is bad reasoning to me. If it were valid reasoning, every mutating method should have an operator. That is a crazy idea.
Second, an operator is not required for an expression. And adding an operator must clear a higher bar than adding a method, because it introduces more complexity and can seem cryptic, especially when the operator doesn't have a good math metaphor.
So I proposed adding dict.merge() instead of adding dict + as a counter-proposal. If "I want an expression" is really the main motivation, that must be enough.
Think of comprehensions versus for-loops. Comprehensions are expressions that don't add anything to the language that a for-loop cannot achieve. Still, everyone uses them because they are extremely convenient.
I agree that comprehensions are extremely convenient. But I think the main reason is that they are compact and readable. If a comprehension were no more compact and readable than a for-loop, it would not be extremely convenient.
Is it needed to make Python a more readable language?
No, just like comprehensions, it's not "needed". It's just convenient.
I think comprehensions are needed to make Python a more readable language, not just for convenience.
Anyway, if "there is an expression" is the main reason for this proposal, a symbolic operator is not necessary.
As said, "needed" is not the right word.
Maybe I misunderstood the nuance of the word "needed". English and Japanese are very different languages. Sorry.
Being able to use a decorator closes a gap in the language. Just like list comprehensions fit generator expressions and vice versa. There is no "need" for being able to write
[x**2 for x in seq] {x**2 for x in seq}
when you can equally well write
list(x**2 for x in seq) set(x**2 for x in seq)
But I certainly wouldn't complain about that redundancy in the language.
OK, I must agree on this point. [] and {} have good metaphors in math. We use [1, 2, 3, ...] for series, and {1, 2, 3, ...} for sets.
`new = d1.updated(d2)` or `new = dict.merge(d1, d2)` are enough. Python prefers names over symbols in general. Symbols are readable and understandable only when they have a good math metaphor.
Sets have symbolic operators because those operators are well known from sets in math, not because sets are frequently used.
In the case of dict, there is no simple metaphor in math.
So then, if "list+list" and "tuple+tuple" weren't available through an operator, would you also reject the idea of adding it, arguing that we could use this:
L = L1.extended(L2)
I honestly do not see the math relation in concatenation via "+".
First of all, concatenating sequences (especially str) is far more frequent than merging dicts. My point is that dict + dict is a bigger abuse of + than seq + seq, and its usage is less common than seq + seq.
Let me describe why I think dict + dict is a "major" abuse. As I said before, it is common to assign an operator to concatenation in regular languages, although the middle dot is the symbol commonly used. When the commonly used operator is not in ASCII, another symbol can be used as an alternative. We use | instead of ∪.
In the case of dict, it is not common to assign an operator to merging in math, as far as I know. (Maybe the "direct sum" ⊕ is similar to it. But it doesn't allow intersection, so ValueError would have to be raised for duplicated keys if we used "direct sum" as the metaphor. And direct sum is higher-level math than the "union" of sets. I don't think it's a good idea to use it as a metaphor.)
That's one of the reasons I think seq + seq is a "little" abuse and dict + dict is a "major" abuse.
Another reason is that "throw some values away" doesn't fit the mental model of "sum", as I said already in an earlier mail.
But, given that "+" and "|" already have the meaning of "merging two containers into one" in Python, I think it makes sense to allow that also for dicts.
+ is used for concatenation, which is stricter than just merging. If + is allowed for dict, set should support it too for consistency. Then the meaning of "+ for containers" becomes "sum up two containers in some way, defined by the container type." That is consistent. Kotlin uses + with this meaning. Scala uses ++ with this meaning. But this is a large design change for the language. Is it really required? I feel adding a method is enough. -- Inada Naoki <songofacandy@gmail.com>
On Mon, 4 Mar 2019 16:02:06 +0100 Stefan Behnel <stefan_ml@behnel.de> wrote:
INADA Naoki wrote on 04.03.19 at 11:15:
Why is a statement not enough?
I'm not sure I understand why you're asking this, but a statement is "not enough" because it's a statement and not an expression.
This is an argument for Perl 6, not for Python. Regards Antoine.
but requires either some obscure syntax or a statement instead of a simple expression.
The proposal is to enable the obvious syntax for something that should be obvious.
Stefan
The discussions on this list show that the behavior of the `+` operator with dict will never be obvious (first wins, second wins, add the results, or raise an exception). So the user will always have to look at the docs or test it to learn the intended behavior. That said, [1,2] + [3] equals [1,2,3] and not [1,2,[3]]; that was not obvious to me, and I survived.
On Mar 4, 2019, at 4:51 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Jimmy Girardet wrote on 04.03.19 at 10:12:
I'm fairly new on this list, but every time there is a proposal, the answer is "what are you trying to solve?".
Since z = {**x, **y} and z.update(y) exist, I can't find the answer.
I think the main intention is to close a gap in the language.
[1,2,3] + [4,5,6]
works for lists and tuples,
{1,2,3} | {4,5,6}
works for sets, but joining two dicts isn't simply
{1:2, 3:4} + {5:6}
but requires either some obscure syntax or a statement instead of a simple expression.
The proposal is to enable the obvious syntax for something that should be obvious.
Rebutting my “throw KeyError on conflicting keys for +” proposal: Indeed but + is never destructive in those contexts: duplicate list items are okay because they’re ordered, duplicated set items are okay because they mean the same thing (when two sets contain the same item and you merge the two the “containing” means the same thing), but duplicate dict keys mean different things. How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary? It’s better to have two different syntaxes for different situations. The KeyError of my proposal is a feature, a sign that something is wrong, a sign an invariant is being violated. Yes, {**, **} syntax looks abnormal and ugly. That’s part of the point– how many times have you needed to create a copy of a dictionary and update that dictionary with overriding keys from a new dictionary? It’s much more common to have non-conflicting keys. The ugliness of the syntax makes one pause and think and ask: “Why is it important that the keys from this dictionary override the ones from another dictionary?” PROPOSAL EDIT: I think KeyError should only be thrown if the same keys from two dictionaries have values that are not __eq__.
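The amended proposal above (raise only when a shared key maps to values that are not equal) can be sketched as an ordinary function. The name `strict_merge` is hypothetical and chosen for illustration; nothing in the thread defines it, and the error message format is likewise an assumption.

```python
def strict_merge(d1, d2):
    """Return a new dict; raise KeyError if a shared key maps to unequal values.

    Hypothetical helper illustrating the "KeyError on conflicting keys"
    idea, relaxed so that equal values for the same key are accepted.
    """
    # Keys present in both operands whose values compare unequal.
    conflicts = [k for k in d1.keys() & d2.keys() if d1[k] != d2[k]]
    if conflicts:
        raise KeyError(f"conflicting values for keys: {conflicts!r}")
    merged = dict(d1)
    merged.update(d2)
    return merged


# Disjoint keys, or shared keys with equal values, merge without complaint.
print(strict_merge({"a": 1}, {"a": 1, "b": 2}))
```

Under this sketch, `strict_merge({"a": 1}, {"a": 2})` raises KeyError, which is the "sign that an invariant is being violated" the proposal describes.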
On Mon, Mar 04, 2019 at 10:09:32AM -0500, James Lu wrote:
How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?
Very frequently. That's why we have a dict.update method, which if I remember correctly, was introduced in Python 1.5 because people were frequently re-inventing the same wheel:
    def update(d1, d2):
        for key in d2.keys():
            d1[key] = d2[key]
You should have a look at how many times it is used in the standard library:
    [steve@ando cpython]$ cd Lib/
    [steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l
    373
Now some of those are false positives (docstrings, comments, non-dicts, etc) but that still leaves a lot of examples of wanting to override old keys. This is a very common need. Wanting an exception if the key already exists is, as far as I can tell, very rare. It is true that many of the examples in the std lib involve updating an existing dict, not creating a new one. But that's only to be expected: since Python didn't provide an obvious functional version of update, only an in-place version, naturally people get used to writing in-place code. (Think about how long we made do without sorted(). I don't know about other people, but I now find sorted indispensable, and probably use it ten or twenty times more often than the in-place version.) [...]
The KeyError of my proposal is a feature, a sign that something is wrong, a sign an invariant is being violated.
Why is "keys are unique" an invariant? The PEP gives a good example of when this "invariant" would be unnecessarily restrictive:
    For example, updating default configuration values with user-supplied values would most often fail under the requirement that keys are unique::
        prefs = site_defaults + user_defaults + document_prefs
Another example would be when reading command line options, where the most common convention is for "last option seen" to win:
    [steve@ando Lib]$ grep --color=always --color=never "zero" f*.py
    fileinput.py: numbers are zero; nextfile() has no effect.
    fractions.py: # the same way for any finite a, so treat a as zero.
    functools.py: # prevent their ref counts from going to zero during
and the output is printed without colour. (I've slightly edited the above output so it will fit in the email without wrapping.)
The very name "update" should tell us that the most useful behaviour is the one the devs decided on back in 1.5: have the last seen value win. How can you update values if the operation raises an error if the key already exists? If this behaviour is ever useful, I would expect that it will be very rare.
An update or merge is effectively just running through a loop setting the value of a key. See the pre-Python 1.5 function above. Having update raise an exception if the key already exists would be about as useful as having ``d[key] = value`` raise an exception if the key already exists.
Unless someone can demonstrate that the design of dict.update() was a mistake, and the "require unique keys" behaviour is more common, then I maintain that for the very rare cases you want an exception, you can subclass dict and overload the __add__ method:
    # Intentionally simplified version.
    def __add__(self, other):
        if self.keys() & other.keys():
            raise KeyError
        return super().__add__(other)
The ugliness of the syntax makes one pause and think and ask: “Why is it important that the keys from this dictionary override the ones from another dictionary?”
Because that is the most common and useful behaviour. That's what it means to *update* a dict or database, and this proposal is for an update operator. The ugliness of the existing syntax is not a feature, it is a barrier. -- Steven
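To make the two positions concrete, here is a minimal sketch of both behaviours as dict subclasses. The built-in dict does not define `__add__`, so the last-key-wins version is written out explicitly; the class names `MergeDict` and `UniqueKeyDict` and the sample preference dicts are hypothetical, not taken from the PEP.

```python
class MergeDict(dict):
    """Hypothetical dict subclass where + copies and updates (last key wins)."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)  # shallow copy that keeps the subclass type
        new.update(other)       # values from `other` override earlier ones
        return new


class UniqueKeyDict(MergeDict):
    """Variant that refuses to merge when a key appears in both operands."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        if self.keys() & other.keys():
            raise KeyError("duplicate keys")
        return super().__add__(other)


# Layered configuration, in the spirit of the PEP's prefs example.
site_defaults = MergeDict({"color": "blue", "size": 10})
user_defaults = {"size": 12}
document_prefs = {"color": "red"}
prefs = site_defaults + user_defaults + document_prefs
print(prefs)
```

With `UniqueKeyDict` in place of `MergeDict`, the same expression raises KeyError at the first overlapping key, which is the case the configuration example is meant to highlight.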
On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <steve@pearwood.info> wrote:
How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?
Very frequently.
That's why we have a dict.update method, which if I remember correctly, was introduced in Python 1.5 because people were frequently re-inventing the same wheel:
    def update(d1, d2):
        for key in d2.keys():
            d1[key] = d2[key]
You should have a look at how many times it is used in the standard library:
[steve@ando cpython]$ cd Lib/ [steve@ando Lib]$ grep -U "\.update[(]" *.py */*.py | wc -l 373
Now some of those are false positives (docstrings, comments, non-dicts, etc) but that still leaves a lot of examples of wanting to override old keys. This is a very common need. Wanting an exception if the key already exists is, as far as I can tell, very rare.
It is very rare when you want to modify an existing dictionary. It’s not rare at all when you’re creating a new one.
It is true that many of the examples in the std lib involve updating an existing dict, not creating a new one. But that's only to be expected: since Python didn't provide an obvious functional version of update, only an in-place version, naturally people get used to writing in-place code.
My question was “How many situations would you need to make a copy of a dictionary and then update that copy and override old keys from a new dictionary?” Try to really think about my question, instead of answering only half of it to dismiss my point.
On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <steve@pearwood.info> wrote:
The PEP gives a good example of when this "invariant" would be unnecessarily restrictive:
For example, updating default configuration values with user-supplied values would most often fail under the requirement that keys are unique::
prefs = site_defaults + user_defaults + document_prefs
Another example would be when reading command line options, where the most common convention is for "last option seen" to win:
[steve@ando Lib]$ grep --color=always --color=never "zero" f*.py fileinput.py: numbers are zero; nextfile() has no effect. fractions.py: # the same way for any finite a, so treat a as zero. functools.py: # prevent their ref counts from going to zero during
Indeed, in this case you would want to use {**, **} syntax.
and the output is printed without colour.
(I've slightly edited the above output so it will fit in the email without wrapping.)
The very name "update" should tell us that the most useful behaviour is the one the devs decided on back in 1.5: have the last seen value win. How can you update values if the operation raises an error if the key already exists? If this behaviour is ever useful, I would expect that it will be very rare.
An update or merge is effectively just running through a loop setting the value of a key. See the pre-Python 1.5 function above. Having update raise an exception if the key already exists would be about as useful as having ``d[key] = value`` raise an exception if the key already exists.
Unless someone can demonstrate that the design of dict.update() was a mistake
You’re making a logical mistake here. + isn’t supposed to have .update’s behavior and it never was supposed to.
, and the "require unique keys" behaviour is more common,
I just have. 99% of the time you want to have keys from one dict override another, you’d be better off doing it in-place and so would be using .update() anyways.
then I maintain that for the very rare cases you want an exception, you can subclass dict and overload the __add__ method:
Well, yes, the whole point is to define the best default behavior.
On Mon, Mar 04, 2019 at 08:01:38PM -0500, James Lu wrote:
On Mar 4, 2019, at 11:25 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Another example would be when reading command line options, where the most common convention is for "last option seen" to win:
[steve@ando Lib]$ grep --color=always --color=never "zero" f*.py [...]
Indeed, in this case you would want to use {**, **} syntax.
No I would NOT want to use the {**, **} syntax, because it is ugly. That's why people ask for + instead. (Or perhaps I should say "as well as" since the double-star syntax is not going away.) [...]
Unless someone can demonstrate that the design of dict.update() was a mistake
You’re making a logical mistake here. + isn’t supposed to have .update’s behavior and it never was supposed to.
James, I'm the author of the PEP, and for the purposes of the proposal, the + operator is supposed to do what I say it is supposed to do. You might be able to persuade me to change the PEP, if you have a sufficiently good argument, or you can write your own counter PEP making a different choice, but please don't tell me what I intended. I know what I intended, and it is for + to have the same last-key-wins behaviour as update. That's the behaviour which is most commonly requested in the various times this comes up.
, and the "require unique keys" behaviour is more common,
I just have.
No you haven't -- you have simply *declared* that it is more common, without giving any evidence for it.
99% of the time you want to have keys from one dict override another, you’d be better off doing it in-place and so would be using .update() anyways.
I don't know if it is "99% of the time" or 50% of the time or 5%, but this PEP is for the remaining times where we don't want in-place updates but we want a new dict. I use list.append or list.extend more often than list concatenation, but when I want a new list, list concatenation is very useful. This proposal is about those cases where we want a new dict. -- Steven
On Fri, Mar 8, 2019 at 11:25 AM João Matos <jcrmatos@gmail.com> wrote:
I've just read your PEP 585 draft and have some questions. When you say "
Like the merge operator and list concatenation, the difference operator requires both operands to be dicts, while the augmented version allows any iterable of keys.
d - {'spam', 'parrot'} Traceback (most recent call last): ... TypeError: cannot take the difference of dict and set
d -= {'spam', 'parrot'} print(d) {'eggs': 2, 'cheese': 'cheddar'}
d -= [('spam', 999)] print(d) {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
"
The option d -= {'spam', 'parrot'} where parrot does not exist in the d dict, will raise an exception (eg. KeyNotFound) or be silent?
The option d -= [('spam', 999)] should remove the pair from the dict, correct? But the print that follows still shows it there. It's a mistake or am I missing something?
My understanding is that:
- (Q1) Attempting to discard a key not in the target of the augmented assignment would *not* raise a KeyError (or any exception, for that matter). This is analogous to how the - operator works on sets and is consistent with the pure Python implementation towards the bottom of the PEP.
- (Q2) This one got me as well while implementing the proposal in CPython, but there is a difference in what "part" of the RHS the operators "care about" if the RHS isn't a dict. The += operator expects 2-tuples and will treat them as (key, value) pairs. The -= operator doesn't attempt to unpack the RHS's elements as += does and expects keys. So d -= [('spam', 999)] treated the tuple as a *key* and attempted to discard it. IOW,
    d = { 'spam': 999, ('spam', 999): True }
    d -= [('spam', 999)]
would discard the *key* ('spam', 999) and the corresponding value True.
This highlights a possibly surprising incongruence between the operators:
    d = {}
    update = [(1,1), (2,2), (3,3)]
    d += update
    d -= update
    assert d == {} # will raise, as d still has 3 items
Similarly,
    d = {}
    update = {1:1, 2:2, 3:3}
    d += update.items()
    d -= update.items()
    assert d == {} # will raise, for the same reason
    d -= update.keys()
    assert d == {} # would pass without issue
That being said, I (personally) wouldn't consider it a deal-breaker and would still very much appreciate the added functionality (regardless of the choice of operator). - Jim
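Since plain dicts do not support += or -=, the asymmetry described above can only be tried out through stand-ins. The sketch below uses two hypothetical helpers, `iadd` and `isub`, that emulate the draft semantics as described in this message: += behaves like dict.update(), and -= discards each element of the right-hand side as a key, silently skipping missing ones. The helper names are assumptions made for this example.

```python
def iadd(d, other):
    """Emulate the draft `d += other`: same effect as d.update(other)."""
    d.update(other)
    return d


def isub(d, keys):
    """Emulate the draft `d -= keys`: discard each element as a key.

    Missing keys are skipped silently, as set difference would do.
    """
    for key in keys:
        d.pop(key, None)
    return d


# The incongruence: pairs are unpacked by +=, but treated as keys by -=.
d = {}
update = [(1, 1), (2, 2), (3, 3)]
iadd(d, update)
isub(d, update)          # tries to discard the keys (1, 1), (2, 2), (3, 3)
print(len(d))            # the three items added by iadd are still there

isub(d, dict(update).keys())   # passing the actual keys empties the dict
print(d)
```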
On Mar 4, 2019, at 4:51 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I think the main intentions is to close a gap in the language.
[1,2,3] + [4,5,6]
works for lists and tuples,
{1,2,3} | {4,5,6}
works for sets, but joining two dicts isn't simply
{1:2, 3:4} + {5:6}
but requires either some obscure syntax or a statement instead of a simple expression.
The proposal is to enable the obvious syntax for something that should be obvious.
I would challenge that this dictionary merging is something that is obvious. The existing sequences are simple collections of values where a dictionary is a mapping of values. The difference between the two is akin to the difference between a mathematical array or set and a unary mapping function. There is a clear and obvious way to combine arrays and sets -- concatenation for arrays and union for sets. Combining mapping functions is less than obvious.
"Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more mathematical view of programming language types that includes two distinct operations for combining dictionaries -- merge and recursive merge.
For two input dictionaries D1 & D2 and the output dictionary O
D1 merge D2
    O is D1 with the of those keys of D2 that do not have keys in D1
D1 recursive-merge D2
    For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k] and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].
Note that neither of the cases is the same as:
O = D1.copy() O.update(D2)
So that gives us three different ways to combine dictionaries that are each sensible. The following example uses dictionaries from "Putting Metaclasses to Work":
d1 = { ... 'title': 'Structured Programming', ... 'authors': 'Dahl, Dijkstra, and Hoare', ... 'locations': { ... 'Dahl': 'University of Oslo', ... 'Dijkstra': 'University of Texas', ... 'Hoare': 'Oxford University', ... }, ... }
d2 = { ... 'publisher': 'Academic Press', ... 'locations': { ... 'North America': 'New York', ... 'Europe': 'London', ... }, ... }
o = d1.copy() o.update(d2) o {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
recursive_merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London', 'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
https://repl.it/@dave_shawley/PuttingMetaclassesToWork
IMO, having more than one obvious outcome means that we should refuse the temptation to guess. If we do, then the result is only obvious to a subset of users and will be a surprise to the others. It's also useful to note that I am having trouble coming up with another programming language that supports a "+" operator for map types. Does anyone have an example of another programming language that allows for addition of dictionaries/mappings? If so, what is the behavior there? - dave -- Any linter or project that treats PEP 8 as mandatory has *already* failed, as PEP 8 itself states that the rules can be broken as needed. - Paul Moore.
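The message shows the outputs of `merge` and `recursive_merge` but not their bodies. The following is one possible reconstruction that reproduces the three results listed above; it is a sketch written from the prose definitions, not the code behind the repl.it link, and `copy_update` is a hypothetical name for the copy-then-update case.

```python
def copy_update(d1, d2):
    """Last seen wins: the result of d1.copy() followed by update(d2)."""
    out = d1.copy()
    out.update(d2)
    return out


def merge(d1, d2):
    """First seen wins: d1 plus only those keys of d2 that d1 lacks."""
    out = d1.copy()
    for key, value in d2.items():
        out.setdefault(key, value)
    return out


def recursive_merge(d1, d2):
    """Like merge, but descends into values that are dicts on both sides."""
    out = d1.copy()
    for key, value in d2.items():
        if key in out and isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = recursive_merge(out[key], value)
        else:
            out.setdefault(key, value)
    return out


d1 = {
    'title': 'Structured Programming',
    'authors': 'Dahl, Dijkstra, and Hoare',
    'locations': {
        'Dahl': 'University of Oslo',
        'Dijkstra': 'University of Texas',
        'Hoare': 'Oxford University',
    },
}
d2 = {
    'publisher': 'Academic Press',
    'locations': {'North America': 'New York', 'Europe': 'London'},
}

# The three candidates disagree only about the shared 'locations' key.
for combine in (copy_update, merge, recursive_merge):
    print(combine.__name__, sorted(combine(d1, d2)['locations']))
```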
Does anyone have an example of another programming language that allows for addition of dictionaries/mappings?
Kotlin does that (`to` means `:`):
    fun main() {
        var a = mutableMapOf<String,Int>("a" to 1, "b" to 2)
        var b = mutableMapOf<String,Int>("c" to 1, "b" to 3)
        println(a)
        println(b)
        println(a + b)
        println(b + a)
    }
which prints:
    {a=1, b=2}
    {c=1, b=3}
    {a=1, b=3, c=1}
    {c=1, b=2, a=1}
On Tue, Mar 05, 2019 at 08:11:29AM -0500, David Shawley wrote:
"Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more mathematical view of programming language types that includes two distinct operations for combining dictionaries -- merge and recursive merge.
For two input dictionaries D1 & D2 and the output dictionary O
D1 merge D2 O is D1 with the of those keys of D2 that do not have keys in D1
D1 recursive-merge D2 For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k] and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].
I'm afraid I cannot understand either of those algorithms as written. I suspect that you've left at least one word out of the first. Fortunately your example below is extremely clear, thank you. [...]
The following example uses dictionaries from "Putting Metaclasses to Work":
d1 = { ... 'title': 'Structured Programming', ... 'authors': 'Dahl, Dijkstra, and Hoare', ... 'locations': { ... 'Dahl': 'University of Oslo', ... 'Dijkstra': 'University of Texas', ... 'Hoare': 'Oxford University', ... }, ... }
d2 = { ... 'publisher': 'Academic Press', ... 'locations': { ... 'North America': 'New York', ... 'Europe': 'London', ... }, ... }
o = d1.copy() o.update(d2) o {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
Yes, that's the classic "update with last seen wins". That's what the PEP proposes as that seems to be the most frequently requested behaviour. It is also the only behaviour which has been deemed useful enough in nearly 30 years of Python's history to be added to dict as a method.
merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
That seems to be "update with first seen wins", which is easily done using ChainMap or the proposed dict difference operator:
    dict( ChainMap(d1, d2) )
    # or
    d1 + (d2 - d1)
or simply by swapping the order of the operands:
    d2 + d1
(These are not *identical* in effect, there are small differences with respect to key:value identity, and order of keys. But they ought to give *equal* results.) Personally, I don't think that behaviour is as useful as the first, but it is certainly a legitimate kind of merge. As far as I know, this has never been requested before. Perhaps it is too niche?
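The equivalence claimed above can be checked today without the proposed operators. In this sketch the + and - expressions are emulated with dict unpacking and a comprehension, since they are not available on plain dicts; the sample dicts and variable names are made up for the example.

```python
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 3}

# ChainMap looks keys up in d1 first, so d1's values win on overlap.
via_chainmap = dict(ChainMap(d1, d2))

# Emulates `d1 + (d2 - d1)`: add only the keys of d2 that d1 lacks.
via_difference = {**d1, **{k: v for k, v in d2.items() if k not in d1}}

# Emulates `d2 + d1`: last seen wins, with the operands swapped.
via_swap = {**d2, **d1}

print(via_chainmap == via_difference == via_swap)
```

The three results compare equal, although the order of their keys may differ, as the message notes.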
recursive_merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London', 'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
That's an interesting one. I'd write it something like this:
    def merge(a, b):
        new = a.copy()
        for key, value in b.items():
            if key not in a:
                # Add new keys.
                new[key] = value
            else:
                v = new[key]
                if isinstance(value, dict) and isinstance(v, dict):
                    # If both values are dicts, merge them.
                    new[key] = merge(v, value)
                else:
                    # What to do if only one is a dict?
                    # Or if neither is a dict?
                    pass
        return new
I've seen variants of this where duplicate keys are handled by building a list of the values:
    def merge(a, b):
        new = a.copy()
        for key, value in b.items():
            if key in a:
                v = new[key]
                if isinstance(v, list):
                    v.append(value)
                else:
                    new[key] = [v, value]
        ...
or by concatenating values, or adding them (as Counter does), etc. We have subclasses and operator overloading, so you can implement whatever behaviour you like. The question is, is this behaviour useful enough and common enough to be built into dict itself?
IMO, having more than one obvious outcome means that we should refuse the temptation to guess.
We're not *guessing*. We're *choosing* which behaviour we want. Nobody says: When I print some strings, I can separate them with spaces, or dots, or newlines, and print a newline at the end, or suppress the newline. Since all of these behaviours might be useful for somebody, we should not "guess" what the user wants. Therefore we should not have a print() function at all. The behaviour of print() is not a guess as to what the user wants. We offer a specific behaviour, and if the user is happy with that, then they can use print(), and if not, they can write their own. The same applies here: we're offering one specific behaviour that we think is the most important, and anyone who wants another can write their own. If people don't like my choice of what I think is the most important (copy-and-update, with last seen wins), they can argue for whichever alternative they like. If they make a convincing enough case, the PEP can change :-) James Lu has already tried to argue that the "raise on non-unique keys" is the best behaviour. I have disagreed with that, but if James makes a strong enough case for his idea, and it gains sufficient support, I could be persuaded to change my position. Or he can write a competing PEP and the Steering Council can decide between the two ideas.
If we do, then the result is only obvious to a subset of users and will be a surprise to the others.
It's only a surprise to those users who don't read the docs and make assumptions about behaviour based on their own wild guesses. We should get away from the idea that the only behaviours we can provide are those which are "obvious" (intuitive?) to people who guess what it means without reading the docs. It's great when a function's meaning can be guessed or inferred from a basic understanding of English:
    len(string) # assuming len is an abbreviation for length
but that sets the bar impossibly high. We can't guess what these do, not with any precision:
    print(spam, eggs) # prints spaces between arguments or not?
    spam is eggs # that's another way of spelling == right?
    zip(spam, eggs) # what does it do if args aren't the same length?
and who can guess what these do without reading the docs?
    property, classmethod, slice, enumerate, iter
I don't think that Python is a worse language for having specified a meaning for these rather than leaving them out. The Zen's prohibition against guessing in the face of ambiguity does not mean that we must not add a feature to the language that requires the user to learn what it does first.
It's also useful to note that I am having trouble coming up with another programming language that supports a "+" operator for map types.
Does anyone have an example of another programming language that allows for addition of dictionaries/mappings?
If so, what is the behavior there?
An excellent example, but my browser just crashed and it's after 3am here so I'm going to take this opportunity to go to bed :-) -- Steven
Steven D'Aprano wrote:
The question is, is [recursive merge] behaviour useful enough and common enough to be built into dict itself?
I think not. It seems like just one possible way of merging values out of many. I think it would be better to provide a merge function or method that lets you specify a function for merging values. -- Greg
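A merge function that takes a value-combining callable, as suggested above, might look like the following minimal sketch. The name `merge_with` and its signature are assumptions made for illustration; no such function exists on dict.

```python
import operator


def merge_with(d1, d2, combine):
    """Return a new dict; for keys in both inputs, store combine(old, new).

    Hypothetical helper: `combine` receives the value from d1 and the
    value from d2, and its return value is kept in the result.
    """
    out = dict(d1)
    for key, value in d2.items():
        out[key] = combine(out[key], value) if key in out else value
    return out


a = {"x": 1, "y": 2}
b = {"y": 10, "z": 3}

# Different policies fall out of different callables.
last_wins = merge_with(a, b, lambda old, new: new)
first_wins = merge_with(a, b, lambda old, new: old)
summed = merge_with(a, b, operator.add)   # Counter-like addition
print(last_wins, first_wins, summed)
```

A recursive merge can be expressed the same way by passing a callable that calls `merge_with` again when both values are dicts.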
On 05/03/2019 at 23:40, Greg Ewing wrote:
Steven D'Aprano wrote:
The question is, is [recursive merge] behaviour useful enough and common enough to be built into dict itself?
I think not. It seems like just one possible way of merging values out of many. I think it would be better to provide a merge function or method that lets you specify a function for merging values.
That's what this conversation led me to. I'm not against the addition for the most general usage (and current PEP's describes the behaviour I would expect before reading the doc), but for all other more specific usages, where we intend any special or not-so-common behaviour, I'd go with modifying Dict.update like this:
    foo.update(bar, on_collision=updator) # Although I'm not a fan of the keyword I used
`updator` being a simple function like this one:
    def updator(updated, updator, key) -> Any:
        if key == "related":
            return updated[key].update(updator[key])
        if key == "tags":
            return updated[key] + updator[key]
        if key in ["a", "b", "c"]: # Those
            return updated[key]
        return updator[key]
There's nothing here that couldn't be made today by using a custom update function, but leaving the burden of checking for values that are in both and actually inserting the new values to Python's language, and keeping on our side only the parts that are specific to our use case, makes in my opinion the code more readable, with fewer possible bugs and possibly better optimization.
On 6 March 2019 at 10:26:15, Brice Parent (contact@brice.xyz) wrote:
On 05/03/2019 at 23:40, Greg Ewing wrote:
Steven D'Aprano wrote:
The question is, is [recursive merge] behaviour useful enough and common enough to be built into dict itself?
I think not. It seems like just one possible way of merging values out of many. I think it would be better to provide a merge function or method that lets you specify a function for merging values.
That's what this conversation led me to. I'm not against the addition for the most general usage (and the current PEP describes the behaviour I would expect before reading the doc), but for all other more specific usages, where we intend any special or not-so-common behaviour, I'd go with modifying Dict.update like this:
foo.update(bar, on_collision=updator) # Although I'm not a fan of the keyword I used
This won't be possible, because update() already takes keyword arguments:
    >>> foo = {}
    >>> bar = {'a': 1}
    >>> foo.update(bar, on_collision=lambda e: e)
    >>> foo
    {'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}
`updator` being a simple function like this one:
    def updator(updated, updator, key) -> Any:
        if key == "related":
            return updated[key].update(updator[key])
        if key == "tags":
            return updated[key] + updator[key]
        if key in ["a", "b", "c"]:  # Those
            return updated[key]
        return updator[key]
There's nothing here that couldn't be made today by using a custom update function, but leaving the burden of checking for values that are in both and actually inserting the new values to Python's language, and keeping on our side only the parts that are specific to our use case, makes in my opinion the code more readable, with fewer possible bugs and possibly better optimization.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
len(dict1 + dict2) cannot even be computed by any expression involving +. Using len() to test the semantics of the operation is not arbitrary; the fact that the sizes do not add is a defining quality of a merge. This is a merge, not an addition. The proper analogy is to sets, not lists.
The operators should be |, &, and -, exactly as for sets, and the behaviour defined with just three rules:
1. The keys of dict1 [op] dict2 are the elements of dict1.keys() [op] dict2.keys().
2. The values of dict2 take priority over the values of dict1.
3. When either operand is a set, it is treated as a dict whose values are None.
This yields many useful operations and, most importantly, is simple to explain. "sets and dicts can |, &, -" takes up less space in your brain than "sets can |, &, - but dicts can only + and -, where dict + is like set |".
merge and update some items:
    {'a': 1, 'b': 2} | {'b': 3, 'c': 4} => {'a': 1, 'b': 3, 'c': 4}
pick some items:
    {'a': 1, 'b': 2} & {'b': 3, 'c': 4} => {'b': 3}
remove some items:
    {'a': 1, 'b': 2} - {'b': 3, 'c': 4} => {'a': 1}
reset values of some keys:
    {'a': 1, 'b': 2} | {'b', 'c'} => {'a': 1, 'b': None, 'c': None}
ensure certain keys are present:
    {'b', 'c'} | {'a': 1, 'b': 2} => {'a': 1, 'b': 2, 'c': None}
pick some items:
    {'b', 'c'} & {'a': 1, 'b': 2} => {'b': 2}
remove some items:
    {'a': 1, 'b': 2} - {'b', 'c'} => {'a': 1}
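The three rules can be tried out with ordinary functions, as in the sketch below. The function names (`dict_or`, `dict_and`, `dict_sub`) are assumptions for illustration; the message proposes operators, not these helpers.

```python
def _as_dict(operand):
    # Rule 3: a set operand is treated as a dict whose values are all None.
    if isinstance(operand, (set, frozenset)):
        return dict.fromkeys(operand)
    return operand


def dict_or(d1, d2):
    # Rule 1: keys are the union. Rule 2: values from d2 take priority.
    d1, d2 = _as_dict(d1), _as_dict(d2)
    return {**d1, **d2}


def dict_and(d1, d2):
    # Rule 1: keys are the intersection. Rule 2: values come from d2.
    d1, d2 = _as_dict(d1), _as_dict(d2)
    return {key: d2[key] for key in d1 if key in d2}


def dict_sub(d1, d2):
    # Rule 1: keys of d1 that are not in d2; only d1 can supply the values.
    d1, d2 = _as_dict(d1), _as_dict(d2)
    return {key: value for key, value in d1.items() if key not in d2}


merged = dict_or({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
picked = dict_and({'b', 'c'}, {'a': 1, 'b': 2})
reset = dict_or({'a': 1, 'b': 2}, {'b', 'c'})
```

Each example listed in the message can be checked against these helpers; for instance `picked` is `{'b': 2}` and `reset` is `{'a': 1, 'b': None, 'c': None}`.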
On 06/03/2019 10:29, Ka-Ping Yee wrote:
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
I'm sorry, but you're going to have to justify why this identity is important. Making assumptions about length where any dictionary manipulations are concerned seems unwise to me, which makes a nonsense of your claim that this is nonsense :-) -- Rhodri James *-* Kynesim Ltd
On Wed, Mar 6, 2019 at 11:52 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 06/03/2019 10:29, Ka-Ping Yee wrote:
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
I'm sorry, but you're going to have to justify why this identity is important. Making assumptions about length where any dictionary manipulations are concerned seems unwise to me, which makes a nonsense of your claim that this is nonsense :-)
It's not "nonsense" per se. If we were inventing programming languages in a vacuum, you could say + can mean "arbitrary combination operator" and it would be fine. But we're not in a vacuum; every major language that uses + with general purpose containers uses it to mean element-wise addition or concatenation, not just "merge". Concatenation is what imposes that identity (and all the others people are defending, like no loss of input values); you're taking a sequence of things, and shoving another sequence of things on the end of it, preserving order and all values. The argument here isn't that you *can't* make + do arbitrary merges that don't adhere to these semantics. It's that doing so would add yet a third meaning to + (and it is a third meaning; it has no precedent in any existing type in Python, nor in any other major language; even in the minor languages that allow it, they use + for sets as well, so Python using + is making Python itself internally inconsistent with the operators used for sets), for limited benefit. - Josh Rosenberg
Rhodri James wrote:
Making assumptions about length where any dictionary manipulations are concerned seems unwise to me
I think you're a bit hasty here. Some assumptions are sensible. Suppose
    a = len(d1)
    b = len(d2)
    c = len(d1 + d2) # Using the suggested syntax.
Then we know
    max(a, b) <= c <= a + b
And this is, in broad terms, characteristic of merge operations. -- Jonathan
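The bounds above can be checked today using `{**d1, **d2}` as a stand-in for the proposed `d1 + d2`. The helper name `merged` and the sample dicts are assumptions chosen for illustration.

```python
def merged(d1, d2):
    # Stand-in for the proposed d1 + d2, using unpacking that works today.
    return {**d1, **d2}


cases = [
    ({}, {}),                               # both empty
    ({"a": 1}, {"a": 2}),                   # same keys: c == max(a, b)
    ({"a": 1}, {"b": 2}),                   # disjoint keys: c == a + b
    ({"a": 1, "b": 2}, {"b": 3, "c": 4}),   # partial overlap
]

sizes = []
for d1, d2 in cases:
    a, b, c = len(d1), len(d2), len(merged(d1, d2))
    assert max(a, b) <= c <= a + b
    # The exact size is a + b minus the number of shared keys.
    assert c == a + b - len(d1.keys() & d2.keys())
    sizes.append(c)
```

The second assertion states the same fact more precisely: the length of a merge depends on how many keys the operands share, which is why it cannot be computed from `a` and `b` alone.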
Ka-Ping Yee wrote:
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
len(dict1 + dict2) cannot even be computed by any expression involving +. Using len() to test the semantics of the operation is not arbitrary; the fact that the sizes do not add is a defining quality of a merge. This is a merge, not an addition. The proper analogy is to sets, not lists.
For me, this comment is excellent. It neatly expresses the central concern about this proposal. I think most of us will agree that the proposal is to use '+' to express a merge operation, namely update. (There are other merge operations, when there are two values to combine, such as taking the min or max of the two values.)
Certainly, many of the posts quite naturally use the word merge. Indeed PEP 584 writes "This PEP suggests adding merge '+' and difference '-' operators to the built-in dict class."
We would all agree that it would be obviously wrong to suggest adding merge '-' and difference '+' operators. (Note: I've swapped '+' and '-'.) And why? Because it is obviously wrong to use '-' to denote merge, etc. Some of us are also upset by the use of '+' to denote merge.
By the way, there is already a widespread symbol for merge. It appears on many road signs. It looks like an upside down 'Y'. It even has merge left and merge right versions.
Python already has operator symbols '+', '-', '*', '/' and so on. See https://docs.python.org/3/reference/lexical_analysis.html#operators Perhaps we should add a merge or update symbol to this list, so that we don't overload to breaking point the humble '+' operator. Although that would make Python a bit more like APL.
By the way, Pandas already has a merge operation, called merge, that takes many parameters. I've only glanced at it. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.... -- Jonathan
I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap). I think growing the full collection of set operations would be a pleasant addition to dicts. I think shoe-horning in plus would always be jarring to me.
I strongly agree with Ka-Ping. '+' is intuitively concatenation not merging. The behavior is overwhelmingly more similar to the '|' operator in sets (whether or not a user happens to know the historical implementation overlap).
I think the behavior proposed in the PEP makes sense whether you think of "+" as meaning "concatenation" or "merging". If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.
This also makes explaining the behavior of "d1 + d2" slightly easier than explaining "d1 | d2". For the former, you can just say "d1 + d2 means we concat the two dicts together" and stop there. You almost don't need to explain the merging/right-most key wins behavior at all, since that behavior is the only one consistent with the existing language rules.
In contrast, you *would* need to explain this with "d1 | d2": I would mentally translate this expression to mean "take the union of these two dicts" and there's no real way to deduce which key-value pair ends up in the final dict given that framing. Why is it that key-value pairs in d2 win over pairs in d1 here? That choice seems pretty arbitrary when you think of this operation in terms of unions, rather than either concat or merge.
Using "|" would also violate an important existing property of unions: the invariant "d1 | d2 == d2 | d1" is no longer true. As far as I'm aware, the union operation is always taken to be commutative in math, and so I think it's important that we preserve that property in Python. At the very least, I think it's far more important to preserve commutativity of unions than it is to preserve some of the invariants I've seen proposed above, like "len(d1 + d2) == len(d1) + len(d2)".
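Both halves of this argument can be checked with syntax that already exists. The sketch below uses `{**d1, **d2}` as a stand-in for the proposed operator; the variable names are illustrative.

```python
d1 = {"a": 1, "b": 2}
d2 = {"c": 3, "b": 4}

# "Concatenate" the key-value pairs, then build a dict from them:
# the right-most value for "b" is the one that survives.
concatenated = dict(list(d1.items()) + list(d2.items()))

# The same pairs written out as a single dict display.
literal = {"a": 1, "b": 2, "c": 3, "b": 4}

# Unpacking, standing in for the proposed d1 + d2.
unpacked = {**d1, **d2}

assert concatenated == literal == unpacked == {"a": 1, "b": 4, "c": 3}

# Swapping the operands changes the result, so the merge is not commutative...
swapped = {**d2, **d1}
assert swapped == {"a": 1, "b": 2, "c": 3}
assert unpacked != swapped

# ...whereas the union of the key sets is.
assert d1.keys() | d2.keys() == d2.keys() | d1.keys()
```

The two dicts always agree on their keys; they differ only in which value is kept for a shared key.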
Personally, I don't really have a strong opinion on this PEP, or the other one I've seen proposed where we add a "d1.merge(d2, d3, ...)". But I do know that I'm a strong -1 on adding set operations to dicts: it's not possible to preserve the existing semantics of union (and intersection) with dicts, and I think expressions like "d1 | d2" and "d1 & d2" would just be confusing and misleading to encounter in the wild. -- Michael
On Thu, Mar 7, 2019 at 12:59 AM Michael Lee <michael.lee.0x2a@gmail.com> wrote:
If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.
Which, by the way, is also consistent with assignment: d = {}; d["a"] = 1; d["b"] = 2; d["c"] = 3; d["b"] = 4 Rightmost one wins. It's the most logical behaviour. ChrisA
Michael Lee wrote:
If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
But of course, you can't have duplicate keys in Python. So, you would either recall or look up how duplicate keys are handled when constructing a dict and learn that the rule is that the right-most key wins. So the natural conclusion is that "+" would follow this existing rule -- and you end up with exactly the behavior described in the PEP.
This is a nice argument. And well presented. And it gave me a surprise, that taught me something. Here goes:
    >>> {'a': 0}
    {'a': 0}
    >>> {'a': 0, 'a': 0}
    {'a': 0}
    >>> {'a': 0, 'a': 1}
    {'a': 1}
    >>> {'a': 1, 'a': 0}
    {'a': 0}
This surprised me quite a bit. I was expecting to get an exception. However
    >>> dict(a=0)
    {'a': 0}
    >>> dict(a=0, a=0)
    SyntaxError: keyword argument repeated
does give an exception. I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested) anywhere? I didn't find it in these URLs:
https://docs.python.org/3/library/stdtypes.html#mapping-types-dict
https://docs.python.org/3/tutorial/datastructures.html#dictionaries
I think this behaviour might give rise to gotchas. For example, if we define inverse_f by
    >>> inverse_f = { f(a): a, f(b): b }
then is the next statement always true (assuming a != b)?
    >>> inverse_f[ f(a) ] == a
Well, it's not true with these values
    >>> a, b = 1, 2
    >>> def f(n): pass # There's a bug here, f(n) should be a bijection.
A quick check that len(inverse_f) == 2 would provide a sanity check. Or perhaps better, len(inverse_f) == len({a, b}). (I don't have an example of this bug appearing 'in the wild'.) Once again, I thank Michael for his nice, instructive and well-presented example. -- Jonathan
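The gotcha is easy to reproduce. In the sketch below, `f`, `a` and `b` follow the message, while `build_inverse` is a hypothetical helper showing one way to turn the silent overwrite into an error.

```python
def f(n):
    pass  # Deliberately buggy: returns None for every n, so f is not a bijection.


a, b = 1, 2
inverse_f = {f(a): a, f(b): b}

# Both keys are None, so the second pair silently replaces the first.
assert inverse_f == {None: 2}
assert inverse_f[f(a)] != a
assert len(inverse_f) != len({a, b})  # the suggested sanity check catches it


def build_inverse(func, inputs):
    """Build {func(x): x}, raising instead of overwriting on a repeated key.

    Hypothetical helper, named here only for illustration.
    """
    inverse = {}
    for x in inputs:
        y = func(x)
        if y in inverse:
            raise ValueError(f"both {inverse[y]!r} and {x!r} map to {y!r}")
        inverse[y] = x
    return inverse
```

Building the mapping in a loop, rather than with a dict display, is what makes the duplicate-key check possible.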
SUMMARY: The outcome of a search for: python dict literal duplicate keys. No conclusions (so far).
BACKGROUND In the thread "PEP: Dict addition and subtraction" I wrote
    >>> {'a': 0, 'a': 1}
    {'a': 1}
I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested) anywhere? I didn't find it in these URLs: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict https://docs.python.org/3/tutorial/datastructures.html#dictionaries
LINKS I've since found some relevant URLs.
[1] https://stackoverflow.com/questions/34539772/is-a-dict-literal-containing-re...
[2] https://help.semmle.com/wiki/display/PYTHON/Duplicate+key+in+dict+literal
[3] https://bugs.python.org/issue26910
[4] https://bugs.python.org/issue16385
[5] https://realpython.com/python-dicts/
ANALYSIS
[1] gives a reference to [6], which correctly states the behaviour of {'a':0, 'a':1}, although without giving an example. (Aside: Sometimes one example is worth 50 or more words.)
[2] is from Semmle, who provide an automated code review tool, called LGTM. The page [2] appears to be part of the documentation for LGTM. This page provides a useful link to [7].
[3] is a re-opening of [4]. It was rapidly closed by David Murray, who recommended reopening the discussion on python-ideas. [4] was raised by Albert Ferras, based on his real-world experience. In particular, a configuration file that contains a long dict literal. This was closed by Benjamin Peterson, who said that raising an error was "out of the question for compatibility issues". Given few use cases and little support on python-ideas, Terry Reedy supported the closure. Raymond Hettinger supported the closure.
[5] is from RealPython, who provide online tutorials. This page contains the statement "a given key can appear in a dictionary only once. Duplicate keys are not allowed." Note that {'a': 0, 'a': 1} can reasonably be thought of as a dictionary with duplicate keys.
NOTE As I recall SGML (this shows my age) allows multiple entity declarations, as in
    <!ENTITY key "original">
    <!ENTITY key "updated">
And as I recall, in SGML the first value "original" is the one that is in effect. This is what happens with the LaTeX command \providecommand.
FURTHER LINKS
[6] https://docs.python.org/3/reference/expressions.html#dictionary-displays
[7] https://cwe.mitre.org/data/definitions/561.html # CWE-561: Dead Code
-- Jonathan
https://docs.python.org/3/reference/expressions.html#dictionary-displays
If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.
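A small trace makes the quoted rule concrete. The helper `key` and the list `log` below are illustrative names only; they record the order in which the key expressions of a dict display are evaluated.

```python
log = []


def key(name, tag):
    # Record that this key expression was evaluated, then return the key.
    log.append(tag)
    return name


d = {key("a", "first"): 0, key("a", "second"): 1}

# The pairs are evaluated left to right, and the last value given wins.
assert log == ["first", "second"]
assert d == {"a": 1}

# A repeated key keeps the position of its first appearance.
e = {"a": 0, "b": 5, "a": 1}
assert list(e) == ["a", "b"]
assert e == {"a": 1, "b": 5}
```

This is the same result as storing each pair in turn with `d[k] = v`, which is how the reference describes the construction.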
On Thu, Mar 7, 2019 at 2:09 AM Jonathan Fine <jfine2358@gmail.com> wrote:
SUMMARY: The outcome of a search for: python dict literal duplicate keys. No conclusions (so far).
BACKGROUND In the thread "PEP: Dict addition and subtraction" I wrote
>>> {'a': 0, 'a': 1} {'a': 1}
I wonder, is this behaviour of {'a': 0, 'a': 1} documented (or tested) anywhere? I didn't find it in these URLs: https://docs.python.org/3/library/stdtypes.html#mapping-types-dict https://docs.python.org/3/tutorial/datastructures.html#dictionaries
LINKS I've since found some relevant URLs.
[1] https://stackoverflow.com/questions/34539772/is-a-dict-literal-containing-re... [2] https://help.semmle.com/wiki/display/PYTHON/Duplicate+key+in+dict+literal [3] https://bugs.python.org/issue26910 [4] https://bugs.python.org/issue16385 [5] https://realpython.com/python-dicts/
ANALYSIS [1] gives a reference to [6], which correctly states the behaviour of {'a':0, 'a':1}, although without giving an example. (Aside: Sometimes one example is worth 50 or more words.)
[2] is from Semmle, who provide an automated code review tool, called LGTM. The page [2] appears to be part of the documentation for LGTM. This page provides a useful link to [7].
[3] is a re-opening of [4]. It was rapidly closed by David Murray, who recommended reopening the discussion on python-ideas. [4] was raised by Albert Ferras, based on his real-world experience. In particular, a configuration file that contains a long dict literal. This was closed by Benjamin Peterson, who said that raising an error was "out of the question for compatibility isssues". Given few use case and little support on python-ideas,Terry Ready supported the closure. Raymond Hettinger supported the closure.
[5] is from RealPython, who provide online tutorials. This page contains the statement "a given key can appear in a dictionary only once. Duplicate keys are not allowed." Note that {'a': 0, 'a': 1} can reasonably be thought of as a dictionary with duplicate keys.
NOTE As I recall SGML (this shows my age) allows multiple entity declarations, as in <!ENTITY key "original"> <!ENTITY key "updated">
And as I recall, in SGML the first value "original" is the one that is in effect. This is what happens with the LaTeX command \providecommand.
FURTHER LINKS
[6] https://docs.python.org/3/reference/expressions.html#dictionary-displays
[7] https://cwe.mitre.org/data/definitions/561.html # CWE-561: Dead Code
-- Jonathan _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- Inada Naoki <songofacandy@gmail.com>
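The rule in the reference text quoted above can be checked directly. This is only a minimal sketch: `traced` and `calls` are hypothetical names introduced here to record evaluation order, and are not part of any message in the thread.

```python
# Hypothetical helper: records the order in which display items are evaluated.
calls = []

def traced(label, value):
    calls.append(label)
    return value

# The same key 'a' is given twice; the pairs are evaluated left to right.
d = {traced("k1", "a"): traced("v1", 0), traced("k2", "a"): traced("v2", 1)}

print(d)      # one entry for 'a', holding the last value given
print(calls)  # labels of the first pair come before labels of the second pair
```

The order of key versus value within a single pair has differed between Python versions, so the sketch only relies on the first pair being evaluated before the second.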
SUMMARY: Off-thread-topic comment on examples and words in documentation. Inada Naoki quoted (from doc.python ref [6] in my original post):
If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given.
Indeed. Although off-topic, I think
>>> {'a': 0, 'a': 1} == {'a': 1}
True
is much better than "This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given."
By the way, today I think we'd say key/value pairs. And I've read https://www.theguardian.com/guardian-observer-style-guide-d
data takes a singular verb (like agenda), though strictly a plural; you come across datum, the singular of data, about as often as you hear about an agendum
Oh, and "the final dictionary's value" should I think be "the dictionary's final value" or perhaps just "the dictionary's value".
But now we're far from the thread topic. I'm happy to join in on a thread on improving documentation (by using simpler language and good examples).
-- Jonathan
On 06/03/2019 17:43, Jonathan Fine wrote:
Indeed. Although off-topic, I think
>>> {'a': 0, 'a': 1} == {'a': 1}
True
is much better than "This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given."
I disagree. An example is an excellent thing, but the words are definitive and must be there. -- Rhodri James *-* Kynesim Ltd
On 06/03/2019 18:12, Rhodri James wrote:
On 06/03/2019 17:43, Jonathan Fine wrote:
Indeed. Although off-topic, I think
>>> {'a': 0, 'a': 1} == {'a': 1}
True
is much better than "This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given."
I disagree. An example is an excellent thing, but the words are definitive and must be there.
Sigh. I hit SEND before I finished changing the title. Sorry, folks. -- Rhodri James *-* Kynesim Ltd
Would it shut down this particular subthread if (as the language's designer, if not its BDFL) I declared that this was an explicit design decision that I made nearly 30 years ago? I should perhaps blog about the background of this decision, but it was quite a conscious one. There really is no point in thinking that this is an accident of implementation or could be changed.
-- --Guido van Rossum (python.org/~guido)
Hi Guido You wrote:
Would it shut down this particular subthread if (as the language's designer, if not its BDFL) I declared that this was an explicit design decision that I made nearly 30 years ago? I should perhaps blog about the background of this decision, but it was quite a conscious one. There really is no point in thinking that this is an accident of implementation or could be changed.
Thank you for sharing this with us. I'd be fascinated to hear about the background to this conscious decision, and I think it would help me and others understand better what makes Python what it is. And it might help persuade me that my surprise at {'a': 0, 'a': 1} is misplaced, or at least exaggerated and one-sided.
Do you want menial help writing the blog? Perhaps if you share your recollections, others will find the traces in the source code. For example, I've found the first dictobject.c, dating back to 1994.
https://github.com/python/cpython/blob/956640880da20c20d5320477a0dcaf2026bd9...
I'm a great fan of your Python conversation (with Biancuzzi and Warden) in
http://shop.oreilly.com/product/9780596515171.do # Masterminds of Programming
I've read this article several times, and have wished that it was more widely available. My personal view is that putting a copy of this article in docs.python.org would provide more benefit to the community than you blogging on why dict literals allow duplicate keys. However, it need not be either/or. Perhaps someone could ask the PSF to talk with O'Reilly about getting copyright clearance to do this.
Finally, some personal remarks. I've got a long training as a pure mathematician. Consistency and the application of simple basic principles are important to me, as is the discovery of basic principles. In your interview, you say (paraphrased and without context) that most Python code is written simply to get a job done, and that pragmatism, rather than being hung up on theoretical concepts, is the fundamental quality in being proficient in developing with Python.
Thank you for inventing Python, and designing the language. It's a language popular both with pure mathematicians, and also pragmatic people who want to get things done. That's quite an achievement, which has drawn people like me into your community.
with best regards
Jonathan
On Wed, Mar 6, 2019 at 10:59 PM Michael Lee <michael.lee.0x2a@gmail.com> wrote:
I think the behavior proposed in the PEP makes sense whether you think of "+" as meaning "concatenation" or "merging".
If your instinct is to assume "+" means "concatenation", then it would be natural to assume that {"a": 1, "b": 2} + {"c": 3, "b": 4} would be identical to {"a": 1, "b": 2, "c": 3, "b": 4} -- literally concat the key-value pairs into a new dict.
Nice explanation. You reduced my opposition to `+` with "literally concat". A better example:
{"a": 1, "b": 2} + {"c": 4, "b": 3} == {"a": 1, "b": 2, "c": 4, "b": 3} == {"a": 1, "b": 3, "c": 4}
On the other hand, set union is also "literally concat". If we use this "literally concat" metaphor, I still think set should have `+` as an alias of `|` for consistency.
Using "|" would also violate an important existing property of unions: the invariant "d1 | d2 == d2 | d1" is no longer true. As far as I'm aware, the union operation is always taken to be commutative in math, and so I think it's important that we preserve that property in Python. At the very least, I think it's far more important to preserve commutativity of unions than it is to preserve some of the invariants I've seen proposed above, like "len(d1 + d2) == len(d1) + len(d2)".
I think both rules are "rather a coincidence than a conscious decision". I think "|" keeps commutativity only because it is less widely used than `+`. A convenient operator gets abused more easily than a minor one. And I think every such "coincidence" rule is important. They make Python easier to understand. Everyone "discovers" rules and consistency while learning a language. This is a matter of balance. There is no right answer. Some people *feel* rule A is more important than B. Others feel the opposite.
But I do know that I'm a strong -1 on adding set operations to dicts: it's not possible to preserve the existing semantics of union (and intersection) with dict, and I think expressions like "d1 | d2" and "d1 & d2" would just be confusing and misleading to encounter in the wild.
Hmm. The PEP proposed dict - dict, which is similar to set - set (difference). To me, {"a": 1, "b": 2} - {"b": 3} = {"a": 1} is more confusing than {"a": 1, "b": 2} - {"b"} = {"a": 1}. So I think borrowing some semantics from set is a good idea. Both `dict - set` and `dict & set` make sense to me.
* `dict - set` can be used to remove private keys by "blacklist".
* `dict & set` can be used to choose public keys by "whitelist".
-- Inada Naoki <songofacandy@gmail.com>
If we use this "literally concat" metaphor, I still think set should have `+` as an alias of `|` for consistency.
I agree.
I think "|" keeps commutativity only because it is less widely used than `+`.
I suppose that's true, fair point. I guess I would be ok with | no longer always implying commutativity if we were repurposing it for some radically different purpose. But dicts and sets are similar enough that I think having them both use similar but ultimately different definitions of "|" is going to have non-zero cost, especially when reading or modifying future code that makes heavy use of both data structures. Maybe that cost is worth it. I'm personally not convinced, but I do think it should be taken into account.
Hmm. The PEP proposed dict - dict, which is similar to set - set (difference).
Now that you point it out, I think I also dislike `d1 - d2` for the same reasons I listed earlier: it's not consistent with set semantics.
One other objection I overlooked is that the PEP currently requires both operands to be dicts when doing "d1 - d2". So doing {"a": 1, "b": 2, "c": 3} - ["a", "b"] is currently disallowed (though doing d1 -= ["a", "b"] is apparently ok). I can sympathize: allowing "d1 - some_iter" feels a little too magical to me. But it's unfortunately restrictive -- I suspect removing keys stored within a list or something would be just as common a use-case, if not more so, than removing keys stored in another dict.
I propose that we instead add methods like "d1.without_keys(...)" and "d1.remove_keys(...)" that can accept any iterable of keys. These two methods would replace "d1.__sub__(...)" and "d1.__isub__(...)" respectively. The exact method names and semantics could probably do with a little more bikeshedding, but I think this idea would remove a false symmetry between "d1 + d2" and "d1 - d2" that doesn't really exist, while being more broadly useful.
Or I guess we could just remove that restriction: "it feels too magical" isn't a great objection on my part. Either way, that part of the PEP could use some more refinement, I think.
-- Michael
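The `without_keys` / `remove_keys` idea above could look roughly like the following. These are written as free functions rather than dict methods, and the choice to silently ignore missing keys is an assumption, since the message leaves the exact semantics open.

```python
def without_keys(d, keys):
    """Return a new dict lacking `keys`; `keys` may be any iterable of hashable keys."""
    excluded = set(keys)
    return {k: v for k, v in d.items() if k not in excluded}

def remove_keys(d, keys):
    """Remove `keys` from `d` in place. Missing keys are ignored (an assumption)."""
    # list() guards against `keys` being a live view of `d` itself.
    for k in list(keys):
        d.pop(k, None)

d1 = {"a": 1, "b": 2, "c": 3}
d2 = without_keys(d1, ["a", "b"])  # new dict, d1 unchanged at this point
remove_keys(d1, ["a", "zzz"])      # mutates d1; "zzz" is not present
```

Accepting any iterable is what separates this from the PEP's `d1 - d2`, which the message describes as requiring both operands to be dicts.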
Ka-Ping Yee wrote:
len(dict1 + dict2) does not equal len(dict1) + len(dict2), so using the + operator is nonsense.
You might as well say that using the + operator on vectors is nonsense, because len(v1 + v2) is not in general equal to len(v1) + len(v2). Yet mathematicians are quite happy to talk about "addition" of vectors. -- Greg
On Wed, Mar 6, 2019 at 10:31 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
You might as well say that using the + operator on vectors is nonsense, because len(v1 + v2) is not in general equal to len(v1) + len(v2).
Yet mathematicians are quite happy to talk about "addition" of vectors.
Vector addition is *actual* addition, not concatenation. You're so busy loosening the definition of + to make it make sense for dicts that you've forgotten that + is, first and foremost, about addition in the mathematical sense, where vector addition is just one type of addition. Concatenation is already a minor abuse of +, but one commonly accepted by programmers, thanks to it having some similarities to addition and a single, unambiguous set of semantics to avoid confusion.
You're defending + on dicts because vector addition isn't concatenation already, which only shows how muddled things get when you try to use + to mean multiple concepts that are at best loosely related.
The closest I can come to a thorough definition of what + does in Python (and most languages) right now is that:
1. Returns a new thing of the same type (or a shared coerced type for number weirdness)
2. That combines the information of the input operands
3. Is associative ((a + b) + c produces the same thing as a + (b + c)) (modulo floating point weirdness)
4. Is "reversible": Knowing the end result and *one* of the inputs is sufficient to determine the value of the other input; that is, for c = a + b, knowing any two of a, b and c allows you to determine a single unambiguous value for the remaining value (numeric coercion and floating point weirdness make this not 100%, but you can at least know a value equal to other value; e.g. for c = a + b, knowing c is 5.0 and a is 1.0 is sufficient to say that b is equal to 4, even if it's not necessarily an int or float). For numbers, reversal is done with -; for sequences, it's done by slicing c using the length of a or b to "subtract" the elements that came from a/b.
5. (Actual addition only) Is commutative (modulo floating point weirdness); a + b == b + a
6. (Concatenation only) Is order preserving (really a natural consequence of #4, but a property that people expect)
Note that these rules are consistent across most major languages that allow + to mean combine collections (the few that disagree, like Pascal, don't support | as a union operator).
Concatenation is missing element #5, but otherwise aligns with actual addition. dict merges (and set unions for that matter) violate #4 and #6; for c = a + b, knowing c and either a or b still leaves a literally infinite set of possible inputs for the other input (it's not infinite for sets, where the options would be a subset of the result, but for dicts, there would be no such limitation; keys from b could exist with any possible value in a). The order-preserving aspect of dicts *almost* satisfies #6, but not quite (if 'x' comes after 'y' in b, there is no guarantee that it will do so in c, because a gets first say on ordering, and b gets the final word on value).
Allowing dicts to get involved in + means:
1. Fewer consistent rules apply to +;
2. The particular idiosyncrasies of Python dict ordering and "which value wins" rules are now tied to +. (For concatenation, there is only one set of possible rules AFAICT so every language naturally agrees on behavior, but dict merging obviously has many possible rules that would be unlikely to match the exact rules of any other language except by coincidence.) a winning on order and b winning on value is a historical artifact of how Python's dict developed; I doubt any other language would intentionally choose to split responsibility like that if they weren't handcuffed by history.
Again, there's nothing wrong with making dict merges easier. But it shouldn't be done by (further) abusing +.
-Josh Rosenberg
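The "reversible" and ordering points above can be illustrated with a small sketch. The helper `merge` is a stand-in, written with `{**a, **b}`, for the copy-and-update semantics under discussion; it is an assumption for illustration and not the PEP's implementation.

```python
def merge(a, b):
    """Copy-and-update merge: keys from both, b's value wins on collision."""
    return {**a, **b}

# Concatenation is reversible: given c and a, slicing recovers b.
a_list, b_list = [1, 2], [3, 4]
c_list = a_list + b_list
recovered = c_list[len(a_list):]

# A dict merge is not: different left operands can give the same result.
b = {"x": 1, "y": 2}
c1 = merge({"x": 0}, b)
c2 = merge({"x": 99}, b)

# Ordering: the left operand decides position, the right operand decides value.
m = merge({"x": 0, "y": 0}, {"y": 2, "x": 1})
order = list(m)  # 'x' precedes 'y' here although 'y' came first in the right operand
```

Since `c1` and `c2` are equal, knowing the result and the right operand does not pin down the left operand.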
On Thu, Mar 7, 2019 at 10:52 AM Josh Rosenberg <shadowranger+pythonideas@gmail.com> wrote:
The closest I can come to a thorough definition of what + does in Python (and most languages) right now is that:
1. Returns a new thing of the same type (or a shared coerced type for number weirdness)
2. That combines the information of the input operands
3. Is associative ((a + b) + c produces the same thing as a + (b + c)) (modulo floating point weirdness)
4. Is "reversible": Knowing the end result and *one* of the inputs is sufficient to determine the value of the other input; that is, for c = a + b, knowing any two of a, b and c allows you to determine a single unambiguous value for the remaining value (numeric coercion and floating point weirdness make this not 100%, but you can at least know a value equal to other value; e.g. for c = a + b, knowing c is 5.0 and a is 1.0 is sufficient to say that b is equal to 4, even if it's not necessarily an int or float). For numbers, reversal is done with -; for sequences, it's done by slicing c using the length of a or b to "subtract" the elements that came from a/b.
5. (Actual addition only) Is commutative (modulo floating point weirdness); a + b == b + a
6. (Concatenation only) Is order preserving (really a natural consequence of #4, but a property that people expect)
Allowing dicts to get involved in + means:
1. Fewer consistent rules apply to +;
2. The particular idiosyncrasies of Python dict ordering and "which value wins" rules are now tied to +. (For concatenation, there is only one set of possible rules AFAICT so every language naturally agrees on behavior, but dict merging obviously has many possible rules that would be unlikely to match the exact rules of any other language except by coincidence.) a winning on order and b winning on value is a historical artifact of how Python's dict developed; I doubt any other language would intentionally choose to split responsibility like that if they weren't handcuffed by history.
Again, there's nothing wrong with making dict merges easier. But it shouldn't be done by (further) abusing +.
Lots of words that basically say: Stuff wouldn't be perfectly pure. But adding dictionaries is fundamentally *useful*. It is expressive. It will, in pretty much all situations, do exactly what someone would expect, based on knowledge of how Python works in other areas. The semantics for edge cases have to be clearly defined, but they'll only come into play on rare occasions; most of the time, for instance, we don't have to worry about identity vs equality in dictionary keys. If you tell people "adding two dictionaries combines them, with the right operand winning collisions", it won't matter that this isn't how lists or floats work; it'll be incredibly useful as it is. Practicality. Let's have some. ChrisA
On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <rosuav@gmail.com> wrote:
On Thu, Mar 7, 2019 at 10:52 AM Josh Rosenberg <shadowranger+pythonideas@gmail.com> wrote:
Allowing dicts to get involved in + means:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
But adding dictionaries is fundamentally *useful*. It is expressive.
It is useful. It's just that + is the wrong name. Filtering and subtracting from dictionaries are also useful! Those are operations we do all the time. It would be useful if & and - did these things too—and if we have & and -, it's going to be even more obvious that the merge operator should have been |. Josh Rosenberg <shadowranger+pythonideas@gmail.com> wrote:
If we were inventing programming languages in a vacuum, you could say + can mean "arbitrary combination operator" and it would be fine. But we're not in a vacuum; every major language that uses + with general purpose containers uses it to mean element-wise addition or concatenation, not just "merge".
If we were inventing Python from scratch, we could have decided that we always use "+" to combine collections. Sets would combine with + and then it would make sense that dictionaries also combine with + . But that is not Python. Lists combine with + and sets combine with |. Why? Because lists add (put both collections together and keep everything), but sets merge (put both collections together and keep some). So, Python already has a merge operator. The merge operator is "|". For lists, += is shorthand for list.extend(). For sets, |= is shorthand for set.update(). Is dictionary merge more like extend() or more like update()? Python already took a position on that when it was decided to name the dictionary method update(). That ship sailed a long time ago. —Ping
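The extend-versus-update distinction drawn above can be seen with the existing built-in operations. A minimal sketch using only current list, set and dict behaviour; the variable names are illustrative.

```python
# Lists "add": += behaves like list.extend() and keeps everything.
items = [1, 2]
items += [2, 3]

# Sets "merge": |= behaves like set.update() and duplicates collapse.
members = {1, 2}
members |= {2, 3}

# dict.update() also collapses duplicate keys; the incoming value wins.
mapping = {"a": 1, "b": 2}
mapping.update({"b": 3, "c": 4})
```

After these statements `items` has four elements, while `members` and `mapping` each have three, which is the sense in which dict merging resembles `update()` more than `extend()`.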
Now, this belongs as a separate PEP, and I probably will write one, but I propose: d1 << d2 makes a copy of d1 and merges d2 into it, and when the keys conflict, d2 takes priority. (Works like copy/update.) d1 + d2 makes a new dictionary, taking keys from d1 and d2. If d1 and d2 have a different value for same key, a KeyError is thrown.
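The two behaviours proposed above could be sketched as plain functions. The names `merge_right_wins` and `merge_strict` are hypothetical, and treating "a different value" as a `!=` comparison is an assumption, since the message does not say how values would be compared.

```python
def merge_right_wins(d1, d2):
    """Sketch of the proposed `d1 << d2`: copy d1, then update with d2."""
    new = dict(d1)
    new.update(d2)
    return new

def merge_strict(d1, d2):
    """Sketch of the proposed `d1 + d2`: KeyError if a shared key has different values."""
    new = dict(d1)
    for key, value in d2.items():
        if key in new and new[key] != value:
            raise KeyError(key)
        new[key] = value
    return new

loose = merge_right_wins({"a": 1, "b": 2}, {"b": 3})
strict = merge_strict({"a": 1, "b": 2}, {"b": 2, "c": 4})  # shared key agrees, so no error
```

Under this reading the strict form would be commutative up to key order, because it only succeeds when the operands agree on every shared key.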
Ka-Ping Yee writes:
On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <rosuav@gmail.com> wrote:
But adding dictionaries is fundamentally *useful*. It is expressive.
It is useful. It's just that + is the wrong name.
First, let me say that I prefer ?!'s position here, so my bias is made apparent. I'm also aware that I have biases so I'm sympathetic to those who take a different position.
Rather than say it's "wrong", let me instead point out that I think it's pragmatically troublesome to use "+". I can think of at least four interpretations of "d1 + d2":
1. update
2. multiset (~= collections.Counter addition)
3. addition of functions into the same vector space (actually, a semigroup will do ;-), and this is the implementation of collections.Counter
4. "fiberwise" addition (ie, assembling functions into relations)
and I'm very jet-lagged so I may be missing some.
Since "|" (especially "|=") *is* suitable for "update", I think we should reserve "+" for some alternative future commutative extension, of which there are several possible (all of 2, 3, 4 are commutative). Again in the spirit of full disclosure, of those above, 2 is already implemented and widely used, so we don't need to use "+" for that. I've never seen 4 except in the mathematical literature (union of relations is not the same thing). 3, however, is very common both for mappings with small domain and sparse representation of mappings with a default value (possibly computed then cached), and "|" is not suitable for expressing that sort of addition (I'm willing to say it's "wrong" :-).
There's also the fact that the operations denoted by "|" and "||" are often implemented as "short-circuiting", and therefore not commutative, while "+" usually is (and that's reinforced for mathematicians who are trained to think of "+" as the operator for Abelian groups, while "*" is a (possibly) non-commutative operator). I know commutativity of "+" has been mentioned before, but the non-commutativity of "|" -- and so unsuitability for many kinds of dict combination -- hasn't been emphasized before IIRC.
Steve
Ka-Ping Yee writes:
On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <rosuav@gmail.com> wrote:
But adding dictionaries is fundamentally *useful*. It is expressive.
It is useful. It's just that + is the wrong name.
First, let me say that I prefer ?!'s position here, so my bias is made apparent. I'm also aware that I have biases so I'm sympathetic to those who take a different position.
Rather than say it's "wrong", let me instead point out that I think it's pragmatically troublesome to use "+". I can think of at least four interpretations of "d1 + d2":
1. update
2. multiset (~= collections.Counter addition)
3. addition of functions into the same vector space (actually, a semigroup will do ;-), and this is the implementation of collections.Counter
4. "fiberwise" set addition (ie, of functions into relations)
and I'm very jet-lagged so I may be missing some.
There's also the fact that the operations denoted by "|" and "||" are often implemented as "short-circuiting", and therefore not commutative, while "+" usually is (and that's reinforced for mathematicians who are trained to think of "+" as the operator for Abelian groups, while "*" is a (possibly) non-commutative operator). I know commutativity of "+" has been mentioned before, but the non-commutativity of "|" -- and so unsuitability for many kinds of dict combination -- hasn't been emphasized before IIRC.
Since "|" (especially "|=") *is* suitable for "update", I think we should reserve "+" for some future commutative extension.
In the spirit of full disclosure: Of these, 2 is already implemented and widely used, so we don't need to use dict.__add__ for that. I've never seen 4 in the mathematical literature (union of relations is not the same thing). 3, however, is very common both for mappings with small domain and sparse representation of mappings with a default value (possibly computed then cached), and "|" is not suitable for expressing that sort of addition (I'm willing to say it's "wrong" :-).
Steve
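The difference between the "update" reading and the value-wise readings listed above can be made concrete. In this sketch `add_sparse` is a hypothetical helper for interpretation 3 (value-wise addition of sparse mappings with a shared default); it is not an existing API, and the sample data is illustrative.

```python
from collections import Counter

a = {"x": 1, "y": 2}
b = {"y": 10, "z": 20}

# Interpretation 1 (update): operand order changes the result.
update_ab = {**a, **b}
update_ba = {**b, **a}

# Value-wise addition as collections.Counter does it: order does not matter.
sum_ab = Counter(a) + Counter(b)
sum_ba = Counter(b) + Counter(a)

def add_sparse(f, g, default=0):
    """Hypothetical: value-wise sum of two sparse mappings sharing a default."""
    return {k: f.get(k, default) + g.get(k, default) for k in f.keys() | g.keys()}

sparse_ab = add_sparse(a, b)
sparse_ba = add_sparse(b, a)
```

Here `update_ab` and `update_ba` disagree on the value of "y", while the value-wise sums agree in both orders.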
On Thu, Mar 7, 2019 at 9:12 PM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Ka-Ping Yee writes:
On Wed, Mar 6, 2019 at 4:01 PM Chris Angelico <rosuav@gmail.com> wrote:
But adding dictionaries is fundamentally *useful*. It is expressive.
It is useful. It's just that + is the wrong name.
First, let me say that I prefer ?!'s position here, so my bias is made apparent. I'm also aware that I have biases so I'm sympathetic to those who take a different position.
TBH, I am warming up to "|" as well.
Rather than say it's "wrong", let me instead point out that I think it's pragmatically troublesome to use "+". I can think of at least four interpretations of "d1 + d2"
1. update
2. multiset (~= collections.Counter addition)
I guess this explains the behavior of removing results <= 0; it makes sense as multiset subtraction, since in a multiset a negative count makes little sense. (Though the name Counter certainly doesn't seem to imply multiset.)
3. addition of functions into the same vector space (actually, a semigroup will do ;-), and this is the implementation of collections.Counter
4. "fiberwise" set addition (ie, of functions into relations)
and I'm very jet-lagged so I may be missing some.
There's also the fact that the operations denoted by "|" and "||" are often implemented as "short-circuiting", and therefore not commutative, while "+" usually is (and that's reinforced for mathematicians who are trained to think of "+" as the operator for Abelian groups, while "*" is a (possibly) non-commutative operator). I know commutativity of "+" has been mentioned before, but the non-commutativity of "|" -- and so unsuitability for many kinds of dict combination -- hasn't been emphasized before IIRC.
I've never heard of single "|" being short-circuiting. ("||" of course is infamous for being that in C and most languages derived from it.)
And "+" is of course used for many non-commutative operations in Python (e.g. adding two lists/strings/tuples together). It is only *associative*, a weaker requirement that just says (A + B) + C == A + (B + C). (This is why we write A + B + C, since the grouping doesn't matter for the result.)
Anyway, while we're discussing mathematical properties, and since SETL was briefly mentioned, I found an interesting thing in math. For sets, union and intersection are distributive over each other. I can't type the operators we learned in high school, so I'll use Python's set operations. We find that A | (B & C) == (A | B) & (A | C). We also find that A & (B | C) == (A & B) | (A & C).
Note that this is *not* the case for + and * when used with (mathematical) numbers: * distributes over +: a * (b + c) == (a * b) + (a * c), but + does not distribute over *: a + (b * c) != (a + b) * (a + c). So in a sense, SETL (which uses + and * for union and intersection) got the operators wrong.
Note that in Python, + and * for sequences are not distributive this way, since (A + B) * n is not the same as (A * n) + (B * n). OTOH A * (n + m) == A * n + A * m. (Assuming A and B are sequences of the same type, and n and m are positive integers.)
If we were to use "|" and "&" for dict "union" and "intersection", the mutual distributive properties will hold.
Since "|" (especially "|=") *is* suitable for "update", I think we should reserve "+" for some future commutative extension.
One argument is that sets have an update() method aliased to "|=", so this makes it more reasonable to do the same for dicts, which also have a .update() method, with similar behavior (not surprising, since sets were modeled after dicts).
In the spirit of full disclosure: Of these, 2 is already implemented and widely used, so we don't need to use dict.__add__ for that. I've never seen 4 in the mathematical literature (union of relations is not the same thing). 3, however, is very common both for mappings with small domain and sparse representation of mappings with a default value (possibly computed then cached), and "|" is not suitable for expressing that sort of addition (I'm willing to say it's "wrong" :-).
-- --Guido van Rossum (python.org/~guido)
On 2019-03-08 16:55, Guido van Rossum wrote: [snip]
If we were to use "|" and "&" for dict "union" and "intersection", the mutual distributive properties will hold.
Since "|" (especially "|=") *is* suitable for "update", I think we should reserve "+" for some future commutative extension.
One argument is that sets have an update() method aliased to "|=", so this makes it more reasonable to do the same for dicts, which also have a .update() method, with similar behavior (not surprising, since sets were modeled after dicts).
[snip] One way to think of it is that a dict is like a set, except that each of its members has an additional associated value.
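To make the set/dict parallel concrete, here is a small sketch using only existing built-in behaviour (the variable names are illustrative). It shows that "|=" and update() agree for set operands, that dict.update() overwrites the associated value on a key collision, and that a dict's keys view already supports set operators.

```python
# set: "|=" and update() give the same result for set operands.
s1 = {1, 2}
s2 = {1, 2}
s1 |= {2, 3}
s2.update({2, 3})

# dict: update() keeps the key and replaces its associated value.
d = {"a": 1, "b": 2}
d.update({"b": 20, "c": 30})

# A dict's keys view behaves like a set of its keys.
left = {"a": 1, "b": 2}
right = {"b": 20, "c": 30}
all_keys = left.keys() | right.keys()
common_keys = left.keys() & right.keys()
```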
Guido van Rossum wrote:
I guess this explains the behavior of removing results <= 0; it makes sense as multiset subtraction, since in a multiset a negative count makes little sense. (Though the name Counter certainly doesn't seem to imply multiset.)
It doesn't even behave consistently as a multiset, since c[k] -= n is happy to let the value go negative.
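The inconsistency mentioned here is easy to reproduce with collections.Counter. The sketch below uses arbitrary example counts: the binary "-" operator drops non-positive results, while item assignment and subtract() keep them.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# Binary subtraction drops any result that is zero or negative.
diff = c - d

# In-place arithmetic on a single item does no such clamping.
c["b"] -= 5

# The subtract() method also keeps negative counts.
e = Counter(a=1)
e.subtract(Counter(a=3))
```

After this runs, `diff` holds only the key "a", while `c["b"]` and `e["a"]` are both negative.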
For sets, union and intersection are distributive over each other.
Note that this is *not* the case for + and * when used with (mathematical) numbers... So in a sense, SETL (which uses + and * for union and intersection) got the operators wrong.
But in another sense, it didn't. In Boolean algebra, "and" and "or" (which also distribute over each other) are often written using the same notations as multiplication and addition. There's no rule in mathematics saying that these notations must be distributive in one direction but not the other. -- Greg
On Fri, Mar 8, 2019 at 3:33 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
I guess this explains the behavior of removing results <= 0; it makes sense as multiset subtraction, since in a multiset a negative count makes little sense. (Though the name Counter certainly doesn't seem to imply multiset.)
It doesn't even behave consistently as a multiset, since c[k] -= n is happy to let the value go negative.
For sets, union and intersection are distributive over each other.
Note that this is *not* the case for + and * when used with (mathematical) numbers... So in a sense, SETL (which uses + and * for union and intersection) got the operators wrong.
But in another sense, it didn't. In Boolean algebra, "and" and "or" (which also distribute over each other) are often written using the same notations as multiplication and addition. There's no rule in mathematics saying that these notations must be distributive in one direction but not the other.
I guess everybody's high school math(s) class was different. I don't ever recall seeing + and * for boolean OR/AND; we used ∧ and ∨. I learned | and & for set operations only after I learned programming; I think it was in PL/1. But of course it stuck because of C bitwise operators (which are also boolean OR/AND and set operations). This table suggests there's a lot of variety in how these operators are spelled: https://en.wikipedia.org/wiki/List_of_logic_symbols -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I guess everybody's high school math(s) class was different. I don't ever recall seeing + and * for boolean OR/AND; we used ∧ and ∨.
Boolean algebra was only touched on briefly in my high school years. I can't remember exactly what notation was used, but it definitely wasn't ∧ and ∨ -- I didn't encounter those until much later. However, I've definitely seen texts on boolean algebra in relation to logic circuits that write 'A and B' as 'AB', and 'A or B' as 'A + B'. (And also use an overbar for negation instead of the mathematical ¬). Maybe it depends on whether you're a mathematician or an engineer? The multiplication-addition notation seems a lot more readable when you have a complicated boolean expression, so I can imagine it being favoured by pragmatic engineering type people. -- Greg
On Thu, 7 Mar 2019 10:58:02 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
Chris, please learn to think twice before contributing what is essentially a trivialization of someone else's arguments. You're not doing anything useful here, and are just sounding like an asshole who wants to shut people up. Regards Antoine.
On Fri, Mar 15, 2019 at 12:34:45PM +0100, Antoine Pitrou wrote:
On Thu, 7 Mar 2019 10:58:02 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
Chris, please learn to think twice before contributing what is essentially a trivialization of someone else's arguments. You're not doing anything useful here, and are just sounding like an asshole who wants to shut people up.
I don't think you are being fair here, and I'd rather avoid getting into unhelpful arguments about tone and whether Chris is "trivializing" (a pejorative term) or "simplifying" (a more neutral term) Josh's position. But if you feel that Chris (and I) have missed parts of Josh's argument, then by all means point out what we missed. Josh, the same applies to you: I do want to give your objections a fair hearing in the updated PEP, so if you think I've missed something, please point it out. In context, I think Chris' response was valid: he was responding to a post by Josh whose entire argument was that using + for dict merging is an abuse of the + symbol because it isn't like numeric addition. If there is more to Josh's argument, can you point out to me what I have missed please? That's a genuine request, not a rhetorical question. Here's Josh's argument: https://mail.python.org/pipermail/python-ideas/2019-March/055733.html and for context, here is Chris' dismissal of Josh's argument: https://mail.python.org/pipermail/python-ideas/2019-March/055734.html and his explanation of why he is dismissing it. Chris is well within his rights to dismiss an argument that doesn't impress him, which he did by summarizing it as "Stuff wouldn't be perfectly pure". (Pure in the sense of being like numeric addition.) I think that's pretty much an accurate summary: Josh apparently doesn't like using + for anything that isn't purely like + for real numbers. He calls using + for concatenation a "minor abuse" of the operator and argues that it would be bad for dict merging to use + because merging has different properties to numeric addition. (He has also criticised the use of + for concatenation in at least one other post.) He even gives qualified support for a dict merge operator: "there's nothing wrong with making dict merges easier" but just doesn't like the choice of + as the operator. He's entitled to his opinion, and Chris is entitled to dismiss it.
(Aside: your email appears to have broken threading. I'm not sure why, your other emails seem to be threaded okay.) -- Steven
Another random thought about this: Mathematicians use addition as a metaphor for quite a range of different things, but they tend to only use the symbols ∪ and ∩ for actual sets, or things that are very set-like. So maybe that's an argument for using '+' rather than '|' for dict merging. -- Greg
On Sat, Mar 16, 2019 at 09:04:22PM +1300, Greg Ewing wrote:
Another random thought about this: Mathematicians use addition as a metaphor for quite a range of different things, but they tend to only use the symbols ∪ and ∩ for actual sets, or things that are very set-like. So maybe that's an argument for using '+' rather than '|' for dict merging.
If one views an ordered dict as an assoc list, '+' would mean prepending the new values to the existing ones. If one views an unordered dict as a set of ordered pairs, '|' would make sense. Stefan Krah
On Sat, 16 Mar 2019 03:44:02 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 15, 2019 at 12:34:45PM +0100, Antoine Pitrou wrote:
On Thu, 7 Mar 2019 10:58:02 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
Chris, please learn to think twice before contributing what is essentially a trivialization of someone else's arguments. You're not doing anything useful here, and are just sounding like an asshole who wants to shut people up.
I don't think you are being fair here, and I'd rather avoid getting into unhelpful arguments about tone and whether Chris is "trivializing" (a pejorative term) or "simplifying" (a more neutral term) Josh's position. But if you feel that Chris (and I) have missed parts of Josh's argument, then by all means point out what we missed.
When someone posts an elaborate argument (regardless of whether they are right or not) and someone else responds with a one-liner that reduces it to "lots of words" and claims to rephrase it as a short, caricatured statement, then it seems fair to me to characterize it as "trivializing". But if you feel that I missed a subtlety in Chris' position, and if you feel he was more respectful of the OP than I felt he was, then by all means point out what I missed. Regards Antoine.
On Fri, Mar 15, 2019 at 4:36 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Thu, 7 Mar 2019 10:58:02 +1100 Chris Angelico <rosuav@gmail.com> wrote:
Lots of words that basically say: Stuff wouldn't be perfectly pure.
Chris, please learn to think twice before contributing what is essentially a trivialization of someone else's arguments. You're not doing anything useful here, and are just sounding like an asshole who wants to shut people up.
Watch the tone please.
On Fri, Mar 15, 2019 at 10:34:45AM -0700, Brett Cannon wrote:
Watch the tone please.
Brett, you might have missed my comment about wanting to avoid unhelpful arguments about tone, but if you are going to complain about people's tone, the considerate thing to do is to say what it is that you're objecting to. Otherwise we're left guessing as to what it is and whether or not you are making an implied threat to apply the CoC. I responded to Antoine's post earlier, but thought that it was a respectful disagreement. Do you think that's not the case? -- Steven
On Fri, Mar 15, 2019 at 11:15 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 15, 2019 at 10:34:45AM -0700, Brett Cannon wrote:
Watch the tone please.
Brett, you might have missed my comment about wanting to avoid unhelpful arguments about tone, but if you are going to complain about people's tone, the considerate thing to do is to say what it is that you're objecting to.
The phrasing of "just sounding like an asshole who wants to shut people up" is unnecessary.
Otherwise we're left guessing as to what it is and whether or not you are making an implied threat to apply the CoC.
No implied "threat". If it was an official warning then I would have said so.
I responded to Antoine's post earlier, but thought that it was a respectful disagreement. Do you think that's not the case?
I think it skirts the edge of being disrespectful, hence the request to please be aware of how one comes across.
On 05/03/2019 at 23:40, Greg Ewing wrote:
Steven D'Aprano wrote:
The question is, is [recursive merge] behaviour useful enough and common enough to be built into dict itself?
I think not. It seems like just one possible way of merging values out of many. I think it would be better to provide a merge function or method that lets you specify a function for merging values.
That's what this conversation led me to. I'm not against the addition for the most general usage (and the current PEP describes the behaviour I would expect before reading the doc), but for all other more specific usages, where we intend any special or not-so-common behaviour, I'd go with modifying Dict.update like this:
    foo.update(bar, on_collision=updator)  # Although I'm not a fan of the keyword I used
On 06/03/2019 at 10:50, Rémi Lapeyre wrote:
This won't be possible: update() already takes keyword arguments:
    >>> foo = {}
    >>> bar = {'a': 1}
    >>> foo.update(bar, on_collision=lambda e: e)
    >>> foo
    {'a': 1, 'on_collision': <function <lambda> at 0x10b8df598>}
I don't see that as a problem at all. Having **kwargs in a function's signature doesn't prevent it from having explicit keyword arguments at the same time: `def foo(bar="baz", **kwargs):` is perfectly valid, as is `def spam(ham: Dict, eggs="blah", **kwargs):`, so `update(other, on_collision=None, **added)` is too, no?
The major implication of such a modification of the Dict.update method is that when you're using it with keyword arguments (as opposed to passing another dict/iterable as positional), you're making a small non-backward compatible change, in that if some code was already using the keyword that would be chosen (here "on_collision"), that code would be broken by the new feature.
I had never tried to pass a dict and kw arguments together, as it seemed to me that it wasn't supported (I would even have expected an exception to be raised), but it's probably my level of English that isn't high enough to get it right, or this part of the doc that doesn't describe the full possible usage of the method well (see here: https://docs.python.org/3/library/stdtypes.html#dict.update).
Anyway, if the keyword is selected wisely, the collision case will almost never happen, and be quite easy to correct if it ever happened.
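For comparison, the alternative raised earlier in the thread, a separate merge function that takes a function for combining colliding values, avoids touching the signature of update() at all. A minimal sketch follows; the name `merge_with` and its signature are hypothetical and not part of any proposal.

```python
def merge_with(left, right, on_collision):
    """Return a new dict combining left and right.

    on_collision(old, new) is called only for keys present in both.
    Hypothetical helper: neither the name nor the signature is proposed
    anywhere in the PEP.
    """
    result = dict(left)
    for key, value in right.items():
        if key in result:
            result[key] = on_collision(result[key], value)
        else:
            result[key] = value
    return result


stock = {"apples": 3, "pears": 1}
delivery = {"apples": 2, "plums": 5}

summed = merge_with(stock, delivery, lambda old, new: old + new)
keep_first = merge_with(stock, delivery, lambda old, new: old)
last_wins = merge_with(stock, delivery, lambda old, new: new)
```

Passing `lambda old, new: new` reproduces the "last seen wins" behaviour of update(), so the same helper covers the general case and the special ones.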
On Wed, Mar 6, 2019 at 11:18 PM Brice Parent <contact@brice.xyz> wrote:
The major implication to such a modification of the Dict.update method, is that when you're using it with keyword arguments (as opposed to passing another dict/iterable as positional), you're making a small non-backward compatible change in that if in some code, someone was already using the keyword that would be chosen (here "on_collision"), their code would be broken by the new feature. Anyway, if the keyword is selected wisely, the collision case will almost never happen, and be quite easy to correct if it ever happened.
You can make it unlikely, yes, but I'd dispute "easy to correct". Let's suppose that someone had indeed used the chosen keyword (and remember, the more descriptive the argument name, the more likely that it'll be useful elsewhere and therefore have a collision). How would they discover this? If they're really lucky, there MIGHT be an exception (if on_collision accepts only a handful of keywords, and the collision isn't one of them), but if your new feature is sufficiently flexible, that might not happen. There'll just be incorrect behaviour. As APIs go, using specific keyword args at the same time as **kw is a bit odd. Consider: button_options.update(button_info, on_click=frobnicate, style="KDE", on_collision="replace") It's definitely not obvious which of those will end up in the dictionary and which won't. Big -1 from me on that change. ChrisA
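The silent change in behaviour can be illustrated as follows. `update_with_hook` is a hypothetical stand-in for a modified update(), written as a plain function purely for illustration; the sketch contrasts it with what dict.update() does today.

```python
def update_with_hook(target, other=(), on_collision=None, **added):
    """Hypothetical variant of dict.update() with an on_collision keyword."""
    for source in (dict(other), added):
        for key, value in source.items():
            if key in target and on_collision is not None:
                value = on_collision(target[key], value)
            target[key] = value


# Today: every keyword argument becomes a key of the dict.
today = {}
today.update({"style": "GTK"}, style="KDE", on_collision="replace")

# With the hypothetical signature, on_collision is consumed by the
# function and never stored, and no exception points this out.
tomorrow = {}
update_with_hook(tomorrow, {"style": "GTK"}, style="KDE", on_collision=None)
```

Code that relied on an "on_collision" key being stored would keep running under the new signature, just with one key missing.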
On 06/03/2019 at 13:53, Chris Angelico wrote:
On Wed, Mar 6, 2019 at 11:18 PM Brice Parent <contact@brice.xyz> wrote:
The major implication to such a modification of the Dict.update method, is that when you're using it with keyword arguments (as opposed to passing another dict/iterable as positional), you're making a small non-backward compatible change in that if in some code, someone was already using the keyword that would be chosen (here "on_collision"), their code would be broken by the new feature. Anyway, if the keyword is selected wisely, the collision case will almost never happen, and be quite easy to correct if it ever happened.
You can make it unlikely, yes, but I'd dispute "easy to correct". Let's suppose that someone had indeed used the chosen keyword (and remember, the more descriptive the argument name, the more likely that it'll be useful elsewhere and therefore have a collision). How would they discover this? If they're really lucky, there MIGHT be an exception (if on_collision accepts only a handful of keywords, and the collision isn't one of them), but if your new feature is sufficiently flexible, that might not happen. There'll just be incorrect behaviour.
As APIs go, using specific keyword args at the same time as **kw is a bit odd. Consider:
    button_options.update(button_info, on_click=frobnicate, style="KDE", on_collision="replace")
It's definitely not obvious which of those will end up in the dictionary and which won't. Big -1 from me on that change.
That's indeed a good point, even if the correction is quite easy to make in most cases. With keyword-only changes:
    button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))
    # or
    button_options.update(dict(on_collision="replace"), on_click=frobnicate, style="KDE")
In the exact case you proposed, it could become a two-liner:
    button_options.update(button_info)
    button_options.update(dict(on_click=frobnicate, style="KDE", on_collision="replace"))
In my code, I would probably make it into 2 lines, to make clear that we have 2 levels of data merging, one that is general (the first), and one that is specific to this use-case (as it's hard-coded), though some people do care about the number of lines.
But for the other part of your message, I 100% agree with you. The main problem with such a change is not (to me) that it can break some edge cases, but that it would potentially break them silently. And that, I agree, is worth a big -1 I guess.
Hi, Steven. I can help you with it. I added it as PEP 584. I had to add the PEP headers, but didn't do any other editing. I'm going to be out of town for the next 2 weeks, so I might be slow in responding. Eric On 3/1/2019 11:26 AM, Steven D'Aprano wrote:
Attached is a draft PEP on adding + and - operators to dict for discussion.
This should probably go here:
https://github.com/python/peps
but due to technical difficulties at my end, I'm very limited in what I can do on Github (at least for now). If there's anyone who would like to co-author and/or help with the process, that will be appreciated.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Thanks -- FYI I renamed the file to .rst (per convention for PEPs in ReST format) and folded long text lines.
-- --Guido van Rossum (python.org/~guido)
Looks like a good start. I think you should replace all of the lines:
    if isinstance(other, dict):
with
    if isinstance(self, type(other)):
since if other is an instance of a dict subclass, it should be the one to process the addition. On the other hand, if self is an instance of the derived type, then we are free to do the combination.
I think you should also change this wording: "the result type will be the type of the left operand", since the result type will be negotiated between the operands (even in your implementation).
__sub__ can be implemented more simply as a dict comprehension.
Don't forget to return self in __isub__ and __iadd__ or they won't work.
I think __isub__ would be simpler like this:
    def __isub__(self, it):
        if it is self:
            self.clear()
        else:
            for value in it:
                del self[value]
        return self
I don't see why you would bother looking for keys (iter will do that anyway).
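The suggestions in this message can be tried out on a dict subclass without patching dict itself. The sketch below is an assumption-laden illustration: `SubDict` is a made-up name, and the methods follow the message as written rather than the PEP's reference implementation.

```python
class SubDict(dict):
    """Hypothetical dict subclass used only to try out the suggestion."""

    def __sub__(self, other):
        # Defer to the other operand unless self is an instance of its type.
        if not isinstance(self, type(other)):
            return NotImplemented
        # Subtraction as a dict comprehension.
        return type(self)({k: v for k, v in self.items() if k not in other})

    def __isub__(self, it):
        if it is self:
            self.clear()
        else:
            for key in it:
                del self[key]  # raises KeyError if the key is absent
        # Returning self is required: the result of "-=" is rebound
        # to the name on the left-hand side.
        return self


d = SubDict(a=1, b=2, c=3)
alias = d
d -= ["a", "b"]              # any iterable of keys is accepted
e = SubDict(x=1) - {"x": 0}  # values on the right-hand side are ignored
```

Note that `del` raises KeyError for a key that is not present, so this version of `__isub__` is stricter than a subtraction that silently skips missing keys.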
I think the "Current Alternatives" section must refer to the long-existing idiom, in addition to {**d1, **d2}:
    d3 = d1.copy()
    d3.update(d2)
It is obvious and easily discoverable, although it takes two lines. "There is no obvious way" and "there is at least one obvious way" are very different.
-- INADA Naoki <songofacandy@gmail.com>
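For reference, the two existing spellings mentioned above produce the same result for plain dicts. The example values here are arbitrary.

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

# Long-standing two-line idiom.
d3 = d1.copy()
d3.update(d2)

# Dict display with unpacking.
d4 = {**d1, **d2}
```

In both cases the value from `d2` wins for the shared key "b", and `d1` is left unchanged.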
I have seen a ton of discussion about what dict addition should do, but have seen almost no mention of dict difference. This lack of discussion interest combined with me not recalling having needed the proposed subtraction semantics personally makes me wonder if we should hold off on locking in subtraction semantics just yet. Perhaps we could just scope the proposal to dictionary addition only for now? If I *were* to define dict difference, my intuition suggests supporting a second operand that is any iterable of keys and not just dicts. (Augmented dict subtraction is already proposed to accept such a broader second argument.)
David Foster | Seattle, WA, USA
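A dict difference that accepts any iterable of keys needs no new operator to experiment with. Below is a minimal sketch as a plain function; the name `dict_difference` is hypothetical.

```python
def dict_difference(mapping, keys):
    """Return a copy of mapping without the given keys (hypothetical helper)."""
    drop = set(keys)  # consume the iterable once; keys must be hashable
    return {k: v for k, v in mapping.items() if k not in drop}


config = {"host": "example.org", "port": 8080, "debug": True}

# The second operand can be a list of keys...
without_debug = dict_difference(config, ["debug"])
# ...or a dict, in which case only its keys matter.
without_shared = dict_difference(config, {"port": 0, "missing": 1})
```

Keys in the second argument that are not in the mapping are simply ignored here, which is a design choice of this sketch rather than something settled in the thread.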
I agree with David here. Subtraction wasn’t even part of the original discussion — it seems that it was only added as an afterthought because Guido felt they were natural to propose together and formed a nice symmetry. It’s odd that RHS values are not used at all, period. Further, there’s no precedent for bulk sequence/mapping removals like this... except for sets, for which it is certainly justified. I’ve had the opportunity to play around with my reference implementation over the last few days, and despite my initial doubts, I have *absolutely* fallen in love with dictionary addition — I even accidentally tried to += two dictionaries at work on Friday (a good, but frustrating, sign). For context, I was updating a module-level mapping with an imported one, a use case I hadn’t even previously considered. I have tried to fall in love with dict subtraction the same way, but every code sketch/test I come up with feels contrived and hack-y. I’m indifferent towards it, at best. TL;DR: I’ve lived with both for a week. Addition is now habit, subtraction is still weird.
Nice branch name! :)
I couldn’t help myself. Brandt
These semantics are intended to match those of update as closely as possible. For the dict built-in itself, calling keys is redundant as iteration over a dict iterates over its keys; but for subclasses or other mappings, update prefers to use the keys method.
The above paragraph may be inaccurate. Although the dict docstring states that keys will be called if it exists, this does not seem to be the case for dict subclasses. Bug or feature?
    >>> print(dict.update.__doc__)
    D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
    If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
    If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
    In either case, this is followed by: for k in F:  D[k] = F[k]
It's actually pretty interesting... and misleading/wrongish. It never says that keys is *called*... in reality, it just checks for the "keys" method before deciding whether to proceed with PyDict_Merge or PyDict_MergeFromSeq2. It should really read more like:
    D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
    If E is present, has a .keys() method, and is a subclass of dict, then does:  for k in E: D[k] = E[k]
    If E is present, has a .keys() method, and is not a subclass of dict, then does:  for k in E.keys(): D[k] = E[k]
    If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
    In either case, this is followed by: for k in F:  D[k] = F[k]
Should our __sub__ behavior be the same (i.e., iterate for dict subclasses and objects without "keys()", otherwise call "keys()" and iterate over that)? __iadd__ calls into this logic already. It seems to be the most "natural" solution here, if we desire behavior analogous to "update".
Brandt
Should our __sub__ behavior be the same...
Sorry, our "__isub__" behavior. Long day...
Actually, this was made even more condition-y in 3.8. Now we check __iter__ too:
    D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
    If E is present, has a .keys() method, is a subclass of dict, and hasn't overridden __iter__, then does:  for k in E: D[k] = E[k]
    If E is present, has a .keys() method, and is not a subclass of dict or has overridden __iter__, then does:  for k in E.keys(): D[k] = E[k]
    If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
    In either case, this is followed by: for k in F:  D[k] = F[k]
Bleh.
Thanks to everyone who has contributed to the discussion, I have been reading all the comments even if I haven't responded. I'm currently working on an update to the PEP which will, I hope, improve some of the failings of the current draft. -- Steven
Would __iadd__ and __isub__ be added to collections.abc.MutableMapping? This would be consistent with other infix operations on mutable ABCs, but could potentially break backwards compatibility for anyone who has defined a MutableMapping subclass that implements __add__ but not __iadd__. On Sat, Mar 9, 2019 at 8:55 AM Steven D'Aprano <steve@pearwood.info> wrote:
Thanks to everyone who has contributed to the discussion, I have been reading all the comments even if I haven't responded.
I'm currently working on an update to the PEP which will, I hope, improve some of the failings of the current draft.
-- Steven
On Sat, Mar 09, 2019 at 11:39:39AM -0800, Stephan Hoyer wrote:
Would __iadd__ and __isub__ be added to collections.abc.MutableMapping?
No, that will not be part of the PEP. The proposal is only to change dict itself. If people want to add this to MutableMapping, that could be considered separately. -- Steven
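The backwards-compatibility concern raised in the question can be made concrete. This is a sketch only: Settings is a hypothetical MutableMapping subclass that defines __add__ but not __iadd__, so today += falls back to __add__ and rebinds the name rather than mutating in place. A mixin __iadd__ on the ABC would presumably change that.

```python
from collections.abc import MutableMapping

class Settings(MutableMapping):
    """Hypothetical mapping that defines __add__ but not __iadd__."""
    def __init__(self, data=()):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __add__(self, other):
        merged = Settings(self._data)  # copy, then merge: a new object
        merged.update(other)
        return merged

s = Settings({"a": 1})
alias = s
s += {"b": 2}  # no __iadd__, so this is s = s.__add__({"b": 2})

print(s is alias)          # the name s now refers to a new object
print(dict(alias), dict(s))
```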
Just in case I'm not the only one that had a hard time finding the latest version of this PEP, here it is in the PEPS Repo: https://github.com/python/peps/blob/master/pep-0584.rst -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
I'd like to make a plea to people: I get it, there is now significant opposition to using the + symbol for this proposed operator. At the time I wrote the first draft of the PEP, there was virtually no opposition to it, and the | operator had very little support. This has clearly changed. At this point I don't think it is productive to keep making subjective claims that + will be more confusing or surprising. You've made your point that you don't like it, and the next draft^1 of the PEP will make that clear. But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me! For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled. And to those who support this PEP, code examples where a dict merge operator will help are most welcome! ^1 Coming Real Soon Now™. -- Steven
On Fri, 22 Mar 2019 03:42:00 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
I'd rather see a method. Dict merging just doesn't occur often enough that an operator is desirable for it.
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
Yes, I still have no idea why this operator would supposedly be useful. How many dict merges do you write per month? Regards Antoine.
On 21/03/2019 17:06, Antoine Pitrou wrote:
On Fri, 22 Mar 2019 03:42:00 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
I'd rather see a method. Dict merging just doesn't occur often enough that an operator is desirable for it.
Analogous to the relationship between list.sort() and sorted(), I can't help but think that a dict.merge() method would be a terrible idea. A merged() function is more defensible.
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
I don't use Python often enough to have much to offer, I'm afraid. The sort of occasion I would use dict merging is passing modified environments to subcommands. Something like:

    def process():
        if time_to_do_thing1():
            thing1(base_env + thing1_env_stuff + env_tweaks)
        if time_to_do_thing2():
            thing2(base_env + thing2_env_stuff + env_tweaks)

...and so on. The current syntax for doing this is a tad verbose:

    def process():
        if time_to_do_thing1():
            env = base_env.copy()
            env.update(thing1_env_stuff)
            env.update(env_tweaks)
            thing1(env)
            del env
        if time_to_do_thing2():
            env = base_env.copy()
            env.update(thing2_env_stuff)
            env.update(env_tweaks)
            thing2(env)
            del env

-- Rhodri James *-* Kynesim Ltd
On 21/03/2019 17:59, Rhodri James wrote:
    def process():
        if time_to_do_thing1():
            thing1(base_env + thing1_env_stuff + env_tweaks)
        if time_to_do_thing2():
            thing2(base_env + thing2_env_stuff + env_tweaks)

...and so on. The current syntax for doing this is a tad verbose:

    def process():
        if time_to_do_thing1():
            env = base_env.copy()
            env.update(thing1_env_stuff)
            env.update(env_tweaks)
            thing1(env)
            del env
        if time_to_do_thing2():
            env = base_env.copy()
            env.update(thing2_env_stuff)
            env.update(env_tweaks)
            thing2(env)
            del env
Of course I forgot:

    def process():
        if time_to_do_thing1():
            thing1({**base_env, **thing1_env_stuff, **env_tweaks})
        if time_to_do_thing2():
            thing2({**base_env, **thing2_env_stuff, **env_tweaks})

...which says something about how memorable that syntax is.

-- Rhodri James *-* Kynesim Ltd
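For comparison, the two existing spellings can be checked against each other. This is a small sketch: the dictionary names come from the example above, but their contents are made up, and merged_by_update is a hypothetical helper wrapping the copy/update idiom.

```python
# Made-up contents; only the names come from the example above.
base_env = {"PATH": "/usr/bin", "LANG": "C"}
thing1_env_stuff = {"LANG": "en_GB.UTF-8", "THING": "1"}
env_tweaks = {"THING": "one"}

def merged_by_update(*dicts):
    """Hypothetical helper: the copy/update idiom, later dicts win."""
    env = {}
    for d in dicts:
        env.update(d)
    return env

via_update = merged_by_update(base_env, thing1_env_stuff, env_tweaks)
via_unpack = {**base_env, **thing1_env_stuff, **env_tweaks}

print(via_update == via_unpack)
print(via_unpack)
```

Both spellings leave base_env untouched and let the rightmost value win for a repeated key, which is the behaviour the proposed operator is meant to match.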
On Thu, 21 Mar 2019 17:59:41 +0000 Rhodri James <rhodri@kynesim.co.uk> wrote:
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
I don't use Python often enough to have much to offer, I'm afraid. The sort of occasion I would use dict merging is passing modified environments to subcommands. Something like:
    def process():
        if time_to_do_thing1():
            thing1(base_env + thing1_env_stuff + env_tweaks)
        if time_to_do_thing2():
            thing2(base_env + thing2_env_stuff + env_tweaks)

...and so on. The current syntax for doing this is a tad verbose:

    def process():
        if time_to_do_thing1():
            env = base_env.copy()
            env.update(thing1_env_stuff)
            env.update(env_tweaks)
            thing1(env)
            del env
        if time_to_do_thing2():
            env = base_env.copy()
            env.update(thing2_env_stuff)
            env.update(env_tweaks)
            thing2(env)
            del env
Ah, you convinced me there is a use case indeed (though `del env` isn't necessary above). I would still prefer something that's not an operator, but I agree there is potential to improve the current state of affairs. Note that, if you're able to live with a third-party dependency, the `toolz` package has what you need (and lots of other things too): https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge Regards Antoine.
Antoine Pitrou wrote:
Note that, if you're able to live with a third-party dependency, the `toolz` package has what you need (and lots of other things too): https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge
I suggest that the supporters of dict + dict make (and put up on PyPI) a pure-Python subclass of dict that has the desired properties. This would

1. Clarify and document the syntax and semantics.
2. Help with exploration and testing.
3. Provide a 'back-port' mechanism to current Python.
4. Give the proposal the benefit of practical experience.

I find this last very important, when we can do it. And we can, in this case. Language changes are 'cast in stone' and hard to reverse. And afterwards, on this list, we're sometimes told that we've 'missed the boat' for a particular change. Let's take the benefit of a reference pure Python implementation, when we can.

Steven D'A.: Please would you include or respond to this suggestion in the next revision of the PEP.

-- Jonathan
On Fri, Mar 22, 2019 at 8:44 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Antoine Pitrou wrote:
Note that, if you're able to live with a third-party dependency, the `toolz` package has what you need (and lots of other things too): https://toolz.readthedocs.io/en/latest/api.html#toolz.dicttoolz.merge
I suggest that the supporters of dict + dict make (and put up on PyPI) a pure-Python subclass of dict that has the desired properties. This would

1. Clarify and document the syntax and semantics.
2. Help with exploration and testing.
3. Provide a 'back-port' mechanism to current Python.
4. Give the proposal the benefit of practical experience.
The trouble with that is that you can't always use a dict subclass (or a non-subclass MutableMapping implementation, etc, etc, etc). There are MANY situations in which Python will give you an actual real dict, and it defeats the purpose if you then have to construct an AddableDict out of it just so you can add something to it. Not every proposed change makes sense on PyPI, and it definitely won't get a fair representation in "practical experience". If someone's proposing adding a new module to the standard library, then by all means, propose PyPI. But changes to core types can't be imported from other modules. Python is not Ruby. ChrisA
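For concreteness, the AddableDict mentioned above might be sketched as follows. This is an assumption-laden illustration, not the PEP's reference implementation: it covers only + and +=, and it takes "last wins" from the update()-matching semantics discussed in this thread.

```python
class AddableDict(dict):
    """Hypothetical dict subclass with a 'last wins' merge on + and +=."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)  # copy the left operand
        new.update(other)       # right operand wins on duplicate keys
        return new

    def __iadd__(self, other):
        self.update(other)      # in place, same rules as update()
        return self

# A plain dict, standing in for one handed to you by Python itself.
plain = {"a": 1}

# The wrapping step is the cost being pointed out above.
merged = AddableDict(plain) + {"b": 2}

try:
    plain + {"b": 2}            # built-in dicts do not support +
    plain_add_failed = False
except TypeError:
    plain_add_failed = True

print(merged, plain_add_failed)
```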
For anyone interested in "trying it out": if you're not against cloning and compiling CPython yourself, here is a PEP 584 C implementation I have PR'd against master right now. I'm keeping it in sync with the draft PEP as it changes, so subtraction performance is not overly optimized yet, but it will show you the *exact* behavior outlined in the PEP on the dict builtin and its subclasses. The relevant branch is called "addiction". You can clone it from: https://github.com/brandtbucher/cpython.git :)
On Thu, Mar 21, 2019 at 03:10:48PM -0700, Brandt Bucher wrote:
For anyone interested in "trying it out": if you're not against cloning and compiling CPython yourself, here is a PEP 584 C implementation I have PR'd against master right now. I'm keeping it in sync with the draft PEP as it changes, so subtraction performance is not overly optimized yet, but it will show you the *exact* behavior outlined in the PEP on the dict builtin and its subclasses. The relevant branch is called "addiction". You can clone it from:
That's great, thank you! For the sake of comparisons, could you support | as an alias? That will allow people to get a feel for whether a+b or a|b looks nicer. (For the record, the PEP isn't set in stone in regards to the choice of operator.)
-- Steven
On Fri, Mar 22, 2019 at 07:59:03AM +0000, Jonathan Fine wrote:
Steven D'Aprano wrote:
(For the record, the PEP isn't set in stone in regards to the choice of operator.
Steven: Please say what parts of the PEP you consider to be set in stone. This will allow discussion to focus on essentials rather than details.
The PEP is primarily about making a merge operator, so that stays, regardless of whether it's spelled + | or something else. Otherwise there's no point to the PEP. If there is demand for a merged() method/function, that can go into a competing PEP, but it won't be part of this PEP. If anyone wants to propose syntax for chained method calls (fluent programming) so we can write d.copy().update(), that won't be in this PEP either. Likewise for new syntax to turn method calls into operators. Feel free to propose a competing PEP (and I might even support yours, if it makes a good enough case). A grey area is the "last wins" merge behaviour matching update(). In theory, if somebody made an absolutely brilliant case for some other behaviour, I could change my mind, but it would have to be pretty amazing. In the absence of such, I'm going to use my prerogative as PEP author to choose the behaviour I prefer to see, and leave alternatives to subclasses. -- Steven
On Mar 21, 2019, at 16:55, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Mar 21, 2019 at 03:10:48PM -0700, Brandt Bucher wrote: For anyone interested in "trying it out": if you're not against cloning and compiling CPython yourself, here is a PEP 584 C implementation I have PR'd against master right now. I'm keeping it in sync with the draft PEP as it changes, so subtraction performance is not overly optimized yet, but it will show you the *exact* behavior outlined in the PEP on the dict builtin and its subclasses. The relevant branch is called "addiction". You can clone it from:
That's great, thank you!
For the sake of comparisons, could you support | as an alias? That will allow people to get a feel for whether a+b or a|b looks nicer.
(For the record, the PEP isn't set in stone in regards to the choice of operator.
Great idea. I just added this, and all tests are passing. For reference, here’s the PR (it’s linked to the BPO, too): https://github.com/python/cpython/pull/12088
Chris Angelico wrote:
The trouble with that is that you can't always use a dict subclass (or a non-subclass MutableMapping implementation, etc, etc, etc). There are MANY situations in which Python will give you an actual real dict, and it defeats the purpose if you then have to construct an AddableDict out of it just so you can add something to it. Not every proposed change makes sense on PyPI, and it definitely won't get a fair representation in "practical experience".
Chris seems to accept that sometimes you can use a dict subclass, and that my proposal will give some representation of "practical experience". Even if not perfect, such benefits are I think worth having.

And Chris gives no evidence (or examples) beyond his own assertions, that my proposal would not produce a fair representation of practical experience. Why don't we just try it and see? This would engage us with the users. And it would, as I suggested, clarify and document the syntax and semantics. And provide backporting to current versions of Python.

By the way, in "Masterminds of Programming" [page 20], Guido gives four lines of defence against the unwise addition of a "favorite feature" to the language. They are

[1] Explain to people that they can already do what they want.
[2] Tell them to write their own module or class to encapsulate the feature.
[3] Accept the feature, as pure Python, in the standard library.
[4] Accept the feature as a C-Python extension standard.

And [4] is, in Guido's words
the last line of defense before we have to admit [...] this is so useful [...] so we'll have to change the language
I think the pure Python implementation is important. If the supporters of this proposal are not willing to provide this, then I will (along with anyone else who volunteers). http://shop.oreilly.com/product/9780596515171.do # Masterminds of Programming -- Jonathan
On Fri, Mar 22, 2019 at 6:47 PM Jonathan Fine <jfine2358@gmail.com> wrote:
Chris Angelico wrote:
The trouble with that is that you can't always use a dict subclass (or a non-subclass MutableMapping implementation, etc, etc, etc). There are MANY situations in which Python will give you an actual real dict, and it defeats the purpose if you then have to construct an AddableDict out of it just so you can add something to it. Not every proposed change makes sense on PyPI, and it definitely won't get a fair representation in "practical experience".
Chris seems to accept that sometimes you can use a dict subclass, and that my proposal will give some representation of "practical experience".
I said "definitely won't", not "will give some". So, no. ChrisA
On 21 March 2019 at 17:43:31, Steven D'Aprano (steve@pearwood.info) wrote:
I'd like to make a plea to people:
I get it, there is now significant opposition to using the + symbol for this proposed operator. At the time I wrote the first draft of the PEP, there was virtually no opposition to it, and the | operator had very little support. This has clearly changed.
At this point I don't think it is productive to keep making subjective claims that + will be more confusing or surprising. You've made your point that you don't like it, and the next draft^1 of the PEP will make that clear.
But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me!
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
Thanks for the work you are doing on this PEP and for debunking my misconceptions regarding types, I’m currently learning a lot about them. I don’t know if it matters but I’m in favor of the method
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
No matter the notation you end up choosing, I think this code: https://github.com/jpadilla/pyjwt/blob/master/jwt/utils.py#L71-L81 which is part of a widely used library to validate JWTs would greatly benefit from a new way to merge dicts. (This package is 78 on https://hugovk.github.io/top-pypi-packages/) Rémi
^1 Coming Real Soon Now™.
-- Steven
On Thu, 21 Mar 2019 at 17:27, Rémi Lapeyre <remi.lapeyre@henki.fr> wrote:
On 21 March 2019 at 17:43:31, Steven D'Aprano (steve@pearwood.info) wrote:
I'd like to make a plea to people:
I get it, there is now significant opposition to using the + symbol for this proposed operator. At the time I wrote the first draft of the PEP, there was virtually no opposition to it, and the | operator had very little support. This has clearly changed.
At this point I don't think it is productive to keep making subjective claims that + will be more confusing or surprising. You've made your point that you don't like it, and the next draft^1 of the PEP will make that clear.
But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me!
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
Thanks for the work you are doing on this PEP and for debunking my misconceptions regarding types, I’m currently learning a lot about them.
I don’t know if it matters but I’m in favor of the method
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
No matter the notation you end up choosing, I think this code: https://github.com/jpadilla/pyjwt/blob/master/jwt/utils.py#L71-L81
which is part of a widely used library to validate JWTs would greatly benefit from a new way to merge dicts. (This package is 78 on https://hugovk.github.io/top-pypi-packages/)
It's already got a function that does the job. How much benefit is there *really* from being able to replace it with d1 + d2 once you drop support for Python < 3.8? But point taken that new code would have been able to avoid the function in the first place.

... or would it?

    def merge_dict(original, updates):
        if not updates:
            return original

With the + operator, d1 + None will fail with an error. With your code, updates=None means "return the original unchanged". Does that matter with your current code? The point is that in many real world cases, you'd write a function *anyway*, to handle corner cases, and a new operator doesn't make much difference at that point.

Having said all of that, I'm mostly indifferent to the idea of having a built in "dictionary merge" capability - I doubt I'd use it *much*, but if it were there I'm sure I'd find it useful on the odd occasion. I'm somewhat against an operator, I really don't see why this couldn't be a method (although the asymmetry in d1.merge(d2) makes me have a mild preference for a class method or standalone function). I can't form an opinion between + and |, I find | significantly uglier (I tend to avoid using it for sets, in favour of the union method) but I am mildly uncomfortable with more overloading of +.

Serious suggestion - why not follow the lead of sets, and have *both* an operator and a method? And if you think that's a bad idea, it would be worth considering *why* it's a bad idea for dictionaries, when it's OK for sets (and "well, I didn't like it when sets did it" isn't sufficient ;-))

And having said that, I'll go back to lurking and not really caring one way or the other.

Paul
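The corner case is easy to reproduce with the spelling that exists today. In this sketch, merge_dict is a hypothetical helper built around the two lines quoted above (it is not claimed to be the library's actual code), and the dict display with ** stands in for a merge expression that has no special case for None.

```python
def merge_dict(original, updates):
    """Hypothetical helper modelled on the two lines quoted above."""
    if not updates:
        return original          # note: same object, not a copy
    merged = original.copy()
    merged.update(updates)
    return merged

defaults = {"verify": True}

same = merge_dict(defaults, None)                # tolerated by the helper
fresh = merge_dict(defaults, {"verify": False})  # a new dict

try:
    {**defaults, **None}         # a bare merge expression rejects None
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

print(same is defaults, fresh is defaults, type(unpack_error).__name__)
```

The helper also shows why the caller cannot assume the result is a new object: with empty updates it hands back original itself.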
Rémi Lapeyre wrote:
No matter the notation you end up choosing, I think this code: https://github.com/jpadilla/pyjwt/blob/master/jwt/utils.py#L71-L81 [...] would greatly benefit from a new way to merge dicts.
I've looked at the merge_dict defined in this code. It's similar to

    def gapfill(self, other):

        # See also: https://cobrapy.readthedocs.io/en/latest/gapfilling.html
        # Cobra's gapfill adds items to a model, to meet a requirement.

        for key in other.keys():
            if key not in self:
                self[key] = other[key]

(This is code I've written, that's not yet on PyPI.) The usage is different. Instead of writing one of

    aaa = merge_dict(aaa, bbb)
    ccc = merge_dict(aaa, bbb)

you write one of

    gapfill(aaa, bbb)
    aaa.gapfill(bbb)  # If gapfill added to dict methods.

With merge_dict, you never really know if ccc is the same object as aaa, or a different one. Sometimes this is important. With gapfill, you get the same behaviour as the already familiar and loved dict.update. But of course with a different merge rule.

-- Jonathan
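The "different merge rule" can be seen side by side with dict.update. A minimal sketch, using gapfill as a plain function and made-up data:

```python
def gapfill(target, other):
    """Fill in only the keys that target is missing; existing keys win."""
    for key in other.keys():
        if key not in target:
            target[key] = other[key]

aaa = {"x": 1}
bbb = {"x": 99, "y": 2}

filled = dict(aaa)
result = gapfill(filled, bbb)   # mutates in place, returns None

updated = dict(aaa)
updated.update(bbb)             # mutates in place, last wins

print(filled)
print(updated)
```

Both calls mutate their first argument and return None; they differ only in which side wins when a key is present in both.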
On 3/21/19 1:01 PM, Jonathan Fine wrote:
Rémi Lapeyre wrote:
No matter the notation you end up choosing, I think this code: https://github.com/jpadilla/pyjwt/blob/master/jwt/utils.py#L71-L81 [...] would greatly benefit from a new way to merge dicts.
I've looked at the merge_dict defined in this code. It's similar to
    def gapfill(self, other):

        # See also: https://cobrapy.readthedocs.io/en/latest/gapfilling.html
        # Cobra's gapfill adds items to a model, to meet a requirement.

        for key in other.keys():
            if key not in self:
                self[key] = other[key]

(This is code I've written, that's not yet on PyPI.) The usage is different. Instead of writing one of

    aaa = merge_dict(aaa, bbb)
    ccc = merge_dict(aaa, bbb)

you write one of

    gapfill(aaa, bbb)
    aaa.gapfill(bbb)  # If gapfill added to dict methods.
With merge_dict, you never really know if ccc is the same object as aaa, or a different one. Sometimes this is important.
With gapfill, you get the same behaviour as the already familiar and loved dict.update. But of course with a different merge rule.
With gapfill, I can never remember whether it's gapfill(aaa, bbb) or gapfill(bbb, aaa). This is always important. :-) At least with aaa.gapfill(bbb), I have some sense of the "direction" of the asymmetry, or I would if I had some frame of reference into which to put the "gapfill" operation. (With the proposed + or | operator syntax, that gets lost.)
Steven D'Aprano wrote:
But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me!
# What does this do?
>>> items.update(points)

# And what does this do?
>>> items += points

What did you get? Here's one possible context.

>>> Point = namedtuple('Point', ['x', 'y'])
>>> p, q, r = Point(1,2), Point(3, 4), Point(5, 6)
>>> points = set([p, q, r])
>>> points
{Point(x=1, y=2), Point(x=5, y=6), Point(x=3, y=4)}
>>> items = dict(a=4, b=8)

-- Jonathan
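For the record, the update() half of the puzzle can be run today. A set has no keys() method, so update() treats it as an iterable of key/value pairs, and each Point unpacks as (x, y). The += half depends on the draft; earlier in this thread __iadd__ was described as calling into the same update logic, so presumably it would give the same result.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
points = {Point(1, 2), Point(3, 4), Point(5, 6)}

items = dict(a=4, b=8)
items.update(points)   # each Point is consumed as a (key, value) pair

print(items)
```

The set's iteration order may vary, but the resulting mapping gains the entries 1: 2, 3: 4 and 5: 6 either way.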
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
I would definitely include the example you alluded to in the operators thread:

Before:

    tmp = keep.copy()
    tmp.update(separate)
    result = function(param=tmp)
    del tmp

After:

    result = function(param=keep+separate)

Thanks for drafting the PEP for this. There seems to be a bit of an echo in these 5+ threads, and your commentary has definitely been more constructive/original than most. Looking forward to the next revision!

Brandt

On Thu, Mar 21, 2019 at 9:42 AM Steven D'Aprano <steve@pearwood.info> wrote:
I'd like to make a plea to people:
I get it, there is now significant opposition to using the + symbol for this proposed operator. At the time I wrote the first draft of the PEP, there was virtually no opposition to it, and the | operator had very little support. This has clearly changed.
At this point I don't think it is productive to keep making subjective claims that + will be more confusing or surprising. You've made your point that you don't like it, and the next draft^1 of the PEP will make that clear.
But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me!
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
^1 Coming Real Soon Now™.
-- Steven
I'd also like to add what I consider to be another point in favor of an operator: Throughout all of these related threads, I have seen many typos and misspellings of current dict merging idioms, from messing up the number of asterisks in "{**a, **b}", to even Guido(!) accidentally writing the common copy/update idiom as

    d = d1.copy()
    d = d1.update(d2)

in a thoughtful email... and it was then copied-and-pasted (unquoted and verbatim) by others! I still have yet to see somebody (even those who claim to be confused by it) mess up the PEP's current definition of "+" or "+=" in this context.

Brandt
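Running the mistyped idiom shows why the slip matters: update() returns None and mutates its receiver, so the result is discarded and the wrong dict is changed. A short sketch with made-up data:

```python
d1 = {"a": 1}
d2 = {"b": 2}

# The slip quoted above:
d = d1.copy()
d = d1.update(d2)   # rebinds d to update()'s return value, mutates d1
slip_result = d
d1_after_slip = dict(d1)

# The intended idiom:
d1 = {"a": 1}
d = d1.copy()
d.update(d2)        # mutates the copy; d1 is left alone

print(slip_result, d1_after_slip)
print(d, d1)
```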
I dislike the symbol '+' to mean "dictionary merging with value updates." I have no objection to, and mildly support, adding '|' with this meaning. It's not really possible to give "that one example" where + for merging makes code less clear... In my eyes it would be EVERY such use. Every example presented in this thread or in the PEP feels wrong to me. I know about operator overloading and dunder methods and custom classes. My intuition about '+' from math, other programming languages, and Python, simply does not lead me to expect the proposed meaning. On Thu, Mar 21, 2019, 12:43 PM Steven D'Aprano <steve@pearwood.info> wrote:
I'd like to make a plea to people:
I get it, there is now significant opposition to using the + symbol for this proposed operator. At the time I wrote the first draft of the PEP, there was virtually no opposition to it, and the | operator had very little support. This has clearly changed.
At this point I don't think it is productive to keep making subjective claims that + will be more confusing or surprising. You've made your point that you don't like it, and the next draft^1 of the PEP will make that clear.
But if you have *concrete examples* of code that currently is easy to understand, but will be harder to understand if we add dict.__add__, then please do show me!
For those who oppose the + operator, it will help me if you made it clear whether it is *just* the + symbol you dislike, and would accept the | operator instead, or whether you hate the whole operator concept regardless of how it is spelled.
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
^1 Coming Real Soon Now™.
-- Steven
On Thu, Mar 21, 2019 at 06:02:05PM -0400, David Mertz wrote:
I dislike the symbol '+' to mean "dictionary merging with value updates." I have no objection to, and mildly support, adding '|' with this meaning.
It's not really possible to give "that one example" where + for merging makes code less clear... In my eyes it would be EVERY such use.
I suspect that I may not have explained myself properly. Sorry. Let me try to explain again.

A number of people including Antoine and Serhiy seem to have taken the position that merely adding dict.__add__ will make existing code using + harder to understand, as you will need to consider not just numeric addition and concatenation, but also merging, when reading code.

*If this were true* it would be an excellent argument against using + for dict merges. But is it true?

Would you agree that this example of + is perfectly clear today?

    for digit in digits:
        num = num*10 + digit

By both the naming (digit, num) and presence of multiplication by the literal 10, it should be pretty obvious that this is probably doing integer addition. (I suppose it is conceivable that this is doing sequence repetition and concatenation, but given the names that interpretation would be rather unexpected.)

We shouldn't find it hard to understand that code, using nothing more than *local* context. There's no need to search the global context to find out what num and digits are. (Although in the specific example I copied that snippet from, that information is only two or three lines away. But in principle, we might have needed to search an arbitrarily large code base to determine what they were.)

Adding dict.__add__ isn't going to make that example harder to understand. If it did, that would be a big blow to the + proposal.

Antoine and Serhiy seem to worry that there are existing uses of + which are currently easy to understand but will become less so if dict.__add__ is added. I respect that worry, even if I doubt that they are correct. If someone can demonstrate that their fear is well-founded, that would be an excellent counter-argument to the PEP's proposal to use +.

What *doesn't* count as a demonstration:

1. Toy examples using generic names don't count. With generic, meaningless names, they're not meaningful now and so adding dict.__add__ won't make them *less* meaningful:

    # is this concatenation or numeric addition? who can tell?
    for spam in spammy_macspamface:
        eggs += spam

Regardless of whether dicts support + or not, we would still have to search the global context to work out what eggs and spam are. Adding dict.__add__ doesn't make this harder.

2. Purely opinion-based subjective statements, since they basically boil down to "I don't like the use of + for dict merging." That point has been made, no need to keep beating that drum.

3. Arguments based on unfamiliarity with the new operator:

    preferences += {'EDITOR': 'ed', 'PAGESIZE': 'A4'}

might give you a bit of a double-take the first time you see it, but it surely won't still be surprising you in five years time.

I realise that this is a high bar to reach, but if somebody does reach it, and demonstrates that Antoine and Serhiy's fears are well-founded, that would be a very effective and convincing argument.
Every example presented in this thread or in the PEP feels wrong to me. I know about operator overloading and dunder methods and custom classes. My intuition about '+' from math, other programming languages, and Python, simply does not lead me to expect the proposed meaning.
And your subjective feeling is well-noted :-) -- Steven
On Thu, Mar 21, 2019, 7:48 PM Steven D'Aprano <steve@pearwood.info> wrote:
A number of people including Antoine and Serhiy seem to have taken the position that merely adding dict.__add__ will make existing code using + harder to understand, as you will need to consider not just numeric addition and concatenation, but also merging, when reading code.
Would you agree that this example of + is perfectly clear today?
    for digit in digits:
        num = num*10 + digit
By both the naming (digit, num) and presence of multiplication by the literal 10, it should be pretty obvious that this is probably doing integer addition.
Yep. This is clear and will not become less clear if some more objects grow an .__add__() method. Already, it is POSSIBLE that `num` and `digit` mean something other than numbers. Bad naming of variables if so, but not prohibited.

For example, NumPy uses '+' and '*' for elementwise operations, often with broadcasting to different array shapes. Maybe that's code dealing with vectorised arrays... But probably not.

HoloViews uses '+' and '*' to combine elements of graphs. E.g.

    labelled = low_freq * high_freq * linpoints
    overlay + labelled + labelled.Sinusoid.Low_Frequency

ggplot in R has similar behavior. Maybe your loop is composing a complex graph... But probably not.

Nonetheless, if I see `dict1 + dict2` the meaning you intend in the PEP does not jump out as the obvious behavior. Nor even as the most useful behavior. Of course I could learn it and teach it, but it will always feel like a wart in the language.

In contrast, once you tell me about the special object "vectorised arrays", `arr1 + arr2` does exactly what is expected in NumPy.

And your subjective feeling is well-noted :-)
This is more than "merely subjective." I teach Python. I write books about Python. I've had tens of millions of readers of articles I've written about Python. I'm not the only person in this discussion with knowledge of learners and programmers and scientists... But the opinions I'm expressing ARE on their behalf too (as I perceive likely surprise and likely bugs). I like most of the design of Python. Almost all, even. But there are a few warts in it. This would be a wart.
On Thu, Mar 21, 2019 at 08:13:01PM -0400, David Mertz wrote:
On Thu, Mar 21, 2019, 7:48 PM Steven D'Aprano <steve@pearwood.info> wrote: [...] Nonetheless, if I see `dict1 + dict2` the meaning you intend in the PEP does not jump out as the obvious behavior. Nor even as the most useful behavior.
What would be the most useful behaviour for dict "addition" in your opinion?
Of course I could learn it and teach it, but it will always feel like a wart in the language.
Would that wartness be lessened if it were spelled | or << instead?
In contrast, once you tell me about the special object "vectorised arrays", `arr1 + arr2` does exactly what is expected in NumPy.
I don't know Numpy well enough to know whether that is elementwise addition or concatenation or something else, so that example doesn't resonate with me. I can't guess what you expect, and I have no confidence that my guess (matrix addition of equal-sized arrays, an exception if unequal) will be what Numpy does.
And your subjective feeling is well-noted :-)
This is more than "merely subjective."
If it is more than subjective, then there must be an objective test that anyone, or a computer program, could do to tell whether or not the + operator on dicts will be ... um, what? A wart? Ugly? Both of those are subjective value judgements, so I'm not sure what objective claim you believe you are making which is "more than" subjective. The point is, I'm not *discounting* the subjective claims that + on dicts is ugly. I've acknowledged them, and the next draft of the PEP will do so too. But repetition doesn't make a subjective value judgement objective. It might boil down to a subjective preference for + over | or vice versa, or another operator, or no operator at all. That's fine: language design is partly subjective. But I'd like to see more comments based on objective reasons we can agree on, and fewer arguments that boil down to "I just don't like it". -- Steven
On Thu, Mar 21, 2019, 10:15 PM Steven D'Aprano <steve@pearwood.info> wrote:
What would be the most useful behaviour for dict "addition" in your opinion?
Probably what I would use most often was a "lossless" merging in which duplicate keys resulted in the corresponding value becoming a set containing all the merged values. E.g.
    >>> d1 = {1: 55, 2: 77, 3: 88}
    >>> d2 = {3: 99, 4: 22}
    >>> add(d1, d2)
    {1: 55, 2: 77, 3: {88, 99}, 4: 22}
I'm sure most users would hate this too. It changes the type of values between a thing and a set of things, and that has to be sorted out downstream. But it is lossless in a similar way to Counter or sequence addition. I can write what I want perfectly well. Perhaps using defaultdict as a shortcut to get there. And I know there are some behaviors I have not specified here, but my function can do whatever I want in the edge cases. If I were to see 'd1 + d2' for the first time without having followed this discussion, my guess would be behavior similar to what I show.
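For readers who want to see the "lossless" idea spelled out, here is a minimal sketch. The name `lossless_merge` is hypothetical, and the handling of edge cases (equal values under a duplicate key, values that are already sets, unhashable values) is an assumption, since those cases are deliberately left unspecified above.

```python
def lossless_merge(d1, d2):
    """Merge two dicts; a key present in both maps to a set of both values.

    Assumption: values are hashable. If both dicts hold the same value for a
    key, the resulting set has a single element.
    """
    merged = dict(d1)  # shallow copy so neither input is modified
    for key, value in d2.items():
        if key in merged:
            merged[key] = {merged[key], value}
        else:
            merged[key] = value
    return merged


d1 = {1: 55, 2: 77, 3: 88}
d2 = {3: 99, 4: 22}
result = lossless_merge(d1, d2)
print(result)
```

With the inputs from the example, the value under key 3 becomes the set {88, 99}, while the other keys keep their plain values, which is exactly the type change that has to be sorted out downstream.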
Of course I could learn it and teach it, but it will always feel like a wart in the language.
Would that wartness be lessened if it were spelled | or << instead?
Yes, definitely. Both those spellings feel pretty natural to me. They don't have the misleading associations '+' carries. I'm kinda fond of '<<' because it visually resembles an arrow that I can think of as "put the stuff here into there".
In contrast, once you tell me about the special object "vectorised arrays", `arr1 + arr2` does exactly what is expected in NumPy.
I don't know Numpy well enough to know whether that is elementwise addition or concatenation or something else, so that example doesn't resonate with me. I can't guess what you expect, and I have no confidence that my guess (matrix addition of equal-sized arrays, an exception if unequal) will be what Numpy does.
Fair enough. I've worked with NumPy long enough that perhaps I forget what my first intuition was. I accept that it's non-obvious to many users. FWIW, I really love NumPy behavior, but it's a shift in thinking vs lists. E.g.
    >>> a = array([1, 2, 3])
    >>> b = array([[10, 11, 12], [100, 200, 300]])
    >>> a + b
    [[ 11  13  15]
     [101 202 303]]
This is "broadcasting" of compatible shapes.
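For anyone without NumPy to hand, the effect in that example can be imitated with plain lists. This is only a sketch of the one case shown (a 1-D row added to each row of a 2-D array), not NumPy's actual broadcasting rules; `add_row_to_each` is a hypothetical helper name.

```python
def add_row_to_each(row, matrix):
    """Add a 1-D list to every row of a 2-D list, elementwise.

    Imitates only the single broadcasting case from the example above.
    """
    if any(len(r) != len(row) for r in matrix):
        raise ValueError("row lengths are not compatible")
    return [[x + y for x, y in zip(row, r)] for r in matrix]


a = [1, 2, 3]
b = [[10, 11, 12], [100, 200, 300]]

broadcast_sum = add_row_to_each(a, b)   # elementwise, row by row
concatenated = a + b[0]                 # plain list '+' concatenates instead

print(broadcast_sum)
print(concatenated)
```

The second result shows the shift in thinking mentioned above: for lists, `+` joins the sequences end to end rather than adding elements.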
On Fri, Mar 22, 2019 at 1:58 PM David Mertz <mertz@gnosis.cx> wrote:
On Thu, Mar 21, 2019, 10:15 PM Steven D'Aprano <steve@pearwood.info> wrote:
Of course I could learn it and teach it, but it will always feel like a wart in the language.
Would that wartness be lessened if it were spelled | or << instead?
Yes, definitely. Both those spellings feel pretty natural to me. They don't have the misleading associations '+' carries. I'm kinda fond of '<<' because it visually resembles an arrow that I can think of as "put the stuff here into there".
Please no. The "cuteness" value of abusing the operator to indicate information flow got old shortly after C++ did it, and it doesn't help. With normal operator overloading, you can say "the + operator means addition", and then define "addition" for different types. Perhaps that ship has sailed, since we already have division between path objects, but at least in that example it is VERY closely related. There's no use of "<<" inside string literals with dictionaries the way there's "/foo/bar/spam" in paths. Dictionary merging is a form of addition. It's also related to set union, which is well known as part of the pipe operator. Either of those is far better than abusing left shift. ChrisA
On 3/21/19 6:46 PM, Steven D'Aprano wrote:
Antoine and Serhiy seem to worry that there are existing uses of + which are currently easy to understand but will become less so if dict.__add__ is added. I respect that worry, even if I doubt that they are correct.
If someone can demonstrate that their fear is well-founded, that would be an excellent counter-argument to the PEP's proposal to use +.
https://docs.python.org/3.8/library/collections.html has some examples using collections.Counter, which is clearly described as being a subclass of dict. Amongst the examples: c + d # add two counters together: c[x] + d[x] That's the + operator operating on two dicts (don't make me quote the Liskov Substitution Principle), but doing something really different than the base operator. So if I know that c and d (or worse, that one of them) is a dict, then interpreting c + d becomes much more interesting, but arguably no worse than c.update(d). Yes, it's "just" polymorphism, but IMO it violates the Principle of Least Surprise. My apologies if this is covered elsewhere in this thread, or it doesn't meet the bar Steven set.
https://docs.python.org/3.8/library/collections.html has some examples using collections.Counter, which is clearly described as being a subclass of dict. Amongst the examples:
c + d # add two counters together: c[x] + d[x]
That's the + operator operating on two dicts (don't make me quote the Liskov Substitution Principle), but doing something really different than the base operator.
So if I know that c and d (or worse, that one of them) is a dict, then interpreting c + d becomes much more interesting,
Killing a use of a common operator with a very common built in data type because the operator is used in a different way by a specialized object in the stdlib seems a bit backwards to me. Frankly, I think considering Counter as a dict subclass is the mistake here, even if it is true. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 3/21/19 9:19 PM, Christopher Barker wrote:
https://docs.python.org/3.8/library/collections.html has some examples using collections.Counter, which is clearly described as being a subclass of dict. Amongst the examples:
c + d # add two counters together: c[x] + d[x]
That's the + operator operating on two dicts (don't make me quote the Liskov Substitution Principle), but doing something really different than the base operator.
So if I know that c and d (or worse, that one of them) is a dict, then interpreting c + d becomes much more interesting,
Killing a use of a common operator with a very common built in data type because the operator is used in a different way by a specialized object in the stdlib seems a bit backwards to me.
Perhaps. Note that Counter also uses | and & for other operations that probably wouldn't make much sense on base dicts.
Frankly, I think considering Counter as a dict subclass is the mistake here, even if it is true.
I had the same thought that Counter is misdesigned in one way or another, but (a) that ship has long sailed, and (b) I didn't want to run off on that tangent. My point remains: because Counter is a subclass of dict, and Counter uses the + operator for something that doesn't apply to base dicts, adding + to dicts *may* cause confusion that wasn't there before. Presently, +, -, |, and & all raise an exception when given a Counter and a dict. This (raising an exception) is probably still the Right Thing to do in that case, even with a + operator on dicts, but that violates the LSP and IMO the PLS.
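A small sketch of the difference being described, using only collections.Counter from the standard library. The dict-unpacking line stands in for the update-style merge the PEP proposes for `+`; that mapping is an assumption about the proposal, not existing dict behaviour.

```python
from collections import Counter

c = Counter(a=3, b=1)
d = Counter(a=1, b=2)

# Counter addition: counts are added key by key, c[x] + d[x].
summed = c + d

# Update-style merge (what the PEP proposes for dict '+'): right-hand value wins.
updated = {**c, **d}

# Mixing a Counter with a plain dict under '+' raises TypeError.
try:
    c + {"a": 1}
    mixed_add_ok = True
except TypeError:
    mixed_add_ok = False

print(summed, updated, mixed_add_ok)
```

The same two operands give {'a': 4, 'b': 3} under Counter addition and {'a': 1, 'b': 2} under an update-style merge, which is the gap between subclass and base class that the argument above turns on.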
On 2019-03-22 02:40, Dan Sommers wrote:
On 3/21/19 9:19 PM, Christopher Barker wrote:
https://docs.python.org/3.8/library/collections.html has some examples using collections.Counter, which is clearly described as being a subclass of dict. Amongst the examples:
c + d # add two counters together: c[x] + d[x]
That's the + operator operating on two dicts (don't make me quote the Liskov Substitution Principle), but doing something really different than the base operator.
So if I know that c and d (or worse, that one of them) is a dict, then interpreting c + d becomes much more interesting,
Killing a use of a common operator with a very common built in data type because the operator is used in a different way by a specialized object in the stdlib seems a bit backwards to me.
Perhaps. Note that Counter also uses | and & for other operations that probably wouldn't make much sense on base dicts.
Frankly, I think considering Counter as a dict subclass is the mistake here, even if it is true.
I had the same thought that Counter is misdesigned in one way or another, but (a) that ship has long sailed, and (b) I didn't want to run off on that tangent.
[snip] Counter is trying to provide the functionality of 2 kinds of container: 1. A counting container. 2. A multi-set. + makes sense for counting (sum); | makes sense for multi-sets (union).
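The two roles can be seen side by side in a short sketch; the variable names are illustrative only.

```python
from collections import Counter

x = Counter(a=3, b=1)
y = Counter(a=1, b=2)

total = x + y    # counting view: counts are summed
union = x | y    # multiset view: union keeps the maximum count per key
common = x & y   # multiset view: intersection keeps the minimum count per key

print(total, union, common)
```

Here `+` and `|` give different answers for the same operands, which is why a single container offering both is serving two purposes at once.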
Python fits well in the mind, because (1) by design it reduces cognitive load, and (2) it encourages its users to reduce cognitive load, and (3) we have a culture of reading code, taking pride in our code. Readability counts. https://en.wikipedia.org/wiki/Cognitive_load Steven D'Aprano says that examples such as below don't help us discuss the cognitive load associated with dict + dict.
1. Toy examples using generic names don't count. eggs += spam
I assume he's referring to my example

    >>> items.update(points)
    >>> items += points

In this example, items.update gives useful additional information. We expect, from duck typing and sensible naming, that points can be iterated to give key value pairs.

In Python, when

    >>> a + b

gives one of

    TypeError: unsupported operand type(s) for +: 'int' and 'str'
    TypeError: Can't convert 'int' object to str implicitly

we get a very strong hint to write instead something like

    a + int(b)
    str(a) + b

so that the nature of the addition is made clear to the next person who reads the code (who might be ourselves, in a crisis, in ten years' time).

(JavaScript does implicit conversion. This makes the code easier to write, harder to read, and harder to maintain.)

For certain values of dct and lst we get

    >>> lst += dct
    >>> lst
    [('a', 1), ('b', 2), 'c', 'd']

For the same values of dct and lst (if the proposal were allowed)

    >>> dct += lst
    >>> dct
    {'a': 1, 'b': 2, 'c': 3, 'd': 4}

In these examples, dct is a dict, and lst is a list. This behaviour is something Python users will have to learn, and have in their mind, whenever they see '+=' in unfamiliar code. I find this as much an unwelcome cognitive load as that produced by JavaScript's

    > 2 * "8"
    16
    > 2 + "8"
    "28"

To be fair, this may in part be a problem with our expectations about +=.

-- Jonathan
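The asymmetry can be reproduced today with a short sketch. The starting values of `lst` and `dct` are assumptions chosen to match the output shown above, and `dict.update` stands in for the proposed `dct += lst`, on the assumption that the in-place operator would accept an iterable of pairs as update does.

```python
lst = [("a", 1), ("b", 2)]
dct = {"c": 3, "d": 4}

# list += dict: list extension accepts any iterable, and a dict yields its keys.
extended = list(lst)
extended += dct

# Stand-in for the proposed 'dct += lst': update from an iterable of pairs.
merged = dict(dct)
merged.update(lst)

# Mixed int/str addition is refused rather than silently converted.
try:
    1 + "2"
    message = None
except TypeError as exc:
    message = str(exc)

print(extended)
print(merged)
print(message)
```

The list ends up holding two tuples and two bare keys, while the dict ends up with four key/value pairs, so the same pair of operands behaves quite differently depending on which side is the target.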
On Fri, Mar 22, 2019 at 3:42 AM Steven D'Aprano <steve@pearwood.info> wrote:
And to those who support this PEP, code examples where a dict merge operator will help are most welcome!
Since Python examples don't really exist yet, I'm reaching for another language that DOES have this feature. Pike's mappings (broadly equivalent to Python's dicts) can be added (actually, both + and | are supported), with semantics equivalent to PEP 584's.

Translated into Python syntax, here's a section from the implementation of Process.run():

    def run(cmd, modifiers={}):
        ...
        ...
        p = Process(cmd, modifiers + {
            "stdout": mystdout.pipe(),
            "stderr": mystderr.pipe(),
            "stdin": mystdin.pipe(),
        })

In Val.TimeTZ, a subclass that adds a timezone attribute overrides a mapping-returning method to incorporate the timezone in the result mapping. Again, translated into Python syntax:

    def tm(self):
        return super().tm() + {"timezone": self.timezone}

To spawn a subprocess with a changed environment variable:

    //from the Process.create_process example
    Process.create_process(({ "/usr/bin/env" }), (["env" : getenv() + (["TERM":"vt100"]) ]));

    # equivalent Python code
    subprocess.Popen("/usr/bin/env", env=os.environ + {"TERM": "vt100"})

All of these examples could be done with the double-star syntax, as they all use simple literals. But addition looks a lot cleaner IMO, and even more so if you're combining multiple variables rather than using literals.

ChrisA
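For comparison, here is a sketch of how two of those examples look with the double-star syntax that already works. The `Time` and `TimeTZ` classes and their contents are hypothetical stand-ins, not the Pike originals.

```python
import os


class Time:
    """Hypothetical base class whose tm() returns a mapping."""

    def tm(self):
        return {"hour": 12, "minute": 30}


class TimeTZ(Time):
    """Hypothetical subclass that adds a timezone entry to the mapping."""

    def __init__(self, timezone):
        self.timezone = timezone

    def tm(self):
        # Current spelling of: super().tm() + {"timezone": self.timezone}
        return {**super().tm(), "timezone": self.timezone}


# Current spelling of: os.environ + {"TERM": "vt100"}
child_env = {**os.environ, "TERM": "vt100"}

tz_mapping = TimeTZ("UTC").tm()
print(tz_mapping)
print(child_env["TERM"])
```

The resulting `child_env` is a plain dict and could be passed as the `env` argument of `subprocess.Popen`; the current environment itself is left unchanged.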