PEP 584: Add + and += operators to the built-in dict class.
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition): https://www.python.org/dev/peps/pep-0584/ The accompanying reference implementation is on GitHub: https://github.com/brandtbucher/cpython/tree/addiction This new draft incorporates much of the feedback that we received during the first round of debate here on python-ideas. Most notably, the difference operators (-/-=) have been dropped from the proposal, and the implementations have been updated to use "new = self.copy(); new.update(other)" semantics, rather than "new = type(self)(); new.update(self); new.update(other)" as proposed before. It also includes more background information and summaries of major objections (with rebuttals). Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to! We plan on updating the PEP at least once more before review. Thanks! Brandt
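The difference between the two merge semantics mentioned in the announcement shows up mainly with dict subclasses. The following is a minimal sketch, not the reference implementation: the helper names `merge_via_copy` and `merge_via_type` are hypothetical, and `defaultdict` is used only as a convenient subclass that carries extra state.

```python
from collections import defaultdict


def merge_via_copy(left, right):
    # Second-draft semantics: start from a copy of the left operand.
    new = left.copy()
    new.update(right)
    return new


def merge_via_type(left, right):
    # First-draft semantics: start from an empty instance of the left type.
    new = type(left)()
    new.update(left)
    new.update(right)
    return new


d1 = defaultdict(list, {"a": [1]})
d2 = {"b": [2], "a": [3]}

by_copy = merge_via_copy(d1, d2)
by_type = merge_via_type(d1, d2)

# Both approaches produce the same items; the right operand wins on
# duplicate keys. They differ in what subclass state survives.
same_items = dict(by_copy) == dict(by_type)
```

With `defaultdict`, the copy-based version keeps `default_factory`, while the constructor-based version ends up with a factory of `None`, because `type(left)()` is called with no arguments.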
Brandt Bucher wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition): https://www.python.org/dev/peps/pep-0584/ The accompanying reference implementation is on GitHub: https://github.com/brandtbucher/cpython/tree/addiction
Do you know if there is an existing proposal for subtraction of iterables from lists and tuples? That would be extremely handy to me sometimes.
In [1]: [1,2,3] - [2]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-220e30c8610a> in <module>
----> 1 [1,2,3] - [2]
TypeError: unsupported operand type(s) for -: 'list' and 'list'
On Oct 17, 2019, at 00:22, Steve Jorgensen <stevej@stevej.name> wrote:
Do you know if there is an existing proposal for subtraction of iterables from lists and tuples?
Any existing proposal would be listed in PEP 0 if it’s a PEP, on bugs.python.org if it doesn’t need a PEP, or in a current thread in this list’s archive if it hasn’t gotten that far. They’re all pretty easy to look up. If you’re looking for historical proposals that died out without ever making it to PEP stage (which you should do if you’re considering proposing it), you pretty much have to search the list archives.
Andrew Barnert wrote:
On Oct 17, 2019, at 00:22, Steve Jorgensen stevej@stevej.name wrote:
Do you know if there is an existing proposal for subtraction of iterables from lists and tuples?
Any existing proposal would be listed in PEP 0 if it’s a PEP, on bugs.python.org if it doesn’t need a PEP, or in a current thread in this list’s archive if it hasn’t gotten that far. They’re all pretty easy to look up. If you’re looking for historical proposals that died out without ever making it to PEP stage (which you should do if you’re considering proposing it), you pretty much have to search the list archives.
I wasn't sure this would be an easy thing to find with search terms so thought I'd ask in the context of a similar proposal — in case someone remembers something. I'll do a search now though. :)
On Oct 16, 2019, at 22:35, Brandt Bucher <brandtbucher@gmail.com> wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition):
Minor nit: the formatting is broken in some of the examples of candidates. The first one I noticed is the last Sphinx one.
Less trivial, but maybe dismissible: it says one of the authors is going to write another PEP later to add the full complement of set operators to dict. I don’t think anyone wants both + and | meaning the same thing, or wants set operators named +/&/^/- instead of |/&/^/-. But this PEP argues compellingly for + over | if we’re only adding merge. So wouldn’t you want to finish both proposals and present them at the same time, with each one explicitly invalidating the other, rather than presenting this one first with the side effect of making the second one less compelling and/or more complicated if this one gets accepted?
Finally: if Mapping and MutableMapping don’t get the operators (if you’re following sequences, MutableSequence.__iadd__ exists but Sequence.__add__ doesn’t, I think?), will they be added manually to UserDict and ChainMap? (And are there any other types that need that question?)
Andrew Barnert wrote:
Minor nit: the formatting is broken in some of the examples of candidates. The first one I noticed is the last Sphinx one.
Thanks for spotting those! I'll fix them.
Less trivial, but maybe dismissible: it says one of the authors is going to write another PEP later to add the full complement of set operators to dict. I don’t think anyone wants both + and | meaning the same thing, or wants set operators named +/&/^/- instead of |/&/^/-. But this PEP argues compellingly for + over | if we’re only adding merge. So wouldn’t you want to finish both proposals and present them at the same time, with each one explicitly invalidating the other, rather than presenting this one first with the side effect of making the second one less compelling and/or more complicated if this one gets accepted?
I'm the "interested author". I agree with you that this section (especially the parenthetical) could be worded better. It was included in the interest of summarizing discussions that had taken place, and at the time pipe-versus-plus was still being hotly debated. Had pipe won out (or if we change the PEP, not very likely at this point) I would have probably moved forward with the others. I have no interest in a competing PEP, or one that follows up with the other operators if +/+= is accepted.
Finally: if Mapping and MutableMapping don’t get the operators (if you’re following sequences, MutableSequence.__iadd__ exists but Sequence.__add__ doesn’t, I think?), will they be added manually to UserDict and ChainMap? (And are there any other types that need that question?)
Open question. I think that's up to the authors of those individual non-subclasses, if so. `UserDict` is a bit different, as it has tests mandating that it have all of the same attributes as dicts. So, our C implementation gives `UserDict` the `__add__` trio, in the interest of having a passing test suite (clearly this is what the author intended, though).
On Thu, Oct 17, 2019 at 12:41:29AM -0700, Andrew Barnert via Python-ideas wrote:
Less trivial, but maybe dismissible: it says one of the authors is going to write another PEP later to add the full complement of set operators to dict.
It says that one of the authors is *interested* in writing another PEP. https://www.python.org/dev/peps/pep-0584/#id28 That's not a promise to do so. And of course, if anyone else wishes to write that PEP first, please go ahead.
I don’t think anyone wants both + and | meaning the same thing, or wants set operators named +/&/^/- instead of |/&/^/-.
I think that is a fair point. The PEP points out that one advantage of the pipe operator is that it is forward-compatible with extensions to the API to support the full set of set operators. Some may consider that a *disadvantage* of the pipe operator.
But this PEP argues compellingly for + over | if we’re only adding merge.
Thank you. But as one of the authors, I'm not so certain that the argument for + over | is "compelling". Despite the PEP title, this is not *only* about the plus operator. I have tried my best to set out the best possible argument for the plus operator, but there are excellent arguments in favour of the pipe operator or a merged method as well, and I refuse to state my preference (or even whether or not I have a preference).
The proposal has two conceptual parts:
- the functionality
- the spelling
and of the two, I think the functionality is more important. When the discussion first started, the early consensus seemed to be in favour of plus; as the proposal continued, that consensus shifted, with more people stating that they wanted the functionality but hated the spelling. It may be that we cannot gain consensus over the correct spelling, and it will come down to a ruling from the Steering Council.
There are at least four alternatives that the Steering Council can end up taking:
- approval for the plus operator
- approval of one of the alternatives
- rejection of all of the alternatives
- deferral
Approval of one of the alternatives is not necessarily dependent on writing a competing PEP. I urge people to read the "Alternative Proposals" section https://www.python.org/dev/peps/pep-0584/#id9 and consider *all* the alternatives, not just give a blanket +1 or -1 to the PEP.
-- Steven
Notes on new PEP:
The section on {**d1, **d2} claims "It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2]." That's 100% wrong. You're mixing up the unpacking generalizations for dict literals with the limitations on keyword arguments to functions. {**d1, **d2} is guaranteed to accept dicts with any keys, on any implementation of Python. I suspect you may actually be talking about the behavior of dict(d1, **d2), which behaved the way you described back in the Python 2 days, but that behavior has been long since disabled (in Python 3, if d2 keys are non-string, it immediately dies with a TypeError).
Less critical, but still wrong, is the contention that "collections.Counter is a dict subclass that supports the + operator. There are no known examples of people having performance issues due to adding large numbers of Counters."
A couple examples of Counter merge issues:
https://bugs.python.org/issue36380 This is more about adding small Counters to large Counters (not a problem that would directly affect dict addition), regardless of the number of times you do it, but it gets *much* worse when combining many small Counters.
https://stackoverflow.com/q/34407128/364696 Someone having the exact problem of "performance issues due to adding large numbers of Counters."
On "lossiness", it says "Integer addition and concatenation are also lossy, in the sense of not being reversable: you cannot get back the two addends given only the sum. Two numbers add to give 356; what are the two numbers?" The argument in the original thread was that, for c = a + b, on all existing types in Python (modulo floating point imprecision issues), knowing any two of a, b, or c was enough to determine the value of the remaining variable; there were almost no cases (again, floating point terribleness excepted) in which there existed some value d != a for which d + b == c, where dict addition breaks that pattern, however arbitrary some people believe it to be. Only example I'm aware of where this is violated is collections.Counter, as addition strips zero values from the result, so Counter(a=0) and Counter() are equivalent in the end result of an add (which is not necessarily a good thing, see https://bugs.python.org/issue36380 , but we're stuck with it).
Lastly, it seems a tad odd to deny that the Zen encourages "one way to do it" as if it were a calumny against Python invented by Perl folks (Perl folks take pride in TIMTOWTDI, but it always felt like a bit of pride in the perversity of it). Finely parsing the Zen to say it's only preferable, not a rule, is kinda missing the point of the Zen: None of it is prescriptive, it's a philosophy. Minimizing unnecessary "multiple ways to do it" to avoid kitchen sink syndrome is a reasonable goal. It's not an argument by itself if the new way to do it is strictly better, but pretending Python doesn't set a higher bar for features which already exist or are easily doable with existing tools is a little strange. Point is, if you're going to mention this in the PEP at all, justify this as something worth yet one more way to do it, don't argue that preferring one way to do it isn't a goal of Python.
On Thu, Oct 17, 2019 at 5:35 AM Brandt Bucher <brandtbucher@gmail.com> wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition):
https://www.python.org/dev/peps/pep-0584/
The accompanying reference implementation is on GitHub:
https://github.com/brandtbucher/cpython/tree/addiction
This new draft incorporates much of the feedback that we received during the first round of debate here on python-ideas. Most notably, the difference operators (-/-=) have been dropped from the proposal, and the implementations have been updated to use "new = self.copy(); new.update(other)" semantics, rather than "new = type(self)(); new.update(self); new.update(other)" as proposed before. It also includes more background information and summaries of major objections (with rebuttals).
Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to! We plan on updating the PEP at least once more before review.
Thanks!
Brandt _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/W2FCSC... Code of Conduct: http://python.org/psf/codeofconduct/
On Thu, Oct 17, 2019 at 07:48:00AM +0000, Josh Rosenberg wrote: [...]
That's 100% wrong. You're mixing up the unpacking generalizations for dict literals with the limitations on keyword arguments to functions. {**d1, **d2} is guaranteed to accept dicts with any keys, on any implementation of Python.
Do you have a canonical reference for this? If so, we should update the PEP with that information.
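For readers who want to check the behaviour being disputed, the two spellings can be compared directly. This is only a small sketch of what Python 3 does in practice; it is an observation, not the canonical reference asked for above. The variable names are illustrative.

```python
d1 = {1: "one", "x": 0}
d2 = {2: "two", "x": 9}

# Dict-display unpacking accepts keys of any hashable type;
# later mappings win on duplicate keys.
merged = {**d1, **d2}

# dict(d1, **d2) passes d2 as keyword arguments, so a non-string key
# is rejected with a TypeError in Python 3.
try:
    dict(d1, **d2)
    keyword_form_failed = False
except TypeError:
    keyword_form_failed = True
```

The failure in the second form comes from the function-call machinery (keyword names must be strings), not from anything specific to dict.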
Less critical, but still wrong, is the contention that "collections.Counter is a dict subclass that supports the + operator. There are no known examples of people having performance issues due to adding large numbers of Counters."
A couple examples of Counter merge issues: https://bugs.python.org/issue36380
Thanks for the link, but it doesn't seem to be relevant. There's no evidence in the bug report that this was an actual performance problem in real life code, only an enhancement request to make Counter faster based on a Big Oh analysis that it was doing too much work. The analysis was also based on a different scenario: adding a small dict to a large dict, rather than adding lots and lots of dicts.
https://stackoverflow.com/q/34407128/364696 Someone having the exact problem of "performance issues due to adding large numbers of Counters."
But that's precisely what they are *not* doing: at no point do they add Counters. They are manually merging them *in place* into a single dict. We don't see how they calculate their benchmarks, so have to take their word that they are measuring what they say they are measuring. (I haven't tried to replicate their results.) At least one person in the comments questions whether the poster is actually using Counters as they claim. It is suspicious to me that they say that it freezes their GUI, which seems very odd. The poster is avoiding the quadratic behaviour feared in this proposal by merging dicts in place rather than creating lots and lots of temporary dicts, and so even if it is a genuine real-world performance problem, it is unrelated to Counter addition.
On "lossiness", it says "Integer addition and concatenation are also lossy, in the sense of not being reversable: you cannot get back the two addends given only the sum. Two numbers add to give 356; what are the two numbers?"
The argument in the original thread was that, for c = a + b, on all existing types in Python (modulo floating point imprecision issues), knowing any two of a, b, or c was enough to determine the value of the remaining variable;
*shrug* For the sake of the discussion, let's say that I accept that this was the argument. Why is this supposed "lossless" property of addition important outside of arithmetic?
there were almost no cases (again, floating point terribleness excepted) in which there existed some value d != a for which d + b == c, where dict addition breaks that pattern, however arbitrary some people believe it to be. Only example I'm aware of where this is violated is collections.Counter
So the precedent is already set by Counter. I know that Raymond has come out in the past as being against this proposal, but ironically Counter's API is his design, including the use of plus. So far as I know, Raymond hasn't said that the Counter plus API is a mistake or that he regrets it.
It's too late for 3.8, but this proposal for time addition (another much requested feature) is relevant: https://bugs.python.org/issue17267 Given 24 hour wrap-around semantics, time addition will also violate this "addition isn't lossy" principle. So will any modulo arithmetic, including fixed-size ints with wrap-around on overflow semantics.
It is also violated by float (and Decimal), which you dismiss by calling it "terribleness". Terribleness or not, it exists, and so it is not true that numeric addition is necessarily reversible in Python. Rather, it is true that numeric addition for builtins is already "lossy".
-- Steven
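The "lossy addition" cases discussed in this exchange can be reproduced in a few lines. This is an illustrative sketch only: the `merge` helper is hypothetical and stands in for the proposed operator by using dict unpacking.

```python
from collections import Counter

# Floats: once precision runs out, different addends give the same sum.
float_collision = (1e17 + 1.0) == (1e17 + 0.0)

# Counter: addition keeps only positive counts, so a zero entry
# on the left leaves no trace in the result.
counter_collision = (Counter(a=0) + Counter(b=1)) == (Counter() + Counter(b=1))


def merge(left, right):
    # Hypothetical stand-in for the proposed dict merge operator.
    return {**left, **right}


# Dict merge: the left value for a shared key cannot be recovered.
dict_collision = merge({"k": 1}, {"k": 2}) == merge({"k": 99}, {"k": 2})
```

In each case two different left operands produce the same result for the same right operand, which is the property Josh describes as breaking the pattern.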
Late to the game, I like the proposal a lot but have a couple of questions about it. Why the need for strictness of type for the operator? I get that it's analogous with the behavior for `list`, but I guess I'm also not sure why that should be strict. As a compromise in regard to strictness, what if anything that is a `Mapping` or has `.asdict()` could be added using the operator, but only those kinds of things? Also, why was subtraction dropped? It seems to me the ability to subtract an iterable of keys makes a lot of sense. Or maybe that should be a separate PEP?
On Thu, Oct 17, 2019 at 08:19:13AM -0000, Steve Jorgensen wrote:
Why the need for strictness of type for the operator? I get that it's analogous with the behavior for `list`, but I guess I'm also not sure why that should be strict.
(1) Follow the precedent of existing operators. (2) It is easier to relax restrictions later, than to add restrictions if the original behaviour turned out to be a mistake. So it is (usually) better to begin with the most conservative thing that will work and gradually allow more if and when needed. (It is relatively easy to add functionality to the language, but very difficult to take it away.)
Also, why was subtraction dropped? It seems to me the ability to subtract an iterable of keys makes a lot of sense — or maybe that should be a separate PEP?
In the initial discussion, the subtraction operator got very little attention. It isn't really fair to sneak in a second operator as part of a controversial proposal like this one. If someone wishes to re-propose the subtraction operator, or the full suite of set-like operations (intersection, difference, symmetric difference, union) they are free to do so, either as an adjunct to, or a competitor of, this PEP. -- Steven
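The "precedent of existing operators" that Steven refers to can be seen with lists. A small sketch, with illustrative variable names, of how the binary operator is strict about types while the in-place form accepts any iterable:

```python
a_list = [1, 2]
a_tuple = (3, 4)

# Binary + refuses mixed operand types.
try:
    a_list + a_tuple
    binary_plus_ok = True
except TypeError:
    binary_plus_ok = False

# In-place += behaves like extend() and accepts any iterable.
extended = list(a_list)
extended += a_tuple

# Being explicit about the result type always works.
explicit = a_list + list(a_tuple)
```

The asymmetry avoids having to decide what type `a_list + a_tuple` should return, while `+=` has an obvious answer because the left operand is mutated in place.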
Steven D'Aprano wrote:
On Thu, Oct 17, 2019 at 08:19:13AM -0000, Steve Jorgensen wrote:
Why the need for strictness of type for the operator? I get that it's analogous with the behavior for list, but I guess I'm also not sure why that should be strict. (1) Follow the precedent of existing operators. (2) It is easier to relax restrictions later, than to add restrictions if the original behaviour turned out to be a mistake. So it is (usually) better to begin with the most conservative thing that will work and gradually allow more if and when needed. (It is relatively easy to add functionality to the language, but very difficult to take it away.) Also, why was subtraction dropped? It seems to me the ability to subtract an iterable of keys makes a lot of sense — or maybe that should be a separate PEP? In the initial discussion, the subtraction operator got very little attention. It isn't really fair to sneak in a second operator as part of a controversial proposal like this one. If someone wishes to re-propose the subtraction operator, or the full suite of set-like operations (intersection, difference, symmetric difference, union) they are free to do so, either as an adjunct to, or a competitor of, this PEP.
Thanks. Those explanations make sense to me. :)
On Thu, Oct 17, 2019 at 5:33 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Oct 17, 2019 at 08:19:13AM -0000, Steve Jorgensen wrote:
Why the need for strictness of type for the operator? I get that it's analogous with the behavior for `list`, but I guess I'm also not sure why that should be strict.
(1) Follow the precedent of existing operators.
(2) It is easier to relax restrictions later, than to add restrictions if the original behaviour turned out to be a mistake
And it's a good idea :-)
My first thought was: why? I like dynamic typing, so why not accept any mapping? Then I realized: when you write a_list + a_tuple, do you want a list or a tuple back? Yes, you could be clear about the precedence, but maybe that wouldn't be obvious to everyone. So better to require people to be explicit: a_list + list(a_tuple) or some such. The same applies to Mappings.
And one more note:
"Open questions: Should these operators be part of the ABC Mapping API?"
Absolutely! Having the built-in mapping NOT be the same as the Mapping API seems like a really bad idea. The whole point of having all those ABCs is so people can write type-independent code. If I write a function that requires a Mapping, and I only test it with dicts, and I use + -- then it will fail when used with another Mapping object. I guess in short -- we should not have key ABCs with no implementation!
-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Thu, Oct 17, 2019 at 06:07:51PM -0400, Christopher Barker wrote:
" Open questions Should these operators be part of the ABC Mapping API? '
Absolutely! having the built in mapping NOT be the same as the Mapping API seems like a really bad idea.
I'm not so sure about that, which is why it's an open question. The PEP proposes a much more limited change: the concrete class dict to support a merge operator + or maybe | rather than making any proposals for other mappings and the abstract Mapping API.
The Mapping API provides:
- in
- equality
- subscripting (getitem)
- get
- keys/items/values
- len
- iter
Notice that there is no `update` API. Dicts currently have at least eight APIs which aren't part of the Mapping API:
- update
- pop and popitem
- setdefault
- copy
- clear
- delitem and setitem
The key point of abstract base classes is that they provide the *minimum* interface you can expect from a Mapping. This PEP does not require all mapping types to support a merge operator.
-- Steven
On Oct 20, 2019, at 17:34, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Oct 17, 2019 at 06:07:51PM -0400, Christopher Barker wrote:
" Open questions Should these operators be part of the ABC Mapping API? '
Absolutely! having the built in mapping NOT be the same as the Mapping API seems like a really bad idea.
I'm not so sure about that, which is why it's an open question. The PEP proposes a much more limited change: the concrete class dict to support a merge operator + or maybe | rather than making any proposals for other mappings and the abstract Mapping API.
The Mapping API provides:
in equality subscripting (getitem) get keys/items/values len iter
Notice that there is no `update` API.
That’s because there are separate Mapping and MutableMapping types, just as there are for sequence and set, and update is obviously in MutableMapping.
Dicts currently have at least eight APIs which aren't part of the Mapping API:
update pop and popitem setdefault copy clear delitem and setitem
And all of these but copy are mutating methods, so they would be wrong in Mapping. And all seven of them are in MutableMapping. So yes, Christopher is wrong to suggest that the new operators should be part of Mapping, but only because he should have said (and presumably meant) MutableMapping. The rest of his argument applies once you change that.
The key point of abstract base classes is that they provide the *minimum* interface you can expect from a Mapping.
That’s not true. They’re also mixins that implement the complete interface for you if you just implement the minimal one. And this is a very useful feature. Even figuring out exactly which methods you need to implement to be “like a tuple” or “like a dict” or, worst of all, “like a text file” used to be a pain, much less implementing them all. Nowadays, you just inherit from Sequence or MutableMapping or TextIOBase, implement the small set of methods that it requires (which are checked for you at class definition time in case you remember wrong or typo one), and you’re done.
Of course the ABCs are not perfect, because nobody actually had a list of “all the operations people will expect from any sequence, mutable mapping, text file, etc.” before the ABCs were added, so someone had to make that call. But a decade on since that call has been made, I think people’s expectations have adapted to the ABCs, so it’s no longer a problem, except where we’re talking about adding new functionality, where someone has to make the call again, whether this new functionality is only part of dicts or is part of all mappings.
So I think the PEP needs to decide one way or the other and argue for it, not just dismiss the question. Given that MutableSequence and MutableSet do include __iadd__ and __ior__ respectively, I think the default should obviously be to do the same for MutableMapping if we’re adding one of those methods to dict.
As for adding __add__ or __or__ to Mapping, that’s less obvious, because Set.__or__ exists, but Sequence.__add__ does not. I think if you end up using __or__ I’d argue for it to be added to Mapping, but if you end up using __add__… well, I don’t know why Sequence.__add__ wasn’t included, so I don’t know whether the same rationale applies to Mapping.__add__ or not.
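The mixin behaviour Andrew describes can be illustrated with a small class. This is a sketch under stated assumptions: `LowerDict` and `Incomplete` are hypothetical classes invented for the example. Note that, as far as CPython's `abc` machinery goes, the missing-method check fires when the class is instantiated rather than when it is defined.

```python
from collections.abc import MutableMapping


class LowerDict(MutableMapping):
    """Hypothetical mapping that lower-cases its string keys."""

    def __init__(self):
        self._data = {}

    # The five abstract methods MutableMapping requires:
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = LowerDict()
d.update({"A": 1, "b": 2})        # update() is supplied by the mixin
popped = d.pop("a")               # so is pop()
default = d.setdefault("C", 3)    # and setdefault()


class Incomplete(MutableMapping):
    """Hypothetical class that forgets most of the required methods."""

    def __getitem__(self, key):
        raise KeyError(key)


try:
    Incomplete()
    incomplete_rejected = False
except TypeError:
    incomplete_rejected = True
```

None of `update`, `pop`, or `setdefault` is written in `LowerDict`; they are inherited from `MutableMapping` and built on the five methods the class does define.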
On Sun, Oct 20, 2019 at 9:27 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Given that MutableSequence and MutableSet do include __iadd__ and __ior__ respectively, I think the default should obviously be to do the same for MutableMapping if we’re adding one of those methods to dict.
As for adding __add__ or __or__ to Mapping, that’s less obvious, because Set.__or__ exists, but Sequence.__add__ does not. I think if you end up using __or__ I’d argue for it to be added to Mapping, but if you end up using __add__… well, I don’t know why Sequence.__add__ wasn’t included, so I don’t know whether the same rationale applies to Mapping.__add__ or not.
I was there, but I can't recall the details! It looks like the Set and MutableSet APIs are focused on the overloaded operators, while the [Mutable]Sequence APIs are focused on methods. MutableSequence throws in __iadd__ because it's just a wrapper for .extend(), but the same can't be said for Sequence and __add__ -- in fact there's nothing else in Sequence that constructs a new Sequence (whereas Set has a special API for that, Set._from_iterable()). Mapping is more like Sequence in this sense.
Another concern is that we can't add an abstract method to an ABC without creating a backward incompatibility; a class inheriting from Mapping would start failing to instantiate if we were to add Mapping.__add__. But we could implement MutableMapping.__ior__ in the ABC by making it call .update(), just like MutableSequence.__iadd__ calls .extend().
There would still be a lesser problem for classes that use MutableMapping.register() -- if they don't implement __ior__ they don't technically implement the protocol represented by the ABC. But that would only appear if *other* code started calling __ior__. This is more of a gray area. (And I would generally advise against .register() except in cases where one has no control over the source code.)
Note that we can still support Mapping | dict and dict | Mapping, by having __or__ and __ror__ check for Mapping (somehow).
Also note that copy() methods are entirely missing from the ABCs. For [Mutable]Sequence and [Mutable]Mapping this is not a coincidence -- there are no methods that create new instances. For [Mutable]Set I'm not sure about the reason -- perhaps it's in analogy of the others, perhaps it's because you can easily create a copy of a set using s | set().
PS: A curious thing (unrelated to the above points): dict.update() preserves the keys but updates the values (this actually follows from the behavior of __setitem__). This becomes clear when the keys compare equal but have different representations (e.g. 1 == 1.0). The | and |= operators should follow suit.
-- Guido van Rossum (python.org/~guido)
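Two of the points in the message above lend themselves to a short demonstration: the key-preserving behaviour of `dict.update()`, and the idea of an `__ior__` that simply delegates to `.update()`. This is a sketch, not what `collections.abc` actually provides; `IorViaUpdate` is a hypothetical class written for illustration.

```python
from collections.abc import MutableMapping

# update() keeps the existing key object and replaces only the value.
d = {1: "int key"}
d.update({1.0: "float key"})
kept_key = next(iter(d))


class IorViaUpdate(MutableMapping):
    """Hypothetical mapping whose |= is a thin wrapper around update()."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __ior__(self, other):
        # Mirrors how MutableSequence.__iadd__ delegates to extend().
        self.update(other)
        return self


m = IorViaUpdate()
same_object = m
m |= {"a": 1}
m |= {"a": 2, "b": 3}
```

Because `__ior__` returns `self`, the name `m` still refers to the original object after each `|=`, which matches the in-place behaviour of `list +=`.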
On Oct 20, 2019, at 22:08, Guido van Rossum <guido@python.org> wrote:
Also note that copy() methods are entirely missing from the ABCs. For [Mutable]Sequence and [Mutable]Mapping this is not a coincidence -- there are no methods that create new instances. For [Mutable]Set I'm not sure about the reason -- perhaps it's in analogy of the others, perhaps it's because you can easily create a copy of a set using s | set().
I always thought this went along with the fact that the constructor isn’t part of the ABC. For something that duck types as a tuple or list or set all the way up to constructing new instances, you can just do type(s)(iter(s)) with the same meaning as s.copy(), but you can’t do that for, say, range, where you need to understand more about the type than its sequenceness to construct an equal one. So there’s no way the mixin can help; if the ABC required any methods that returned new instances, every class would have to define all of them.
Another point against Mapping.__add__ (but not MutableMapping.__iadd__): the performance cost of copy-merging is acceptable if you know something about the operands beyond the fact that they’re mappings (like when you’re adding a tiny literal to a **kw, or you know you’re using persistent sharing HAMTs), but if you don’t, it may not be. You could discourage people from writing sum(some arbitrary possibly huge mappings) with a check inside sum, but it’s hard to discourage people from doing the same thing indirectly.
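The constructor point above can be checked quickly. In this sketch, `naive_copy` is a hypothetical helper that rebuilds an object by passing an iterator over it to its own type, which works for list, tuple, and set but not for range:

```python
def naive_copy(seq):
    # Hypothetical helper: rebuild the object through its type's constructor.
    return type(seq)(iter(seq))


list_copy = naive_copy([1, 2, 3])
tuple_copy = naive_copy((1, 2, 3))
set_copy = naive_copy({1, 2, 3})

# range() expects integer arguments, not an iterable, so the same
# trick fails even though range is a perfectly good Sequence.
try:
    naive_copy(range(3))
    range_copy_ok = True
except TypeError:
    range_copy_ok = False
```

This is why a generic mixin cannot supply a copy-producing method: it has no way to know how an arbitrary subclass wants to be constructed.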
From: Andrew Barnert via Python-ideas <python-ideas@python.org>
So yes, Christopher is wrong to suggest that the new operators should be part of Mapping, but only because he should have said (and presumably meant) MutableMapping.
Indeed I did, thanks. I wasn't really thinking about it, because there isn't an immutable Mapping type builtin at all.
-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
I think this PEP is closely related to language design philosophy.
(a) Overload operators heavily for convenience.
(b) Prefer methods over operators. Set a high bar for overloading operators on core types.
I prefer philosophy (b), and I don't think the usefulness described is enough to justify adding the operator. I know this is a subjective opinion, but I'm -1 on this PEP.
Regards,
On Thu, Oct 17, 2019 at 2:37 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition):
https://www.python.org/dev/peps/pep-0584/
Brandt _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/W2FCSC... Code of Conduct: http://python.org/psf/codeofconduct/
-- Inada Naoki <songofacandy@gmail.com>
On Thu, Oct 17, 2019 at 8:54 PM Inada Naoki <songofacandy@gmail.com> wrote:
I think this PEP is closely related to language design philosophy.
(a) Overload operators heavily for convenience.
(b) Prefer methods over operators. Set a high bar for overloading operators on core types.
I prefer philosophy (b), and I don't think the usefulness described is enough to justify adding the operator.
I know this is a subjective opinion, but I'm -1 on this PEP.
Equally subjective, of course, but I prefer to be able to use operators. There's a good reason that we use mathematical symbols for non-numeric data types already (adding tuples together, multiplying a list by an integer, etc), and it definitely makes code easier to read. +1. ChrisA
On 17 Oct 2019, at 13:26, Chris Angelico <rosuav@gmail.com> wrote:
Equally subjective, of course, but I prefer to be able to use operators. There's a good reason that we use mathematical symbols for non-numeric data types already (adding tuples together, multiplying a list by an integer, etc), and it definitely makes code easier to read.
The multiply a list/string thing is a big mistake imo. You almost never use it so the cost of not having it is almost nothing while the cost of having type errors propagate beyond the point where they happened is big. Let's not remake the mistakes of the past. Or at least let's not use those mistakes as positive examples. / Anders
On Thu, Oct 17, 2019 at 10:42 PM Anders Hovmöller <boxed@killingar.net> wrote:
On 17 Oct 2019, at 13:26, Chris Angelico <rosuav@gmail.com> wrote:
Equally subjective, of course, but I prefer to be able to use operators. There's a good reason that we use mathematical symbols for non-numeric data types already (adding tuples together, multiplying a list by an integer, etc), and it definitely makes code easier to read.
The multiply a list/string thing is a big mistake imo. You almost never use it so the cost of not having it is almost nothing while the cost of having type errors propagate beyond the point where they happened is big.
Actually, I use it often enough to be useful. Probably more often than numeric exponentiation, and I'm sure you'd agree that 3**7 is better than requiring math.pow(3,7). I don't understand your point about type errors; how does "="*79 cause type errors? Or initializing a list of zeroes with [0]*20 ? And believe you me, the number of times I have *wished* for features like these when working in other languages (mainly SourcePawn) makes me definitely not see them as mistakes. ChrisA
On 17 Oct 2019, at 13:46, Chris Angelico <rosuav@gmail.com> wrote:
Actually, I use it often enough to be useful. Probably more often than numeric exponentiation, and I'm sure you'd agree that 3**7 is better than requiring math.pow(3,7). I don't understand your point about type errors; how does "="*79 cause type errors? Or initializing a list of zeroes with [0]*20 ?
Well obviously never with literals. But most cases of multiplication aren't with literals. So how can you get a type error when doing a*b is the real question. And the answer is now obvious: any time the programmer thinks a and b are numbers but they are not. That's a logical type error that now propagates and it's very hard to track down the offending line when you eventually end up with a crash in a different module because
baz = foo / bar
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'str' and 'int'
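A small sketch of the failure mode described here, under the assumption that a string sneaks into code written for numbers. The function name `scale` and the values are invented for illustration. The multiplication succeeds silently, and the `TypeError` only appears at a later, unrelated operation:

```python
def scale(amount, factor):
    # Written with numbers in mind, but * also accepts (str, int).
    return amount * factor


foo = scale("3", 2)  # a string was passed by mistake; no error is raised here

try:
    baz = foo / 7  # the failure only shows up at this later operation
    late_error = None
except TypeError as exc:
    late_error = str(exc)

print(repr(foo))
print(late_error)
```

Whether this counts against the operator or against the calling code is exactly what the rest of the thread argues about.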
And believe you me, the number of times I have *wished* for features like these when working in other languages (mainly SourcePawn) makes me definitely not see them as mistakes.
I strongly disagree. I have used it but I always feel dirty for having written code that is clever and not explicit. There just isn't a nice explicit way to do it in python so that's what I'm left with. The utility of the feature is not something I dispute. It is useful! But it would be just as useful as a method. /Anders
On Thu, Oct 17, 2019 at 11:08 PM Anders Hovmöller <boxed@killingar.net> wrote:
So how can you get a type error when doing
a*b
is the real question. And the answer is now obvious: any time the programmer thinks a and b are numbers but they are not.
If you start with the assumption that multiplication implies numbers, and then you see multiplication and it's not numbers, then yes, you may have problems. But why start with that assumption? Why, when you look at multiplication, should you therefore think that a and b are numbers? Instead, start with the assumption that MANY things can be added, multiplied, etc. Then it's not a logical type error to multiply strings, add dictionaries (if this proposal goes through), subtract timestamps, etc. It's just part of coding. And if you REALLY want early detection of logical type errors, use a type checker. ChrisA
On 17 Oct 2019, at 14:26, Chris Angelico <rosuav@gmail.com> wrote:
If you start with the assumption that multiplication implies numbers, and then you see multiplication and it's not numbers, then yes, you may have problems. But why start with that assumption? Why, when you look at multiplication, should you therefore think that a and b are numbers?
Instead, start with the assumption that MANY things can be added, multiplied, etc. Then it's not a logical type error to multiply strings, add dictionaries (if this proposal goes through), subtract timestamps, etc. It's just part of coding.
I don't agree. I also think + for string concat was a mistake. It's an unnecessary source of errors.
And if you REALLY want early detection of logical type errors, use a type checker.
I would like to point out that this is wrong on many levels. The most obvious one is that this is just not how python works normally. Python is dynamically strongly typed. Not dynamically weakly typed. The vast majority of logical type errors are indeed caught when the type error occurs. This is a good thing. We don't want to be PHP or Javascript. Another flaw with this logic is that this is user hostile. We can, do and should try to catch mistakes early. Another flaw is that this argument might sort of work now but didn't work before mypy et al existed. / Anders
On 17/10/2019 14:59:13, Anders Hovmöller wrote:
On 17 Oct 2019, at 14:26, Chris Angelico <rosuav@gmail.com> wrote:
If you start with the assumption that multiplication implies numbers, and then you see multiplication and it's not numbers, then yes, you may have problems. But why start with that assumption? Why, when you look at multiplication, should you therefore think that a and b are numbers?
Instead, start with the assumption that MANY things can be added, multiplied, etc. Then it's not a logical type error to multiply strings, add dictionaries (if this proposal goes through), subtract timestamps, etc. It's just part of coding.
Not infrequently I use AString*AInteger or AList*AInteger. Isn't
    print('-' * 80)
better and clearer than
    print('--------------------------------------------------------------------------------') # Please count these to check I got it right :-)
And of course there are many other use cases.
I don't agree. I also think + for string concat was a mistake. It's an unnecessary source of errors.
I routinely use + to concatenate a few strings (yes, I'm aware of the performance issue adding many strings). It's concise, clear and convenient. When I can write (toy invented example)
    body = prefix + text + suffix
why should I be forced to type some more verbose and less clear boilerplate like
    body = '%s%s%s' % (prefix, text, suffix)
or
    body = ''.join((prefix, text, suffix)) # Actually I missed out a closing bracket before I tested this! I rest my case.
or
    body = f'{prefix}{text}{suffix}'
or [suggestions invited]
I would not wish to see either feature removed from Python. I'm sure I have sometimes added a number to a string. But the TypeError message soon puts me right. Rob Cliffe.
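For reference, the spellings compared above all build the same string. This is only a sketch with made-up values for `prefix`, `text`, and `suffix`:

```python
prefix, text, suffix = "<b>", "hello", "</b>"

by_plus = prefix + text + suffix
by_percent = "%s%s%s" % (prefix, text, suffix)
by_join = "".join((prefix, text, suffix))
by_fstring = f"{prefix}{text}{suffix}"

# Repetition builds the 80-character rule without counting dashes by hand.
rule = "-" * 80

# Adding a number to a string is rejected immediately.
try:
    prefix + 1
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

print(by_plus, len(rule))
```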
On Oct 17, 2019, at 05:08, Anders Hovmöller <boxed@killingar.net> wrote:
Well obviously never with literals. But most cases of multiplication aren't with literals. So how can you get a type error when doing
a*b
is the real question. And the answer is now obvious: any time the programmer thinks a and b are numbers but they are not.
If neither one is a number (in fact, if b is not an integer) you will get a TypeError. Also, the reason you have no idea what’s in these variables is that you named them a and b instead of something meaningful. Normally if I have variables named a and b they’re going to be numbers (or maybe numpy arrays), because those are good names for two triangle side lengths but they’re very bad names for a list of strings and an integer Unix timestamp (to give an example which would allow * but it would be a logical error in your code that Python can’t catch).
At any rate, I definitely like using operators for concatenation and merging and elementwise array addition and so on. When I use other languages, I definitely run into annoyance with not having operator overloading in Java or JS much more often than I run into annoyance with abuse of operator overloading in C++ or Haskell. It’s not that it can’t happen, or even that it never happens in practice, it just happens a lot less often than getting lost trying to figure out what a simple array expression does because it had to be written in prefix form with long names and parens all over the place.
From a theoretical point of view, I’m sympathetic to the idea that they should all use _different_ operators from addition. But every time I go back to some Haskell code and have to look up the 48 different operators (all spelled as strings of punctuation characters) before I can understand what I wrote, and then I go back to some Python or C++ code and I already know that + means some kind of addition or concatenation no matter what the custom types are (and usually there’s only one kind that makes sense for the values), my theoretical sympathy is overridden by practical annoyance.
At any rate, the PEP is not proposing multiplication for repeated dict merge, and even the potential “add all the set operators” second PEP mentioned near the end is presumably not going to propose that, so I don’t think there’s much to gain by arguing about whether * for sequences was a mistake. If you want to propose adding a `repeat` method to the sequence types and deprecating or at least lightly discouraging the use of __mul__, that seems completely unrelated to this PEP.
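The TypeError claim above is easy to check. In this sketch, `try_mul` is a hypothetical helper that reports the exception name instead of raising, so the cases can be listed side by side:

```python
def try_mul(a, b):
    """Return a * b, or the name of the exception it raises."""
    try:
        return a * b
    except TypeError as exc:
        return type(exc).__name__


results = {
    "str * int": try_mul("ab", 3),        # sequence repetition
    "list * int": try_mul([0], 4),        # sequence repetition
    "str * float": try_mul("ab", 2.0),    # rejected: repeat count is not an int
    "str * str": try_mul("ab", "cd"),     # rejected: neither operand is a number
}

for label, outcome in results.items():
    print(label, "->", outcome)
```

A sequence times an integer that happens to mean something else (the list-of-strings and timestamp case) would still succeed, which is the logical error Python cannot catch.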
On 17 Oct 2019, at 17:07, Andrew Barnert <abarnert@yahoo.com> wrote:
If neither one is a number—in fact, if b is not an integer—you will get a TypeError.
Also, the reason you have no idea what’s in these variables is that you named them a and b instead of something meaningful.
No. The reason I don't know is because this is a hypothetical example. 🙄 In real code I would "know" BUT BE WRONG because the variable names would be outright lying. / Anders
On Fri, Oct 18, 2019 at 5:25 AM Anders Hovmöller <boxed@killingar.net> wrote:
No. The reason I don't know is because this is a hypothetical example. In real code I would "know" BUT BE WRONG because the variable names would be outright lying.
/ Anders
So if you had 'separator' and 'width', would the variable names be outright lying, or would it then be reasonable to multiply a separator character by a width (eg 80) to create a line? ChrisA
On 17 Oct 2019, at 20:37, Chris Angelico <rosuav@gmail.com> wrote:
So if you had 'separator' and 'width', would the variable names be outright lying, or would it then be reasonable to multiply a separator character by a width (eg 80) to create a line?
Eh. No. What? Are you really being sincere? In any case this would be fine:
    line = separator.fill(80)
(although if we're attacking each other's variable names, how about "section_separator = separator_character.fill(80)"?) / Anders
On Fri, Oct 18, 2019 at 6:06 AM Anders Hovmöller <boxed@killingar.net> wrote:
On 17 Oct 2019, at 20:37, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Oct 18, 2019 at 5:25 AM Anders Hovmöller <boxed@killingar.net> wrote:
No. The reason I don't know is because this is a hypothetical example. In real code I would "know" BUT BE WRONG because the variable names would be outright lying.
/ Anders
So if you had 'separator' and 'width', would the variable names be outright lying, or would it then be reasonable to multiply a separator character by a width (eg 80) to create a line?
Eh. No. What? Are you really being sincere?
In any case this would be fine:
line = separator.fill(80)
Okay, so clearly we have quite different ideas about what's acceptable. That's fine. You're allowed to use a method, but since the language DOES support multiplication of string by repeat count, I'll quite happily continue to use the operator. ChrisA
On Thu, Oct 17, 2019 at 02:08:56PM +0200, Anders Hovmöller wrote:
Well obviously never with literals. But most cases of multiplication aren't with literals. So how can you get a type error when doing
a*b
is the real question.
Actually, the real question is, why are you using such clearly unsuitable and non-descriptive variable names and blaming the confusion caused by poor names on the syntax? Borrowing from the PEP, how would you get a type-error from these?
    width + margin
    prefix + word
If you did, the cause would clearly be a bug in your code, not the fault of the syntax. In my experience, the most common use for the repetition operator in strings involves a single, literal, character:
    dashes = '-'*count
And the answer is now obvious: any time the programmer thinks a and b are numbers but they are not.
And this is a fault of the syntax, how? If repetition was spelled:
    a.repeat(b)
and the programmer thought a was a string, when it was actually a float, and thought b was an int, but it was actually None, would you conclude that "method calls are a mistake" because they cause type errors? I'm sure you would agree with me that this is a bogus argument. Type errors are a sign of a bug in the code regardless of the spelling: you think a value has a different type than it actually has. If it is a bogus argument for method calls, it is a bogus argument for operators.
That's a logical type error that now propagates and it's very hard to track down the offending line when you eventually end up with a crash in a different module because
And that's an argument against any form of type polymorphism. I expected a list, but got a string, and len(a) didn't raise TypeError therefore "functions are a mistake". -- Steven
On Thu, Oct 17, 2019 at 01:42:54PM +0200, Anders Hovmöller wrote:
The multiply a list/string thing is a big mistake imo. You almost never use it so the cost of not having it is almost nothing while the cost of having type errors propagate beyond the point where they happened is big.
I use string repetition and concatenation frequently. For example, for indenting a string I much prefer this:
    string = " "*n + string
over something like
    string = ''.join([''.join(' ' for __ in range(n)), string])
List repetition, not so often, but I do use list concatenation and even more so tuple concatenation. So I disagree that they are "almost never" used, or that they are a mistake. If we didn't have string `+` and `*`, and list `+`, I would certainly have to roll my own as functions. -- Steven
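The two indentation spellings above produce the same result. A quick check, with `n` and `string` chosen arbitrarily for illustration:

```python
n = 4
string = "body"

# Operator spelling: repeat a space n times, then concatenate.
indented = " " * n + string

# Operator-free spelling, as in the message above.
verbose = "".join(["".join(" " for __ in range(n)), string])

# List and tuple concatenation work the same way.
combined_list = [1, 2] + [3]
combined_tuple = (1, 2) + (3,)

print(repr(indented))
```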
On 17 Oct 2019, at 22:44, Steven D'Aprano <steve@pearwood.info> wrote:
I use string repetition and concatenation frequently. For example, for indenting a string I much prefer this:
string = " "*n + string
over something like
string = ''.join([''.join(' ' for __ in range(n)), string])
Yes, that's terrible. But `" ".fill(n) + string` would be just as good but without being a source of bugs that pollutes downstream execution.
List repetition, not so often, but I do use list concatenation and even more so tuple concatenation.
Which are fine because the types at least have to match. It would have been better with a dedicated concat operator I think but just a little bit better.
Anders Hovmöller wrote:
It would have been better with a dedicated concat operator I think but just a little bit better.
Interesting little side note, for anyone who doesn't already know: at the C level, Python objects have two ways of implementing addition and multiplication. The first is through the "numeric" `tp_as_number` interface, which exposes our well-known `__add__` / `__mul__` trios of methods. The second is through the `tp_as_sequence` interface, where we can define "concat" / "inplace concat" / "repeat" / "inplace repeat" functions to do the same thing (at least it's mostly the same when working in the Python layer). Sequence repetition and concatenation are very strongly baked into Python's object model, even if it's not immediately obvious from further up. The C implementation for this proposal uses the latter interface, for a few technical reasons that I won't dive into here. This means, though, that it really can be accurately thought of as "dict concatenation".
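One place the numeric/sequence split is visible from Python is the `operator` module, where `operator.concat` is documented as `a + b` for sequences. The sketch below assumes CPython; how each call maps onto the C slots is my reading of the note above, not something the thread spells out:

```python
import operator

# Both spellings agree for sequences.
via_add = operator.add([1, 2], [3])
via_concat = operator.concat([1, 2], [3])

# Numbers support addition but are not concatenable.
number_sum = operator.add(1, 2)
try:
    operator.concat(1, 2)
    concat_error = None
except TypeError as exc:
    concat_error = exc

print(via_add, via_concat, number_sum, type(concat_error).__name__)
```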
+1 Been waiting a decade+ for this. Too bad it didn't get into 3.8. Nice work on the PEP. -Mike
p.s. As another poster mentioned, this existing syntax does allow non-string keys:
>>> d1 = dict(one=1, two=2)
>>> d2 = {'three':3, 4:4}
>>> {**d1, **d2}
{'one': 1, 'two': 2, 'three': 3, 4: 4}
On 2019-10-16 22:35, Brandt Bucher wrote:
On Thu, Oct 17, 2019 at 7:35 AM Brandt Bucher <brandtbucher@gmail.com> wrote:
Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to! We plan on updating the PEP at least once more before review.
I, personally, do not feel a need to have an operator alternative to an 'update' function, because I believe the 'update' function is pretty self-explanatory: It has a certain "cognitive property" (i.e. one can conclude what type of objects it applies to), and it also explicitly identifies the way the operation applies.
But if someone needs an operator for that, I propose to introduce a different symbol, while considering some similarities in the (Python) language already. On the abstract level, the operation described in the PEP looks more like set union (on dict keys) than anything else using concatenation (not considering the "arithmetic" meaning of '+' here). The problem with union is that it is commutative, while the proposed dict '+' is not, so let's imagine a new operator '|<' (and optionally '|>', or '|<=' for the '+=' version) and use it like this
```
a = dict(...)
b = dict(...)
c = a |< b
```
The consequences (and the differences to the original proposal):
1) The operator (visually) looks like set union but not quite and is self-explanatory.
2) The operator is unique enough it could suggest that 'a' and 'b' are dicts (the same way 'update' does and not the other way around - the object type being used to deduce the operator meaning).
3) Anyone finding this in the code, who does not know about the operator, will _not_ guess the wrong operation.
I know this has been somewhat addressed in the PEP, but I simply cannot see how adding another ambiguity to the '+' symbol can be better, because the "familiarity" of '+' (which seems to me to be an argument in the PEP) just hides the fundamental differences between this '+' and the other ones (arithmetic, or concatenation). Richard
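The commutativity point can be checked with syntax that already exists. This sketch uses `{**a, **b}` as a stand-in for the proposed merge, with invented keys and values; it does not implement the suggested '|<' operator:

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

left = {**a, **b}    # values from b win on the shared key "y"
right = {**b, **a}   # values from a win on the shared key "y"

# The key sets behave like a commutative set union...
same_keys = set(a) | set(b) == set(b) | set(a)

# ...but the merged dicts differ, because the last value seen is kept.
print(left["y"], right["y"], left == right, same_keys)
```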
On Fri, Oct 18, 2019 at 11:41 PM Richard Musil <risa2000x@gmail.com> wrote:
I know this has been somewhat addressed in PEP, but I simply cannot see, how adding another ambiguity to the '+' symbol, can be better, because the "familiarity" of '+' (which seems to me being an argument in the PEP) just hides the fundamental differences between this '+' and the other ones (arithmetic, or concatenation).
Adding a time delta to a datetime isn't quite the same as adding two numbers. Adding two strings is even more different. Adding two tuples, different again. Yet they are all "adding" in a logical way. Checking if an integer is in a list isn't the same as checking if one is in a dict, and checking if a substring is in a larger string is even more different, yet, again, they are all testing for "containment". This is polymorphism. This is the normal way that Python works. What "fundamental differences" are there to be hidden? Are they not simply the nature of addition as defined by a dictionary? This is not adding ambiguity to the '+' symbol. It *always* means addition (or unary plus, but you can recognize that by the lack of left operand). ChrisA
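The polymorphism examples listed above, written out as a sketch with arbitrary values:

```python
from datetime import datetime, timedelta

# "+" across unrelated types.
next_day = datetime(2019, 10, 18) + timedelta(days=1)
joined = "ab" + "cd"
pair = (1, 2) + (3,)

# "in" across unrelated types: items, keys, substrings.
in_list = 2 in [1, 2, 3]
in_dict = "k" in {"k": 1}
in_str = "ell" in "hello"

print(next_day, joined, pair, in_list, in_dict, in_str)
```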
On Fri, Oct 18, 2019 at 2:49 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Oct 18, 2019 at 11:41 PM Richard Musil <risa2000x@gmail.com> wrote:
I know this has been somewhat addressed in PEP, but I simply cannot see,
how adding another ambiguity to the '+' symbol, can be better, because the "familiarity" of '+' (which seems to me being an argument in the PEP) just hides the fundamental differences between this '+' and the other ones (arithmetic, or concatenation).
Adding a time delta to a datetime isn't quite the same as adding two numbers. Adding two strings is even more different. Adding two tuples, different again. Yet they are all "adding" in a logical way. Checking if an integer is in a list isn't the same as checking if one is in a dict, and checking if a substring is in a larger string is even more different, yet, again, they are all testing for "containment". This is polymorphism. This is the normal way that Python works. What "fundamental differences" are there to be hidden? Are they not simply the nature of addition as defined by a dictionary?
I have no problem with any example you mentioned above, yet I consider the proposed meaning of the '+' symbol in this PEP fundamentally different from either the "arithmetic" or "concatenation" meaning used in the previous examples, because it not only adds, but also changes the values. Technically we can call set union an "add" operation too, yet we use different semantics to express it, for exactly the same reason: it is neither "arithmetic" nor "concatenation", but something different. Richard
On Fri, 18 Oct 2019 at 14:33, Richard Musil <risa2000x@gmail.com> wrote:
I have no problem with any example you mentioned above, yet I consider the proposed meaning of the '+' symbol in this PEP fundamentally different from either the "arithmetic" or "concatenation" meaning used in the previous examples, because it not only adds, but also changes the values. Technically we can call set union an "add" operation too, yet we use different semantics to express it, for exactly the same reason: it is neither "arithmetic" nor "concatenation", but something different.
IMO, debating the "meaning" of addition, and whether the + operator is appropriate here, is not the key question. The real questions for me are whether the update operation is used frequently enough to require an additional way of spelling it, and whether using the + operator leads to cleaner more readable code. I've personally almost never needed the dictionary update operator, so for me, having a second way of spelling it is probably not going to be of any real benefit (and may be a net disadvantage, because I'll have to understand both forms in code I support). I don't really have a view on whether the operator or the method would result in cleaner code, but I'd expect that given how rarely I use the operation, a named method would be easier to follow (for me). All of the above is just my subjective opinion. Objective information is likely to only be available by someone looking at a body of real world code and demonstrating what difference the proposed operator would make to it (maybe someone did that already - I've not been following this thread closely). Paul
Paul Moore wrote:
All of the above is just my subjective opinion. Objective information is likely to only be available by someone looking at a body of real world code and demonstrating what difference the proposed operator would make to it (maybe someone did that already - I've not been following this thread closely).
Basically the whole bottom quarter of the PEP! https://www.python.org/dev/peps/pep-0584/#examples-of-candidates-for-the-dic...
On Fri, 18 Oct 2019 at 14:57, Brandt Bucher <brandtbucher@gmail.com> wrote:
Paul Moore wrote:
All of the above is just my subjective opinion. Objective information is likely to only be available by someone looking at a body of real world code and demonstrating what difference the proposed operator would make to it (maybe someone did that already - I've not been following this thread closely).
Basically the whole bottom quarter of the PEP!
Thanks! I told you I hadn't been following closely ;-) Of those I find very few look better with the + operator. There are about 20 examples, and at most I'd say 4 were an improvement. A couple were arguable, but most were in my view worse when rewritten. And I worry that some of the ones that were an improvement might be confusing if I hadn't just been reading about the + operator for dictionaries (or to put it another way, they make a virtue of conciseness, which is less of a virtue in code that you're not as familiar with). Again, I note this is purely subjective opinion. Paul
On 18 Oct 2019, at 15:58, Brandt Bucher <brandtbucher@gmail.com> wrote:
Paul Moore wrote:
All of the above is just my subjective opinion. Objective information is likely to only be available by someone looking at a body of real world code and demonstrating what difference the proposed operator would make to it (maybe someone did that already - I've not been following this thread closely).
Basically the whole bottom quarter of the PEP!
https://www.python.org/dev/peps/pep-0584/#examples-of-candidates-for-the-dic...
The examples would make more sense if the "before" picture was using modern syntax fully. The alternative to
    c = a + b
isn't the old
    c = {}
    c.update(a)
    c.update(b)
but in fact:
    c = {**a, **b}
I agree that the proposed + overload is nicer to read but the PEP isn't being fair to the current syntax imo. / Anders
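For plain dicts, the two existing spellings compared above give the same result. A short check with invented contents:

```python
a = {"host": "localhost", "port": 80}
b = {"port": 8080, "debug": True}

# The older three-statement spelling.
c_old = {}
c_old.update(a)
c_old.update(b)

# The unpacking spelling (available since Python 3.5).
c_new = {**a, **b}

print(c_old == c_new, c_new)
```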
On 18/10/2019 15:43, Anders Hovmöller wrote:
On 18 Oct 2019, at 15:58, Brandt Bucher <brandtbucher@gmail.com> wrote: Basically the whole bottom quarter of the PEP!
https://www.python.org/dev/peps/pep-0584/#examples-of-candidates-for-the-dic...
The examples would make more sense if the "before" picture was using modern syntax fully. The alternative to
c = a + b
isn't the old
c = {}
c.update(a)
c.update(b)
but in fact:
c = {**a, **b}
I agree that the proposed + overload is nicer to read but the PEP isn't being fair to the current syntax imo.
Since I think the {**a, **b} syntax is *horrible*, I think the PEP is being more than fair! a + b is much easier to read and intuit than {**a, **b}. The sequence of methods is clearer in meaning but more cumbersome. All in all I think the operator is useful enough to justify adding it. -- Rhodri James *-* Kynesim Ltd
On Oct 18, 2019, at 06:49, Paul Moore <p.f.moore@gmail.com> wrote:
IMO, debating the "meaning" of addition, and whether the + operator is appropriate here, is not the key question. The real questions for me are whether the update operation is used frequently enough to require an additional way of spelling it, and whether using the + operator leads to cleaner more readable code.
I don’t think it’s about the mutating update operation; I think everyone can live with the update method there. The problem is the copying update. The only way to spell it is to store a copy in a temporary variable and then update that. Which you can’t do in an expression. You can do _almost_ the same thing with {**a, **b}, but not only is this ugly and hard to discover, it also gives you a dict even if a was some other mapping type, so it’s making your code more fragile, and usually not even doing so intentionally.

I have definitely wanted this operation myself. The fact that even people who don’t like this proposal think it would be helpful in a quarter of the 20 places given where it could be used sounds like an argument for it.

(As a side note, I think the PEP might be more effective if it identified good uses, rather than identifying a bunch of uses good and bad and leaving it to the reader to decide which was which, but that’s not a problem with the value of the proposal, at worst it’s a “strategic” problem with the PEP.)

If we accept that this is useful, then there’s the question of whether it needs to be an operator, or a method, or a builtin function, or a function in collections. And then, if it’s an operator, the question of whether that should be + or | or something else.

And I think mutating update just goes along for the ride: if we end up adding + in analogy with list concatenation, then obviously we want += as well; if we add |, add |= as well; if we add a merge method, we’ve already got an update method so nothing changes there; etc.

Anyway, it sounds like the only arguments against it are arguments that Python shouldn’t have polymorphic operator overloading in the first place. Honestly, if you believe that, I don’t know why you’d like Python in the first place. It’s embedded so deeply in the design of the language that most of the “data model” chapter of the reference is about it; it’s a big part of what makes Python “duck typed”, and why we have NumPy and SymPy while other dynamic languages can’t, and so on.
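The point about {**a, **b} silently producing a plain dict can be seen in a small sketch. It compares unpacking with the copy-then-update spelling that the draft PEP describes ("new = self.copy(); new.update(other)"); the choice of defaultdict as the "other mapping type" is only an example.

```python
from collections import defaultdict

# "a" is a mapping type other than dict; "b" is a plain dict.
a = defaultdict(list, {"x": [1]})
b = {"y": [2]}

# Dict unpacking always builds a plain dict, whatever type "a" was.
unpacked = {**a, **b}

# Copy-then-update keeps the type (and default factory) of the left operand.
copied = a.copy()
copied.update(b)

print(type(unpacked).__name__)  # dict
print(type(copied).__name__)    # defaultdict
```

Code that later relies on the default factory would keep working with `copied` but raise KeyError with `unpacked`, which is the kind of fragility described above.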
On Fri, Oct 18, 2019 at 09:17:53AM -0700, Andrew Barnert via Python-ideas wrote:
I don’t think it’s about the mutating update operation; I think everyone can live with the update method there. [...] And I think mutating update just goes along for the ride: if we end up adding + in analogy with list concatenation, then obviously we want += as well; if we add |, add |= as well; if we add a merge method, we’ve already got an update method so nothing changes there; etc
Indeed! It's not even that we *want* += as well as + but that the interpreter gives it to us for free, whether we want it or not. -- Steven
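The "for free" behaviour can be checked with a toy class. This is only a sketch: `Bag` is a hypothetical class invented for the example, and it defines `__add__` but no `__iadd__`.

```python
class Bag:
    """Hypothetical container that defines only __add__ (no __iadd__)."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        if not isinstance(other, Bag):
            return NotImplemented
        return Bag(self.items + other.items)


x = Bag([1, 2])
original = x

# No __iadd__ exists, so Python falls back to x = x.__add__(Bag([3])).
x += Bag([3])

print(x.items)        # (1, 2, 3)
print(x is original)  # False: the name was rebound to a new object
```

The fallback rebinds the name to a new object rather than mutating in place, so a type that wants true in-place behaviour for += still has to define `__iadd__` itself.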
On Oct 18, 2019, at 06:31, Richard Musil <risa2000x@gmail.com> wrote:
Technically we can call set union also an "add" operation, yet we use different semantics to express it, for exactly the same reason, it is not "arithmetic" nor, "concatenation", but something different.
Set union had analogies to two different operations that pre-existed it: concatenation, and bitwise or. Especially since bitwise or is often used (especially by people who’d rather be writing C than Python) as set union on integers used as bit sets.

Set union also needs an intersection method (I’m oversimplifying by leaving out symmetric and asymmetric difference, but I don’t think that affects anything). And bitwise and is really the only good analogy there (there is no list operation remotely like intersection). And that pretty much settles it for union: nobody’s going to want + and &, so it has to be | and &.

Notice that there was no question of adding a whole new operator for set union, just a question of deciding, of two existing operators, both of which were good fits, which was a better fit.

Dict merge is in the same situation. It has analogies to concatenation and to set union. Some people also want to add intersection as well, in which case that would pretty much settle it: merge is union, and it has to be spelled |. But, unlike with sets, it isn’t at all obvious that we need intersection (and difference). I think that question is the main stumbling block to whether + or | is better. But either of those makes sense, and it’s just down to which of those two is better; there’s no reason to spell it as >> or ^ or (), or to add a whole new operator and protocol to Python (or a facility for arbitrary in-language-defined operators).
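The analogy between bitwise or on integer bit sets and | on sets can be made concrete with a short sketch. The flag names and values here are invented for the example.

```python
# Integers used as bit sets, C style: each flag is one bit.
READ, WRITE, EXEC = 1, 2, 4

flags_a = READ | WRITE
flags_b = WRITE | EXEC
union_bits = flags_a | flags_b    # every bit set in either operand
common_bits = flags_a & flags_b   # only bits set in both operands

# The same two operations on real sets use the same operators.
set_a = {"read", "write"}
set_b = {"write", "exec"}
union_set = set_a | set_b
common_set = set_a & set_b

print(union_bits, common_bits)            # 7 2
print(sorted(union_set), sorted(common_set))
```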
On 2019-10-18 17:31, Andrew Barnert via Python-ideas wrote:
On Oct 18, 2019, at 06:31, Richard Musil <risa2000x@gmail.com> wrote:
Technically we can call set union also an "add" operation, yet we use different semantics to express it, for exactly the same reason, it is not "arithmetic" nor, "concatenation", but something different.
Set union had analogies to two different operations that pre-existed it: concatenation, and bitwise or. Especially since bitwise or is often used (especially by people who’d rather be writing C than Python) as set union on integers used as bit sets.
Set union also needs an intersection method (I’m oversimplifying by leaving out symmetric and asymmetric difference, but I don’t think that affects anything). And bitwise and is really the only good analogy there (there is no list operation remotely like intersection). And that pretty much settles it for union: nobody’s going to want + and &, so it has to be | and &.
Notice that there was no question of adding a whole new operator for set union, just a question of deciding, of two existing operators, both of which were good fits, which was a better fit.
Dict merge is in the same situation. It has analogies to concatenation and to set union. Some people also want to add intersection as well, in which case that would pretty much settle it: merge is union, and it has to be spelled |. But, unlike with sets, it isn’t at all obvious that we need intersection (and difference). I think that question is the main stumbling block to whether + or | is better. But either of those makes sense, and it’s just down to which of those two is better; there’s no reason to spell it as >> or ^ or (), or to add a whole new operator and protocol to Python (or a facility for arbitrary in-language-defined operators).
It's interesting to note:
>>> s = {False}
>>> s |= {0}
>>> s
{False}
so would it be intuitive that when a dict has a matching key its value is updated? As a side note:
>>> {False} & {0}
{0}
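To put the set behaviour next to what dict.update already does with an equal key, here is a small sketch. It relies only on the fact that 0 == False and that both hash the same.

```python
# Set union: 0 is equal to False, so nothing new is added and the
# element already in the set is kept.
s = {False}
s |= {0}
print(s)  # {False}

# dict.update with an equal key: the existing key object is kept,
# but its value is replaced by the incoming one.
d = {False: "old"}
d.update({0: "new"})
print(d)  # {False: 'new'}
```

So a dict merge that follows update semantics keeps the left-hand key but takes the right-hand value, which is a mixture that has no direct counterpart in set union.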
On Fri, Oct 18, 2019 at 02:39:41PM +0200, Richard Musil wrote:
```
a = dict(...)
b = dict(...)
c = a |< b
```
2) Anyone finding this in the code, who does not know about the operator, will _not_ guess the wrong operation.
That's certainly true, but only because they will have absolutely no clue at all about what |< could possibly mean. "Bitwise-or less-than" perhaps, whatever that could mean.

You seem to have come up with a completely unique symbol which has, as far as I can tell, never been used before. As far as I can see, neither APL nor Perl use it as an operator. I can't see it in Unicode, or as an emoticon.

Mathematicians are always coming up with new and obscure operators, so it's possible that somebody has used this before. (To me, it most looks like a variety of turnstile symbol |- or ⊢ used in logic.) If anyone wants to trawl through the 300+ pages and 14,000+ symbols in the comprehensive list of Latex symbols:

http://mirror.aarnet.edu.au/pub/CTAN/info/symbols/comprehensive/symbols-a4.p...

please be my guest.

It could be interpreted as an attempt to render the "not less-than" mathematical symbol ≮ U+226E in pure ASCII. Variants of that glyph sometimes use a vertical, rather than slanted, bar.
I know this has been somewhat addressed in PEP, but I simply cannot see, how adding another ambiguity to the '+' symbol, can be better, because the "familiarity" of '+' (which seems to me being an argument in the PEP) just hides the fundamental differences between this '+' and the other ones (arithmetic, or concatenation).
Perhaps it does. But inventing obscure and cryptic symbols like |< simply hides the fundamental similarities between adding two dictionaries and other forms of addition.

Numerous people have independently come up with the idea of using the plus symbol for adding two dicts. To many people, the use of + for this purpose is obvious and self-explanatory, and it is a recurring source of puzzlement why Python doesn't support dict addition.

https://stackoverflow.com/questions/6005066/adding-dictionaries-together-pyt...

But until now, I'm pretty sure that *nobody* before you thought of using the |< operator for dict addition. Congratulations :-)

If this PEP accomplishes nothing else, at least it will be a single source of information about dict addition the next hundred times somebody asks "Why can't I add two dicts?" *wink*

-- Steven
i'm a -0 on the PEP.

that being said, I am sure i will probably use the hell out of it. it is very convenient syntax for mashing together several mappings:

    # yuck
    for d in dict_seq:
        my_dict.update(d)

    # not much better
    {**d1, **d2, **d3}

but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:

1. create a new mapping of the type `type(mapping1)` with the conflicting values from dict1 having precedence
2. create a new mapping of the type `type(mapping1)` with the conflicting values from dict2 having precedence
3. create a new `dict` with the values from dict1 having precedence
4. create a new `dict` with the values from dict2 having precedence

similarly, it is not obvious what += does:

1. add the contents of mapping2 to mapping1, but with existing values from mapping1 taking precedence
2. update mapping1 with the contents of mapping2, including updating of values from mapping2

there is no question this will be very challenging for many new coders. NONE of the above choices are intuitive, even after it is explained to you. and if you haven't done it in a while, you will have to go and look it up. it will be a real head scratcher for a LOT of people, and a big source of bugs.

on the contrary: it becomes very intuitive to a new user why + for mappings does NOT exist once it is pointed out that there is the issue of reconciling key-value-pairs conflicts to consider. this produces an "ah ha" moment, leading to a deeper understanding of how mappings are so different from lists/tuples.

list, tuple, and string addition, on the other hand, are inherently intuitive. once you learn these, you pretty much never look up what they mean again (except when you try to add a list to a tuple and get an error, or try += with a tuple or string).

Rick.
On Fri, Oct 18, 2019 at 01:23:42PM -0400, Ricky Teachey wrote:
i'm a -0 on the PEP.
that being said, I am sure i will probably use the hell out of it. it is very convenient syntax for mashing together several mappings:
"I really like this idea, it's great! Let's not do it!" That's why I love Python-Ideas so much. *wink*
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
When Python first added the dict.update method (I think that was in 1.5?) I could never remember if it kept the old values or the new. But there were a million other things I couldn't remember either, such as whether slicing was open or closed, whether the trig functions in math took degrees or radians, whether exponentiation was ^ or ** and whether I needed parentheses around calls to print or not. I still sometimes write "import object from module" instead of the other way around, and I can never remember the names of the functions to convert degrees to and from radians.

Until you learn the underlying principles of the language, very little is obvious. (I pity those learning languages with few underlying principles, or many exceptions.) One of those principles is that dicts are "last value seen wins". That applies pretty much everywhere. All of these will behave the same way:

    d = {'age': 10}
    d['age'] = 11; d['age'] = 12

    d = {'age': 10, 'age': 11, 'age': 12}

    d = {'age': 10}
    d.update({'age': 11}, age=12)

    d = {**{'age': 10}, **{'age': 11}, **{'age': 12}}

Rather than learning a whole lot of special cases, there's one general case that you need to learn: updates go from left to right (or top to bottom) and all of those cases become clear. And addition will be no different:

    d = {'age': 10} + {'age': 11} + {'age': 12}

Outside of specialist uses (in which case, you can make a subclass) the most obvious and useful result is for the last value seen to win. That's what it means to *update* a dict. It would be a pretty strange language indeed if the default behaviour was for the update method to *not* update existing values:

    # this would truly be weird
    d = {'age': 10}
    d.update({'age': 11})
    d.update({'age': 12})
    assert d['age'] == 10  # first seen wins

If you learn that + means update, then everything follows from that.

As for the argument that people won't remember this, I find that a weak argument.
Like *literally every single feature in Python*, those who use it will remember it. Those who don't won't need to remember it. And those in the awkward middle area of using it once in a while but not enough to remember it have my sympathy but nothing else. That's why we have docs, the `help` builtin, and a rich collection of forums where people can ask "What's foo?" and people can respond "LMGTFY". We should not fear having to look things up. Nobody can remember *everything*, and the only languages where you don't have to look things up are languages like Iota and Jot which have exactly two symbols and two operations each: https://en.wikipedia.org/wiki/Iota_and_Jot but just try programming in them. "Looking things up" is an important and fundamental skill for programmers, very possibly the single most important skill of all. -- Steven
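The "last value seen wins" rule described in this message can be checked directly. The sketch below runs the spellings that exist today side by side; the variable names d1 to d4 are only labels for this example.

```python
# Item assignment: each assignment overwrites the previous value.
d1 = {"age": 10}
d1["age"] = 11
d1["age"] = 12

# Duplicate keys in a dict display: the last one wins.
d2 = {"age": 10, "age": 11, "age": 12}

# update() with a mapping and a keyword: applied left to right.
d3 = {"age": 10}
d3.update({"age": 11}, age=12)

# Dict unpacking: later unpackings override earlier ones.
d4 = {**{"age": 10}, **{"age": 11}, **{"age": 12}}

results = [d1, d2, d3, d4]
print(all(d == {"age": 12} for d in results))  # True
```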
Steven D'Aprano writes:
If you learn that + means update, then everything follows from that.
"+" doesn't mean "update", though. It means "accumulate". Consider sets. Except in situations where we have a very limited vocabulary for operators, such as ASCII-origin programming languages, we avoid using "+" to denote set union. In fact, in set theory, we go through a whole bunch of rigamarole to *define* numerical "+" as the *disjoint* union of finite numbers, each a canonical set of sets. Or consider analysis. Addition of functions (complex-valued or real-valued) is defined pointwise, as the sum of the values of the functions at each point. In general, vectors, matrices, and so on. I'm not against having an operator for updating dicts, but "+" is not it. "|" is fine, though. I'm unlikely to use the operator myself. In my experience I almost always want to do other stuff (type- or range-checking, mostly, sometimes collecting "profiling" statistics) in the method, and I prefer explict method invocation for such side effects. But I don't want to be reading other people's code and needing to remember that "+" is a pointwise replacement, not accumulation. Steve
On Oct 20, 2019, at 21:10, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
I'm not against having an operator for updating dicts, but "+" is not it. "|" is fine, though.
It seems like people who don’t really like this feature and don’t plan to use it mostly really want it to be spelled | if it has to be added. But people who are dying for it mostly want + (except for the ones who want all the set operators). I’m not sure what that means…
On 10/20/2019 10:08 PM, Andrew Barnert via Python-ideas wrote:
On Oct 20, 2019, at 21:10, Stephen J. Turnbull wrote:
I'm not against having an operator for updating dicts, but "+" is not it. "|" is fine, though.
It seems like people who don’t really like this feature and don’t plan to use it mostly really want it to be spelled | if it has to be added. But people who are dying for it mostly want + (except for the ones who want all the set operators). I’m not sure what that means…
I really don't want it as '+'. I'm happy to use it as '|'. :) -- ~Ethan~
I would use it, and I prefer | for the reasons given by Stephen, MRAB, and the other proponents. On Monday, October 21, 2019 at 1:08:32 AM UTC-4, Andrew Barnert via Python-ideas wrote:
On Oct 20, 2019, at 21:10, Stephen J. Turnbull <turnbull...@u.tsukuba.ac.jp> wrote:
I'm not against having an operator for updating dicts, but "+" is not it. "|" is fine, though.
It seems like people who don’t really like this feature and don’t plan to use it mostly really want it to be spelled | if it has to be added. But people who are dying for it mostly want + (except for the ones who want all the set operators). I’m not sure what that means…
_______________________________________________
Python-ideas mailing list -- python...@python.org
To unsubscribe send an email to python-id...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TV73IE...
Code of Conduct: http://python.org/psf/codeofconduct/
**Strongly** disagree. I would anticipate using this feature a LOT, and would be excited to see it added. (I would love to replace things like "d2 = d1.copy(); d2.update(d3)" with just "d2 = d1 | d3". In-place "d2 |= d3" is nice in its terseness, but isn't a huge benefit.) But, I completely agree with the arguments in favor of using "|" from the semantic perspective of the operation being much more like a set operation, than a list operation. Further: One angle I don't think I've read from anyone yet (still catching up on the thread tho) is the question of the obscurity of "|" vs the commonality of "+", and how they would likely interact with newcomers. A newcomer is likely going to use "+" a LOT, in all sorts of different situations. Given how non-intuitive dict merging can be, I would rather 'd1 + d2' throw a TypeError than return something unexpected. The more-obscure 'd1 | d2' is likely only going to be used by people who know how the machinery works and go searching for a more concise idiom, and who thus are less likely to be surprised by how it works. -Brian
The fact ``dict`` is a mutable object makes this PEP very complicated. Let's say we have this example:

    x = {'a': 1, 'b': 2, 'c': {'c': 3}}
    y = {'d': 4, 'c': {'c': 5}}

If we were to merge the two dicts together, such as:

    x.update(y)

Then we expect ``x`` to have been updated and look like:

    {'a': 1, 'b': 2, 'c': {'c': 5}, 'd': 4}

``x['c']`` is now the same object as ``y['c']`` and any update to either is reflected in the other.

Now if we have ``z = x + y``. Should ``z`` be a new shallow copy? Deep copy? Or a completely new dict with no references to either ``x`` or ``y``?

If we limit the merge operator to ``inplace`` update, then we could have two options, ``update`` and ``deepupdate``.

    ``x < y`` => x.update(y)
    ``x > y`` => y.update(x)
    ``y << x`` => deepupdate(x, y)
    ``x >> y`` => deepupdate(y, x)

Alex Martelli has an implementation of ``deepupdate`` on https://stackoverflow.com/a/3233356/360362

I think supporting ``z = x op y`` on dicts without thinking carefully about mutability and references will just add a new section to the python gotchas.

Meitham

On 10/21, brian.skinn@gmail.com wrote:
**Strongly** disagree. I would anticipate using this feature a LOT, and would be excited to see it added. (I would love to replace things like "d2 = d1.copy(); d2.update(d3)" with just "d2 = d1 | d3". In-place "d2 |= d3" is nice in its terseness, but isn't a huge benefit.) But, I completely agree with the arguments in favor of using "|" from the semantic perspective of the operation being much more like a set operation, than a list operation.
Further: One angle I don't think I've read from anyone yet (still catching up on the thread tho) is the question of the obscurity of "|" vs the commonality of "+", and how they would likely interact with newcomers. A newcomer is likely going to use "+" a LOT, in all sorts of different situations. Given how non-intuitive dict merging can be, I would rather 'd1 + d2' throw a TypeError than return something unexpected. The more-obscure 'd1 | d2' is likely only going to be used by people who know how the machinery works and go searching for a more concise idiom, and who thus are less likely to be surprised by how it works.
-Brian
-- Meitham Jamaa http://meitham.com GPG Fingerprint: 8C8E3FC7
Meitham Jamaa wrote:
The fact dict is a mutable object makes this PEP very complicated. Let's say we have this example:

    x = {'a': 1, 'b': 2, 'c': {'c': 3}}
    y = {'d': 4, 'c': {'c': 5}}

If we were to merge the two dicts together, such as:

    x.update(y)

Then we expect x to have been updated and look like:

    {'a': 1, 'b': 2, 'c': {'c': 5}, 'd': 4}

x['c'] is now the same object as y['c'] and any update to either is reflected in the other.
I really don't see how this is any different than list concatenation with +/+=...

    x = [1, 2]
    y = [{'c': 3}]
    x += y

x[-1] and y[-1] now refer to the same object, and any updates in one will be reflected in the other. It would be surprising if they didn't!
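The shared-reference behaviour under discussion can be demonstrated for both lists and dicts. This is a sketch using the copy-then-update spelling from the draft PEP; the use of copy.deepcopy at the end is just one way a caller could opt out of sharing, not something the PEP proposes.

```python
import copy

# List concatenation shares the inner objects.
xs = [1, 2]
ys = [{"c": 3}]
xs += ys
print(xs[-1] is ys[-1])  # True

# A copy-then-update dict merge is shallow in the same way.
x = {"a": 1, "c": {"c": 3}}
y = {"d": 4, "c": {"c": 5}}

merged = x.copy()
merged.update(y)

# A caller who wants independent values can deep-copy explicitly.
isolated = x.copy()
isolated.update(copy.deepcopy(y))

y["c"]["c"] = 99
print(merged["c"])    # {'c': 99}: shares the inner dict with y
print(isolated["c"])  # {'c': 5}: unaffected by the later change
print(x["c"])         # {'c': 3}: the left operand was never mutated
```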
On Mon, Oct 21, 2019 at 4:49 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
Meitham Jamaa wrote:
The fact dict is a mutable object makes this PEP very complicated.
no, it doesn't -- a mutable inside a container of any sort has the same issues:

Yes, that does cause its confusion, but this proposal doesn't change that.

-CHB

--
Christopher Barker, PhD
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
On Oct 20, 2019, at 18:26, Steven D'Aprano <steve@pearwood.info> wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
When Python first added the dict.update method (I think that was in 1.5?) I could never remember if it kept the old values or the new. But there were a million other things I couldn't remember either, such as whether slicing was open or closed, whether the trig functions in math took degrees or radians, whether exponentiation was ^ or ** and whether I needed parentheses around calls to print or not.
Agreed. I know that in most of the languages I occasionally use, update/merge/+/|/<</addEntriesFromDictionary:/whatever preserves the right value, but at least one does the other. So whenever I go back to one of those languages after a while away from it, I have to look that up again. That’s unavoidable. Just as I have to look up how they spell it in the first place.

And it’s already just as true for Python as it is for Ruby and ObjC and so on, because we already have update; people already have to learn that it uses the right value, and that it’s spelled update rather than merge! or something else. If we have + or |, people will have to learn that it uses the right value there too. The fact that all of the operations are consistent will mean that eventually they only have one thing to remember instead of a bunch of separate things, but it’s still a thing to remember.

The only way around this is to provide a whole suite of operators for every variation (like Haskell), or to pass a function to make the choice explicitly (like Ruby). And even then, it doesn’t actually make things easier to discover, or to remember, just more flexible. I know that I can both update preserving left and updating preserving right in Haskell, as well as a bunch of other variations, and the rules are nicely mnemonic while you’re using Haskell regularly, but if I haven’t used Haskell in a few months I still need to look up those rules and relearn them.
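The "pass a function to make the choice explicitly" approach can be sketched in Python as well. `merge_with` below is a hypothetical helper written for this example; it is not part of the standard library or of the PEP.

```python
def merge_with(left, right, resolve=lambda old, new: new):
    """Hypothetical helper: return a new dict built from left and right.

    On a key clash, resolve(old, new) decides which value is stored.
    The default keeps the right-hand value, like dict.update does.
    """
    result = dict(left)
    for key, value in right.items():
        if key in result:
            result[key] = resolve(result[key], value)
        else:
            result[key] = value
    return result


a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

right_wins = merge_with(a, b)
left_wins = merge_with(a, b, resolve=lambda old, new: old)
summed = merge_with(a, b, resolve=lambda old, new: old + new)

print(right_wins)  # {'x': 1, 'y': 20, 'z': 30}
print(left_wins)   # {'x': 1, 'y': 2, 'z': 30}
print(summed)      # {'x': 1, 'y': 22, 'z': 30}
```

As the message says, this is more flexible but not more discoverable: the reader still has to know which argument of `resolve` is the old value and which is the new one.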
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
When Python first added the dict.update method (I think that was in 1.5?) I could never remember if it kept the old values or the new. But there were a million other things I couldn't remember either, such as whether slicing was open or closed, whether the trig functions in math took degrees or radians, whether exponentiation was ^ or ** and whether I needed parentheses around calls to print or not.
Agreed.... we already have update; people already have to learn that it uses the right value, and that it’s spelled update rather than merge! or something else. If we have + or |, people will have to learn that it uses the right value there too. The fact that all of the operations are consistent will mean that eventually they only have one thing to remember instead of a bunch of separate things, but it’s still a thing to remember.
Well, the part about looking it up wasn't so much the actual objection as it being so much less intuitive- for newcomers- what exactly the operation would do compared to the addition of the other basic structures: strings and lists. Those are so obvious they just make sense. Having to look it up a lot is just a potential symptom of a larger problem. Granted, I'm coming at this from an assumption that there should be a higher bar for operators as compared to methods: imo, they should be totally intuitive in the way they behave. In fact this consistency about operators is one of the things I love about python. I don't think that standard is ~quite~ met here. So I'm still -0.. I think the most intuitive situation is the status quo: "no dict addition because what do you do with the values?" It's certainly possible I'm setting the bar too high, though, or that I'm so familiar with the way operators for other structures behave now that they seem intuitive to me when really they are not. The .update() method, on the other hand, feels much more naturally intuitive once you are aware it exists. And I would be very much in favor of a method (a new one, or perhaps adding arguments to .copy()) that copies and updates a dictionary in a single step.
On 2019-10-18 10:23, Ricky Teachey wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
The big trade off I'm gathering from this mega-thread is that the |, |= operators are more accurate, but less obvious to newcomers, who will first try +, += instead.

I've tried them in this order myself several times over the years.

Had an idea, why not choose the more accurate syntax: |, |= after all? Then, to help newcomers and forgetful pros a custom error message is implemented for +, +=. In pseudo C/Python, something like this:

    class dict:

        def __add__(self, other):

            if isinstance(other, dict):
                raise TypeError(
                    'unsupported operand type(s) for +: … '
                    'Dictionary merging leads to last-value-wins data '
                    'loss. If acceptable, use the union "|" operator.'
                )
            else:
                raise TypeError(std_err_msg)

I think it is worth it to lead the newcomer to a moment's reflection on why dictionary combining/merging is potentially lossy. Everyone is informed with the proper mental model, then on their way and left alone afterward.

Thoughts?
-Mike
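The idea can be tried out today with a subclass. The sketch below is only an approximation of the proposal: `HintingDict` is a hypothetical stand-in for the built-in dict, the first sentence of the message imitates the format CPython already uses for unsupported operands, and operands that are not dicts are handed back to Python by returning NotImplemented, which produces the standard TypeError.

```python
class HintingDict(dict):
    """Hypothetical stand-in for dict, used only to try out the message."""

    def __add__(self, other):
        if isinstance(other, dict):
            raise TypeError(
                "unsupported operand type(s) for +: "
                f"{type(self).__name__!r} and {type(other).__name__!r}. "
                "Dictionary merging leads to last-value-wins data "
                'loss. If acceptable, use the union "|" operator.'
            )
        # Not a dict: let Python raise its usual TypeError.
        return NotImplemented


try:
    HintingDict(a=1) + {"a": 2}
except TypeError as exc:
    message = str(exc)

print(message)
```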
I really like this idea. Once you've already decided to raise an exception, does it really cost much to try to raise a more helpful one? And helpful exception messages make programming a lot less painful, and a lot more of a joy. On Tue, Oct 22, 2019 at 2:43 PM Mike Miller <python-ideas@mgmiller.net> wrote:
On 2019-10-18 10:23, Ricky Teachey wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
The big trade off I'm gathering from this mega-thread is that the |, |= operators are more accurate, but less obvious to newcomers, who will first try +, += instead.
I've tried them in this order myself several times over the years.
Had an idea, why not choose the more accurate syntax: |, |= after all? Then, to help newcomers and forgetful pros a custom error message is implemented for +, +=. In pseudo C/Python, something like this:
    class dict:

        def __add__(self, other):

            if isinstance(other, dict):
                raise TypeError(
                    'unsupported operand type(s) for +: … '
                    'Dictionary merging leads to last-value-wins data '
                    'loss. If acceptable, use the union "|" operator.'
                )
            else:
                raise TypeError(std_err_msg)
I think it is worth it to lead the newcomer to a moment's reflection on why dictionary combining/merging is potentially lossy. Everyone is informed with the proper mental model, then on their way and left alone afterward.
Thoughts? -Mike
On Oct 22, 2019, at 11:39, Mike Miller <python-ideas@mgmiller.net> wrote:
Had an idea, why not choose the more accurate syntax: |, |= after all? Then, to help newcomers and forgetful pros a custom error message is implemented for +, +=. In pseudo C/Python, something like this:
    class dict:

        def __add__(self, other):

            if isinstance(other, dict):
                raise TypeError(
                    'unsupported operand type(s) for +: … '
                    'Dictionary merging leads to last-value-wins data '
                    'loss. If acceptable, use the union "|" operator.'
                )
            else:
                raise TypeError(std_err_msg)
This seems nifty -- but will it break the __radd__ protocol? In other words:

    class FancyDict(dict):
        def __add__(self, other):
            # handles other being a plain dict just fine
            ...
        def __radd__(self, other):
            # handles other being a plain dict just fine
            ...

… you want to make sure that adding a dict (or other dict subclass) and a FancyDict in either order calls the FancyDict method.

Off the top of my head, I think it’s safe -- and if not it would be safe to move your logic to dict.__radd__ and have __add__ either not there or return NotImplemented, because there’s a rule that if one object is an instance of a subclass of the other object’s type it gets first dibs. But someone needs to read the data model docs carefully -- and also check what happens if both types are C extensions (since dict is).

Anyway, while I don’t know if there is precedent for anything like this in a builtin type’s methods, there is precedent in builtin functions, like sum, so I think if it’s doable it might be acceptable. The only question is whether you’d want the same error for adding instances of subclasses of dict that don’t override the method(s) -- and I think the answer there is yes, you would.
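The "first dibs" rule for subclasses can be checked in pure Python. In this sketch `NoAddDict` is a hypothetical stand-in for a built-in dict whose __add__ only raises, and `calls` records which method actually ran. It says nothing about the C-extension case raised above.

```python
calls = []  # records which special method actually ran


class NoAddDict(dict):
    """Hypothetical stand-in for a dict whose __add__ only raises."""

    def __add__(self, other):
        if isinstance(other, dict):
            raise TypeError('use the union "|" operator instead')
        return NotImplemented


class FancyDict(NoAddDict):
    def __add__(self, other):
        calls.append("FancyDict.__add__")
        new = FancyDict(self)
        new.update(other)
        return new

    def __radd__(self, other):
        calls.append("FancyDict.__radd__")
        new = FancyDict(other)
        new.update(self)
        return new


plain = NoAddDict(a=1)
fancy = FancyDict(a=2, b=3)

left = fancy + plain   # FancyDict.__add__ runs
right = plain + fancy  # FancyDict.__radd__ runs before NoAddDict.__add__

print(calls)
print(left, right)
```

Because FancyDict is a subclass of the left operand's type and supplies a reflected method the parent lacks, Python tries FancyDict.__radd__ first, so the raising __add__ on the parent is never reached.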
I'm not crazy about advertising APIs this way ("did you mean ..."), and even if we would eventually decide to do this, I'm not sure that dict+dict is the place to start. (Okay, we already started, with "print x" saying "Did you mean print(x)?" -- but that shows how rare this should be IMO.) Anyway, __add__ should return NotImplemented in the else branch, to give C.__radd__ a chance in the case dict()+C() where C does not subclass dict. I *think* it should then be safe but there are some traps here (e.g. __radd__ sometimes gets called before __add__, if the right operand's class is a subclass of the left operand's class). On Tue, Oct 22, 2019 at 3:18 PM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
On Oct 22, 2019, at 11:39, Mike Miller <python-ideas@mgmiller.net> wrote:
Had an idea, why not choose the more accurate syntax: |, |= after all?
Then, to help newcomers and forgetful pros a custom error message is implemented for +, +=. In pseudo C/Python, something like this:
class dict:
    def __add__(self, other):
        if isinstance(other, dict):
            raise TypeError(
                'unsupported operand type(s) for +: … '
                'Dictionary merging leads to last-value-wins data '
                'loss. If acceptable, use the union "|" operator.'
            )
        else:
            raise TypeError(std_err_msg)
This seems nifty—but will it break the __radd__ protocol? In other words:
    class FancyDict(dict):
        def __add__(self, other):
            # handles other being a plain dict just fine
        def __radd__(self, other):
            # handles other being a plain dict just fine
… you want to make sure that adding a dict (or other dict subclass) and a FancyDict in either order calls the FancyDict method.
Off the top of my head, I think it’s safe—and if not it would be safe to move your logic to dict.__radd__ and have __add__ either not there or return NotImplemented, because there’s a rule that if one object is an instance of a subclass of the other object’s type it gets first dibs. But someone needs to read the data model docs carefully—and also check what happens if both types are C extensions (since dict is).
Anyway, while I don’t know if there is precedent for anything like this in a builtin type’s methods, there is precedent in builtin functions, like sum, so I think if it’s doable it might be acceptable. The only question is whether you’d want the same error for adding instances of subclasses of dict that don’t override the method(s)—and I think the answer there is yes, you would.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5N67XB... Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
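The dispatch rules discussed above can be tried out with a small sketch. It assumes pure-Python stand-ins: `Base` plays the role of dict with the proposed custom error, and `Fancy` and `Unrelated` are hypothetical classes invented for illustration. Whether the real C-level dict would behave identically is exactly the open question raised in the thread, so treat this as an illustration of the documented protocol only.

```python
class Base:
    """Hypothetical stand-in for dict with the proposed custom '+' error."""

    def __add__(self, other):
        if isinstance(other, Base):
            # The hint from the proposal, shortened for the sketch.
            raise TypeError("unsupported operand for +; use the union '|' operator")
        # Returning NotImplemented lets the right operand's __radd__ run.
        return NotImplemented


class Fancy(Base):
    """Hypothetical subclass that overrides both methods."""

    def __add__(self, other):
        return "Fancy.__add__"

    def __radd__(self, other):
        return "Fancy.__radd__"


class Unrelated:
    """Hypothetical class that does not subclass Base."""

    def __radd__(self, other):
        return "Unrelated.__radd__"


# The right operand is an instance of a subclass that provides __radd__,
# so its reflected method is tried before Base.__add__.
left_base = Base() + Fancy()
left_fancy = Fancy() + Base()
# Base.__add__ returns NotImplemented, so Unrelated.__radd__ gets a chance.
with_unrelated = Base() + Unrelated()

try:
    Base() + Base()
    plain_error = None
except TypeError as exc:
    plain_error = str(exc)
```

In this sketch the subclass wins in either operand order, and the custom error only fires when neither operand overrides the methods.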
On Tue, Oct 22, 2019 at 11:39:59AM -0700, Mike Miller wrote:
On 2019-10-18 10:23, Ricky Teachey wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
The big trade off I'm gathering from this mega-thread is that the |, |= operators are more accurate, but less obvious to newcomers, who will first try +, += instead.
I'm surprised by that description. I don't think it is just newcomers who either suggest or prefer plus over pipe, and I don't think that pipe is "more accurate".
As I pointed out in the PEP, plus often gets used for non-commutative operations (such as concatenation and ordinal arithmetic) but I am unaware of the union operator ∪ or | as spelled in Python ever being used for a non-commutative operation.
Contrary to the views of many people upset that dict + would be non-commutative, it is arguably *more natural* to use plus for a non-commutative operation than it would be to use pipe.
The biggest advantage of pipe is that it naturally lends itself to the rest of the set operations. But using pipe for a non-commutative operator is far less common than using plus.
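The commutativity point can be checked directly. The sketch below assumes a hypothetical helper `merged()` that follows the "new = self.copy(); new.update(other)" semantics described in the announcement; it is not the reference implementation.

```python
def merged(left, right):
    """Hypothetical helper: copy-and-update, as described for the PEP draft."""
    new = left.copy()
    new.update(right)
    return new


# '+' as concatenation is order-sensitive.
concat_differs = ("ab" + "cd") != ("cd" + "ab")

# '|' on sets and on ints gives the same result in either order.
set_union_same = ({1, 2} | {2, 3}) == ({2, 3} | {1, 2})
int_or_same = (5 | 6) == (6 | 5)

# Dict merging is order-sensitive when keys collide: the right operand wins.
a = {"k": 1}
b = {"k": 2}
a_then_b = merged(a, b)
b_then_a = merged(b, a)
```

Here `a_then_b` and `b_then_a` differ, which is the non-commutative behaviour being debated.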
I've tried them in this order myself several times over the years.
Had an idea, why not choose the more accurate syntax: |, |= after all? Then, to help newcomers and forgetful pros a custom error message is implemented for +, +=. In pseudo C/Python, something like this:
class dict:
    def __add__(self, other):
        if isinstance(other, dict):
            raise TypeError(
                'unsupported operand type(s) for +: … '
                'Dictionary merging leads to last-value-wins data '
                'loss. If acceptable, use the union "|" operator.'
            )
I think that is patronising to anyone, newbies and experienced programmers alike, who know and expect that merging dicts with an operator will have the same semantics as merging them with the update method.
I think it is worth it to lead the newcomer to a moment's reflection on why dictionary combining/merging is potentially lossy.
Should we also force newcomers to give a moment's reflection on why item assignment ``mydict[key] = value`` is potentially "lossy"? How about ``mystring.replace(old, new)`` or opening a file for writing? I think that we should trust that when the programmer asks to update a dict with new values (either in-place or as a copy), it is because they don't want the old values any more. Even if they don't know what they are doing, it is not the place of the interpreter to treat them as an ignoramus that needs to be forced into reflecting on the consequences of their action, as if they were a naughty little schoolboy being told off by their headmaster. -- Steven
On Wed, Oct 23, 2019 at 7:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Oct 22, 2019 at 11:39:59AM -0700, Mike Miller wrote:
On 2019-10-18 10:23, Ricky Teachey wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
The big trade off I'm gathering from this mega-thread is that the |, |= operators are more accurate, but less obvious to newcomers, who will first try +, += instead.
I'm surprised by that description. I don't think it is just newcomers who either suggest or prefer plus over pipe, and I don't think that pipe is "more accurate".
Looking at:
```
a = dict()
b = dict()
c = a + b
d = a | b
```
no one can tell what either '+' or '|' does in this context without guessing, because it does neither "arithmetic addition", nor "concatenation", nor "union set" (nor a bitwise op, nor any other op one could come up with). '+' is more familiar, and '|' might even be completely new for some users, but this distinction does not help make '+' in this context "more obvious".
What I can only say for sure is that it is an operation on two dicts and the result is possibly again a dict. So consulting the doc will be needed in either case.
As I pointed out in the PEP, plus often gets used for non-commutative operations (such as concatenation and ordinal arithmetic) but I am unaware of the union operator ∪ or | as spelled in Python ever being used for a non-commutative operation.
The commutative property of the arithmetic addition is a distinct feature, but not a fundamental one. Both the arithmetic addition and the concatenation (which is not commutative) are different manifestations of the same "generic idea of addition" in two different contexts (systems). This is the reason why intuitively we do not object to using '+' in both.
Union set, while being commutative, is not a manifestation of the "addition idea", but of something else - a union set idea :), which is different from the addition idea, because it is not only more specialized, but, and this is the important difference, requires recognition of another concept - identity, which the addition does not need, and on the abstract (idea) level operates differently (vaguely saying, "addition" preserves the integrity, "union set" preserves the identity).
Contrary to the views of many people upset that dict + would be non-commutative, it is arguably *more natural* to use plus for a non-commutative operation than it would be to use pipe.
If the aim is to get an operator (a graphical symbol) which is the closest representation of the _idea_ then the commutativity is not a deciding factor (as I mentioned above), but only a distinctive attribute, which may differ in different manifestations of the same idea.
The dict merge (or update) operation is unfortunately related neither to all "addition manifestations" nor to "union set" in a sense that it comes from the same idea, but it is an idea on its own, because apart from the recognition of identity (which union set requires), it also requires a concept of association (which neither addition nor union set use or need) and the actual operation is again different from the former two.
The biggest advantage of pipe is that it naturally lends itself to the rest of the set operations. But using pipe for a non-commutative operator is far less common than using plus.
Using either '|' or '+' is technically not correct, if we want to have it consistent on the abstract level. But '|' (the idea of what '|' is representing) is more similar to what is happening in dict merge than the addition idea. Since introducing a new (operator) symbol for that seems to be unacceptable (for practical reasons), '|' is the best candidate we have. Richard
no one can tell what either '+' or '|' does in this context without guessing, because it does neither "arithmetic addition", nor "concatenation", nor "union set" (nor a bitwise op, nor any other op one could come up with). '+' is more familiar, and '|' might even be completely new for some users, but this distinction does not help make '+' in this context "more obvious".
What I can only say for sure is that it is an operation on two dicts and the result is possibly again a dict. So consulting the doc will be needed in either case.
Union set, while being commutative, is not a manifestation of the "addition idea", but of something else - a union set idea :), which is different from the addition idea, because it is not only more specialized, but, and this is the important difference, requires recognition of another concept - identity, which the addition does not need, and on the abstract (idea) level operates differently (vaguely saying, "addition" preserves the integrity, "union set" preserves the identity).
The dict merge (or update) operation is unfortunately related neither to all "addition manifestations" nor to "union set" in a sense that it comes from the same idea, but it is an idea on its own, because apart from the recognition of identity (which union set requires), it also requires a concept of association (which neither addition nor union set use or need) and the actual operation is again different from the former two.
Using either '|' or '+' is technically not correct, if we want to have it consistent on the abstract level. But '|' (the idea of what '|' is representing) is more similar to what is happening in dict merge than the addition idea. Since introducing a new (operator) symbol for that seems to be unacceptable (for practical reasons), '|' is the best candidate we have.
Thanks for the nice run-down on operators and logic, and I mostly agree. As you say, neither is accurate due to the different mathematical structure of dictionaries as compared to sets, etc.
To me '+' would seem the most natural in the sense that it is what I would have tried first, as many others have stated. From the "feeling" of it, '+' points toward 'addition', 'what comes later'. I see it as rather dangerous to use '|' (union): because it behaves similarly, yet is not the same, the confusion may be larger.
Yes, you have to look up in the doc what it does - or, for the lazy, do a handful of tests with examples (I often do that).
+1 for '+'
On 23/10/2019 05:41, Steven D'Aprano wrote:
On Tue, Oct 22, 2019 at 11:39:59AM -0700, Mike Miller wrote:
On 2019-10-18 10:23, Ricky Teachey wrote:
but i'm -0 because i am very concerned it will not be obvious to new learners, without constantly looking it up, whether adding two mappings together would either:
The big trade off I'm gathering from this mega-thread is that the |, |= operators are more accurate, but less obvious to newcomers, who will first try +, += instead.
I'm surprised by that description. I don't think it is just newcomers who either suggest or prefer plus over pipe, and I don't think that pipe is "more accurate".
+1 (as one of the non-newcomers who prefers plus) -- Rhodri James *-* Kynesim Ltd
On Wed, Oct 23, 2019 at 5:42 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
I'm surprised by that description. I don't think it is just newcomers who either suggest or prefer plus over pipe, and I don't think that pipe is "more accurate".
+1 (as one of the non-newcomers who prefers plus)
me too.
frankly, the | is obscure to most of us. And it started as "bitwise or", and evokes the __or__ magic method -- so why are we all convinced that somehow it's inextricably linked to "set union"? And set union is a bit obscure as well -- I don't think that many people (newbies or not) would jump right to this logic:
I need to put two dicts together
That's kind of like a set union operation
The set object uses | for union
That's probably what dict uses -- I'll give that a try.
Rather than:
I need to put two dicts together
I wonder if dicts support addition?
I'll give that a try.
so, I'm
+1 on +
+0 on |
if | is implemented, I'll bet you dollars to donuts that it will get used far less than if + were used.
(though some on this thread would think that's a good thing :-) )
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Wed, Oct 23, 2019 at 1:19 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Wed, Oct 23, 2019 at 5:42 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
I'm surprised by that description. I don't think it is just newcomers who either suggest or prefer plus over pipe, and I don't think that pipe is "more accurate".
+1 (as one of the non-newcomers who prefers plus)
me too.
frankly, the | is obscure to most of us. And it started as "bitwise or", and evokes the __or__ magic method -- so why are we all convinced that somehow it's inextricably linked to "set union"? And set union is a bit obscure as well -- I don't think that many people (newbies or not) would jump right to this logic:
In my particular case I do know what `|` means in a set context. However, when I see code using it, it takes me a while to understand what it means and I tend to replace the operator with an explicit call to the union method. The only problem with that is when `dict_keys` are in use, as they do implement the `|` operator, but not the `union` method. They only have `isdisjoint`.
I need to put two dicts together
That's kind of like a set union operation
The set object uses | for union
That's probably what dict uses -- I'll give that a try.
Rather than:
I need to put two dicts together
I wonder if dicts support addition?
I'll give that a try.
so, I'm
+1 on +
+0 on |
if | is implemented, I'll bet you dollars to donuts that it will get used far less than if + were used.
(though some on this thread would think that's a good thing :-) )
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- Sebastian Kreft
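The observation about key views is easy to reproduce. A minimal sketch, with example dicts `d1` and `d2` chosen purely for illustration:

```python
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

# Key views support the | operator and return a plain set.
keys_union = d1.keys() | d2.keys()

# They do not offer a union() method, only isdisjoint().
has_union = hasattr(d1.keys(), "union")
has_isdisjoint = hasattr(d1.keys(), "isdisjoint")

# An explicit spelling has to build a real set first.
explicit = set(d1).union(d2)
```

So replacing `|` with a named method call on a key view needs an intermediate `set(...)`.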
On Thu, Oct 24, 2019 at 1:20 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Wed, Oct 23, 2019 at 5:42 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
frankly, the | is obscure to most of us. And it started as "bitwise or", and evokes the __or__ magic method -- so why are we all convinced that somehow it's inextricably linked to "set union"?
It is because "bitwise or" is very similar to "set union". You can regard an integer as a bitset (set of bits). 5 is {bit 1, bit 3} and 6 is {bit 2, bit 3}. So 5 | 6 is 7 {bit 1, bit 2, bit 3}. So reusing | for set union is very natural to me.
But if we use + for dict merging, I think we should add + to set too. Then the set has `.union()`, `|` and `+` for the same behavior.
Regards, -- Inada Naoki <songofacandy@gmail.com>
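The bitset reading of an integer can be made concrete. This sketch uses a hypothetical helper `bits()` that numbers bit positions from 1, matching the convention in the message above.

```python
def bits(n):
    """Hypothetical helper: 1-based positions of the bits set in n."""
    return {i + 1 for i in range(n.bit_length()) if (n >> i) & 1}


five = bits(5)          # 0b101
six = bits(6)           # 0b110
combined = bits(5 | 6)  # 0b111

# Bitwise or on the integers corresponds to set union on their bitsets.
matches_union = combined == (five | six)
```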
On Wed, Oct 23, 2019 at 11:47 PM Inada Naoki <songofacandy@gmail.com> wrote:
So reusing | to set union is very natural to me.
I understand that, and agree -- it made sense to me when set() was introduced. But that wasn't my point -- we are not deciding now what operator to use for set union, we are deciding what operator to use for dict merging. And I don't think that (most) people will naturally think about set union when they want to put two dicts together. logic theory aside, most people both learn dicts first, and use them a lot more than sets. And I don't think it's about uptake either -- I think it will be a long-standing difference. That being said, this is just a new way to spell what can already be done, so not a big deal either way.
But if we use + for dict merging, I think we should add + to set too. Then the set has `.union()`, `|` and `+` for the same behavior.
I'd be fine with that :-) -- I suspect that if operators on dicts and sets were considered at the same time, + may well have been chosen. And if dicts supported + when sets were introduced, it probably would have been used there. I'm going to try to avoid any more comments on this -- it's all been said, many, many times -- I think it's time for a decision, and we can all go bike-shed other things :-) -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Oct 23, 2019, at 23:47, Inada Naoki <songofacandy@gmail.com> wrote:
But if we use + for dict merging, I think we should add + to set too. Then the set has `.union()`, `|` and `+` for the same behavior.
I don’t think we really need that. If set and dict have only a tiny bit of overlap in their API today (beyond both being sized iterable containers), as is the case today, then the fact that dict spells an operation + and set spells a similar but not identical operation | doesn’t seem like it’s going to be a serious learning hurdle.
On the other hand, I don’t think it would be terrible to add it either. It’s no worse than C++ and a bunch of other languages having separate merge and union operations for the benefit of multisets, even though they do the exact same thing on sets, and nobody gets confused by that, even people who’ve never seen a multiset. (Sure, people ask why they both exist on StackOverflow, but nobody asks how to understand some code because it used one instead of the other.)
Which means we could add + to dict today, and if we later decide to add all the set operators to dict, so we have + and | do the same thing on dict, and then we also add + to set (which will be a lot more compelling than it is now)… no big deal. So, I no longer have any objection to + on dict based on the fact that we’re apparently also considering adding set operations later.
I still like | more than +, but I no longer have a good argument for it, and I could live with either.
On 24/10/2019 18:19:27, Andrew Barnert via Python-ideas wrote:
On Oct 23, 2019, at 23:47, Inada Naoki <songofacandy@gmail.com> wrote:
But if we use + for dict merging, I think we should add + to set too. Then the set has `.union()`, `|` and `+` for the same behavior.
I don’t think we really need that. If set and dict have only a tiny bit of overlap in their API today (beyond both being sized iterable containers), as is the case today, then the fact that dict spells an operation + and set spells a similar but not identical operation | doesn’t seem like it’s going to be a serious learning hurdle.
On the other hand, I don’t think it would be terrible to add it either. It’s no worse than C++ and a bunch of other languages having separate merge and union operations for the benefit of multisets, even though they do the exact same thing on sets, and nobody gets confused by that, even people who’ve never seen a multiset. (Sure, people ask why they both exist on StackOverflow, but nobody asks how to understand some code because it used one instead of the other.)
Which means we could add + to dict today, and if we later decide to add all the set operators to dict, so we have + and | do the same thing on dict, and then we also add + to set (which will be a lot more compelling than it is now)… no big deal. So, I no longer have any objection to + on dict based on the fact that we’re apparently also considering adding set operations later.
I still like | more than +, but I no longer have a good argument for it, and I could live with either.
My 2¢: Initially I was in favour of the PEP ... (Disclosure: I tend to like shiny new features) ... and was in favour of using +. But I have been swayed by the arguments for |. One which hasn't AFAIK been mentioned is that
    d1 | d1 == d1
paralleling other uses of |.
However: I have often wanted to join a few strings together:
    s = s1 + s2 + s3 + s4 + s5 # assume performance not important
and appreciate the conciseness, but I cannot remember ever wanting to merge more than two dictionaries together. No doubt such applications exist, but I doubt that they're common. So IMO the advantage of the PEP is marginal.
0 (neither + nor -) on the PEP (with either + or |).
Rob Cliffe
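The idempotence property mentioned here (an operand combined with itself gives itself back) can be sketched as follows. The helper `merged()` is hypothetical and assumes the copy-and-update semantics from the announcement rather than any particular operator spelling.

```python
def merged(left, right):
    """Hypothetical helper: copy-and-update, as described for the PEP draft."""
    new = left.copy()
    new.update(right)
    return new


d1 = {"a": 1, "b": 2}
s1 = {1, 2}

# Merging a dict with itself changes nothing, like the existing uses of |.
dict_idempotent = merged(d1, d1) == d1
set_idempotent = (s1 | s1) == s1
int_idempotent = (6 | 6) == 6

# Existing uses of + generally are not idempotent.
str_not_idempotent = ("ab" + "ab") != "ab"
list_not_idempotent = ([1] + [1]) != [1]
```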
Christopher Barker wrote:
if | is implemented, I'll bet you dollars to donuts that it will get used far less than if + were used. (though some on this thread would think that's a good thing :-) ) -CHB
(Non-beginner, |-preferrer)
I think Christopher is correct here, but only to a point. I think he's right that the *initial* adoption of the syntax would be slower for |/|= as compared to +/+=.
Thinking about what I understand as the four major "wings" of Python's application space (yah, I'm probably oversimplifying, don't @ me), though, I figure uptake of the syntax in three of the four wings either (a) will be slow *anyways* or (b) *ought* to be slow, to avoid messing people up.
Libraries: If a final PEP is accepted, the syntax will become available in 3.9 or 3.10 or whatever.... but, most libraries won't use the syntax until they drop support for the Python versions without it. So, uptake will inevitably be ~glacial here. Lots of opportunities for people to start seeing the syntax, do a double-take, and if interested hop over to the docs and learn. Really, the obscurity of | would actually encourage docs-checking, I would think.
Tools (CLI etc.): Basically the same as for libraries, except probably even slower -- tools have to be *very* conservative in their assumptions about the version of Python that will be available in an arbitrary installation environment.
Data Science: In this space, I feel like the more-obscure nature of | is of particular benefit. My understanding is that many people doing work here overall tend to have less training in the art of programming itself (whether schooling, direct coding expertise, or whatever), and thus might be more likely to naively use '+' and get themselves into trouble. I feel like the unfamiliarity of | will make them be less instinctively confident that they know what the operation will do, and more likely to look carefully at it as they start trying to use it.
Web: This is the space where I think Christopher's concerns about the downside of slow uptake will be most relevant, as to my understanding most web devs probably have enough experience with dicts to know that "d1 + d2" is not a trivial operation, and they have to make sure they know what it's actually doing under the hood. Slower uptake with | would thus be a downside... but it's not like this is a major optimization or anything.
So: In most cases, uptake will be slow anyways. I'd rather have it be a bit slower, and use an operator that will make people think more carefully about what it does while they're learning to use it.
(As I said before, I also agree with the semantic arguments for | over +.)
On Thu, Oct 24, 2019 at 10:01:28AM -0000, brian.skinn@gmail.com wrote:
So: In most cases, uptake will be slow anyways. I'd rather have it be a bit slower, and use an operator that will make people think more carefully about what it does while they're learning to use it.
o_O We're talking about an operator to copy-and-update a dict, not invade Iraq. -- Steven
Steven D'Aprano wrote:
We're talking about an operator to copy-and-update a dict, not invade Iraq.
[grin] Fair enough.
In the interim, actually, I realized some semantic downsides of using pipe. If a user happens to draw an analogy to logical 'or', the 'which one wins' semantics is different.
    result = 0 or 1 or 2 # result == 1
    result = {'foo': 0} | {'foo': 1} | {'foo': 2} # result == {'foo': 2}
Also, the logic flow of pipe from a shell script context is left-to-right, in contrast to this dict operation:
    cmd1 | cmd2 # Information flows left-to-right
    d1 | d2 # To know which values are in the result for a given key, read right-to-left
I withdraw my preference for pipe. I really like the feature, and will gladly use it with '+'. +1.
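The contrast between logical 'or' and a chained merge can be run as written below. Because the operator spelling is the thing under debate, this sketch avoids it and folds a hypothetical `merged()` helper (copy-and-update semantics, as in the announcement) over the operands with `functools.reduce`.

```python
from functools import reduce


def merged(left, right):
    """Hypothetical helper: copy-and-update, as described for the PEP draft."""
    new = left.copy()
    new.update(right)
    return new


# Logical 'or' stops at the first truthy operand, reading left to right.
first_truthy = 0 or 1 or 2

# A chained merge keeps the value from the last operand that has the key.
operands = [{"foo": 0}, {"foo": 1}, {"foo": 2}]
last_wins = reduce(merged, operands)
```

The first expression yields the earliest truthy value, while the merge yields the latest value for the key.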
Other folks (and I earlier) have explained why we think | is the better choice, if less obvious.
On 2019-10-22 21:41, Steven D'Aprano wrote:
I think that is patronising to anyone, newbies and experienced programmers alike, who know and expect that merging dicts with an operator will have the same semantics as merging them with the update method.
Should we also force newcomers to give a moment's reflection on why item assignment ``mydict[key] = value`` is potentially "lossy"? How about ``mystring.replace(old, new)`` or opening a file for writing?
Even if they don't know what they are doing, it is not the place of the interpreter to treat them as an ignoramus that needs to be forced into reflecting on the consequences of their action, as if they were a naughty little schoolboy being told off by their headmaster.
This is an odd take, that a helpful error message is "patronising" and treats you like an ignoramus. The alternative, losing data is expected, builds character / "puts hair on your chest." This attitude reminds me of the bad old days on Slashdot. I had thought it had gone out of fashion after the tech community grew after the turn of the century. I'll take all the helpful error messages I can get personally, assuming they make sense. -Mike
On Oct 18, 2019, at 09:33, Steven D'Aprano <steve@pearwood.info> wrote:
That's certainly true, but only because they will have absolutely no clue at all about what |< could possibly mean. "Bitwise-or less-than" perhaps, whatever that could mean.
You seem to have come up with a completely unique symbol which has, as far as I can tell, never been used before. As far as I can see, neither APL nor Perl use it as an operator. I can't see it in Unicode, or as an emoticon.
It has multiple meanings in Haskell, and at least one is reasonably common. I think the main one is for mapping insertion, like this:
    newdict = (key, value) |< olddict
There’s also >| to reverse the order of the operands, and also |> and <| to keep the old value rather than the new if the key already exists. And you can lift any of these four to do a dict merge.
But it’s also, e.g., the rightwise asymmetric version of >|<, meaning that if you give it two Either values, it’ll return the right one if it succeeded, otherwise the left one if it succeeded, otherwise the right error.
See Hoogle (https://hoogle.haskell.org/?hoogle=%7C%3C&scope=set%3Astackage) for all the meanings in the stdlib, but programs and third-party libs might define even more.
But I think this all serves as more an argument for you than for the other side. :)
On Fri, Oct 18, 2019 at 6:39 PM Steven D'Aprano <steve@pearwood.info> wrote:
You seem to have come up with a completely unique symbol which has, as far as I can tell, never been used before. As far as I can see, neither APL nor Perl use it as an operator. I can't see it in Unicode, or as an emoticon.
What is funny, it was not at all my original goal, I just went ahead about what would visually most closely express the operation behind. The fact that it turned out to be an unexpected symbol was just incidental. But you can also take it as a measure how far/close it is to "add", or "concatenation" (at least for me).
It could be interpreted as an attempt to render the "not less-than" mathematical symbol ≮ U+226E in pure ASCII. Variants of that glyph sometimes use a vertical, rather than slanted, bar.
Take it as placeholder, I am not going to argue about all the different implications it could have, nor insisting on it being the right symbol for the task.
Perhaps it does. But inventing obscure and cryptic symbols like |< simply hides the fundamental similarities between adding two dictionaries and other forms of addition.
Dictionary "merge" is as close to addition or concatenation as union set is, which means not much really (otherwise we would intuitively use '+' also for union). We could as well use '+' for writing into the file (because for the file object it "adds" something to it), but we don't. There is a difference between what the object can take and what is already established in the language.
There have been other remarks on this thread that the '+' symbol "does not feel exactly right" and this is the reason. The PEP actually never addresses it directly either, it just goes around about how the '+' symbol is used for different operations in different contexts (which should technically prove that there is no "intuitive" interpretation, because it could do anything) and that considering this, the proposed use should be alright too.
Richard
On 10/16/2019 10:35 PM, Brandt Bucher wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition):
Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to! We plan on updating the PEP at least once more before review.
My apologies if this feedback is not new; I did not follow the first round.
Looking at the codebase I support (OpenERP 7.0), I see extensive use of the update() method. I went through and replaced a few of them with += to try and get a feel for it.
+ and += are a cognitive mismatch for update().
The problem is that dicts are complex objects with two pieces of information, and the results of + and += don't match what happens when two dicts are combined. For an easy demonstration of this:
--> a = {'some_key': 'some value'}
--> a += {'some_key': 'another value'}
--> a
{'some_key': 'another value'}
That result just doesn't match up with the '+' operator. For what it's worth, I don't think the '|' operator works any better.
Reviewing my code base, I do see a benefit in replacing the update() calls with an operator, but I'm not aware of any operator that captures the "update" results.
-1 on the PEP.
-- ~Ethan~
On Fri, Oct 18, 2019 at 09:17:54AM -0700, Ethan Furman wrote:
That result just doesn't match up with the '+' operator.
Why not? Before answering, please check the PEP to see if your objection has already been raised. Addition and the plus operator is flexible enough to be used for everything from non-commutative summation, clock arithmetic (9 + 4 = 1), disjoint unions, both boolean OR and XOR, concatenation, conjunction, at least one folk duo https://en.wikipedia.org/wiki/You%2BMe and figurative uses like this: https://www.msn.com/en-my/news/national/race-plus-religion-plus-recession-eq... The second definition of "add" in Webster's dictionary is "to join or unite", which matches dict merging very well. WordNet says "join or combine or unite with others" and doesn't get to numeric addition until the fourth definition. It is perfectly natural and obvious for English speakers to think of addition as more than just the narrow definition of numeric addition. This is why people keep independently coming up with the same idea of adding dicts. If you're going to deny that standard, common understanding of "plus" as combining, and claim that it "just doesn't match up" in defiance of common practice, you ought to have a good reason why all those who refer to dict addition or adding dicts are wrong to do so. -- Steven
On 10/18/2019 10:25 AM, Steven D'Aprano wrote:
On Fri, Oct 18, 2019 at 09:17:54AM -0700, Ethan Furman wrote:
That result just doesn't match up with the '+' operator.
Why not?
Pretty sure I answered that question in my OP. Here it is again since you conveniently stripped it out:
The problem is that dicts are complex objects with two pieces of information, and the results of + and += don't match what happens when two dicts are combined. For an easy demonstration of this:
--> a = {'some_key': 'some value'}
--> a += {'some_key': 'another value'}
--> a {'some_key': 'another value'}
That result just doesn't match up with the '+' operator. For what it's worth, I don't think the '|' operator works any better.
Before answering, please check the PEP to see if your objection has already been raised.
What do you know, it had been! PEP 584: -------
Dict addition is lossy Dict addition can lose data (values may disappear); no other form of addition is lossy.
Response:
It isn't clear why the first part of this argument is a problem. dict.update() may throw away values, but not keys; that is expected behavior, and will remain expected behavior regardless of whether it is spelled as update() or +.
It's a problem because the method is named "update", not "add", "join", or "combine". The dict "+" operator is being turned into a deduplication operator - as if, for example: --> [1, 2, 3] + [3, 4, 5] [1, 2, 3, 4, 5] # not what happens
The second definition of "add" in Webster's dictionary is "to join or unite", which matches dict merging very well.
No, it doesn't -- not in the case of duplicate keys. If we had two people named "Steve" and joined them into a group, would you expect one of them to just disappear? Even better, if we had two engineers (key) named Anita and Carolyn (values) and combined them into a group, do you expect one of them to vanish?
It is perfectly natural and obvious for English speakers to think of addition as more than just the narrow definition of numeric addition.
Indeed it is, and indeed I do. You may notice, however, that I didn't use that argument.
If you're going to deny that standard, common understanding of "plus" as combining, and claim that it "just doesn't match up" in defiance of common practice, you ought to have a good reason why all those who refer to dict addition or adding dicts are wrong to do so.
As I'm sure you are aware, natural language is inherently ambiguous. Python is not a natural language. It is my contention that the operation of combining two data structures together that results in data loss should not be called "+". As it happens, the Python set() agrees:
Python 3.6.5 (default, Apr 1 2018, 05:46:30)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
--> set([1, 2, 3]) + set([2, 3, 4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'set' and 'set'
-- ~Ethan~
+1 on this, it seems very natural to me. I don’t mean to downplay the concerns people have, but in my experience teaching newbies, dictionaries take some time to wrap their heads around anyway. So yes, they may be confused when + removes data, but they’d be confused anyway :-) And it would be less confusing than: {**d1, **d2} That means pretty much nothing to a newbie, and even if they do get what ** means, it’s still some version of “put the contents of these two dicts together” — I can’t see how that is any less confusing than d1+d2. As for expecting it to be lossless like list addition — if you don’t understand that dicts can’t have duplicate keys, you don’t “get” dicts anyway.
The problem is that dicts are complex objects with two pieces of
information,
And they are with or without +, of course.
Even better, if we had two engineers (key) named Anita and Carolyn (values) and combined them into a group, do you expect one of them to vanish?
Then a dict is not the data structure in which to store this data, plain and simple. You don’t use a key like “engineer” if you might have more than one engineer! This is completely independent of syntax. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
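A small sketch of both points made above: the existing `{**d1, **d2}` spelling already behaves as last-wins, and if several engineers must coexist, the values can be lists. The `'manager': 'Bo'` entry and the `staff` grouping are illustrative assumptions, not something from the thread.

```python
d1 = {'engineer': 'Anita', 'manager': 'Bo'}   # 'Bo' is a made-up entry
d2 = {'engineer': 'Carolyn'}

# Existing spelling of the merge: the right-hand value wins for 'engineer'.
unpacked = {**d1, **d2}
print(unpacked)  # {'engineer': 'Carolyn', 'manager': 'Bo'}

# If more than one engineer is possible, store a list per role instead.
staff = {}
for role, name in [*d1.items(), *d2.items()]:
    staff.setdefault(role, []).append(name)
print(staff)  # {'engineer': ['Anita', 'Carolyn'], 'manager': ['Bo']}
```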
I am strong -1 on the proposal. The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect
{'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5}
However, the hypothetical behavior when different keys are present would not be obvious to me. Obviously I can think of several possible behaviors, but none are the "one obvious thing." What would NOT feel intuitive is the operation meaning .update(). I do not have any particular objection to the union operator '|' being used for this purpose that is far more similar to set union... But neither do I see any great need for the shortcut. On Sat, Oct 19, 2019, 1:43 PM Christopher Barker <pythonchb@gmail.com> wrote:
+1 on this, it seems very natural to me.
I don’t mean to downplay the concerns people have, but in my experience teaching newbies, dictionaries take some time to wrap their heads around anyway. So yes, they may be confused when + removes data, but they’d be confused anyway :-)
And it would be less confusing than:
{**d1, **d2}
That means pretty much nothing to a newbie, and even if they do get what ** means, it’s still some version of “put the contents of these two dicts together” — I can’t see how that is any less confusing than d1+d2.
As for expecting it to be lossless like list addition — if you don’t understand that dicts can’t have duplicate keys, you don’t “get” dicts anyway.
The problem is that dicts are complex objects with two pieces of
information,
And they are with or without +, of course.
Even better, if we had two engineers (key) named Anita and Carolyn (values) and combined them into a group, do you expect one of them to vanish?
Then a dict is not the data structure in which to store this data, plain and simple.
You don’t use a key like “engineer” if you might have more than one engineer!
This is completely independent of syntax.
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On 19/10/2019 19:02, David Mertz wrote:
I am strong -1 on the proposal.
The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect
{'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5}
That's only a natural expectation if you also expect the values in your dict to be addable (in the sense of doing something useful with a "+" operator). It never occurs to me to make that assumption because a fair amount of the time it isn't true in my code. -- Rhodri James *-* Kynesim Ltd
On Mon, Oct 21, 2019, 9:14 AM Rhodri James
The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect
{'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5}
That's only a natural expectation if you also expect the values in your dict to be addable (in the sense of doing something useful with a "+" operator). It never occurs to me to make that assumption because a fair amount of the time it isn't true in my code.
I'm not arguing that we SHOULD make '+' mean recursive addition. I'm just saying that if I never read this discussion, then later read `dict1 + dict2` in code, that's what I'd expect. I think a large percentage of the code I work with would operate under this hypothetical meaning. Values in my dicts are usually numbers, lists, tuples, or *other dicts*. If that last had this new behavior, things would mostly work. I don't usually write code with: d1 = {'purchases': [apples, bananas, pears]} d2 = {'purchases': 17} I could, of course. But lots of things I can do raise exceptions. Like `dct[mutable_var] = val` (yes, technically "unhashable"). This one I actually wind up encountering pretty often (or with sets even more). What is proposed in this PEP is to add a meaning for dct1+dct2 that would be well defined, but that would be DIFFERENT from the "one obvious meaning."
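To make the "vectorised merge" reading concrete, here is a rough sketch. The function name `add_values` is hypothetical, and keeping keys that appear on only one side is just one of the several possible behaviours mentioned above, chosen here for illustration. `collections.Counter` already does something similar for counts.

```python
from collections import Counter


def add_values(left, right):
    """Hypothetical value-wise merge: shared keys have their values added."""
    result = dict(left)
    for key, value in right.items():
        # Assumption: a key present on only one side is copied unchanged.
        result[key] = result[key] + value if key in result else value
    return result


summed = add_values({'a': 5, 'b': 4}, {'a': 3, 'b': 1})
print(summed)  # {'a': 8, 'b': 5}

# Counter, a dict subclass, already adds counts with "+".
counted = Counter({'a': 5, 'b': 4}) + Counter({'a': 3, 'b': 1})
print(dict(counted))  # {'a': 8, 'b': 5}

# Values of mismatched types fail, as with the 'purchases' example above.
try:
    add_values({'purchases': ['apples']}, {'purchases': 17})
except TypeError:
    print("list + int is not defined")
```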
On 21/10/2019 14:54, David Mertz wrote:
On Mon, Oct 21, 2019, 9:14 AM Rhodri James
The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect
{'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5} That's only a natural expectation if you also expect the values in your dict to be addable (in the sense of doing something useful with a "+" operator). It never occurs to me to make that assumption because a fair amount of the time it isn't true in my code.
I'm not arguing that we SHOULD make '+' mean recursive addition. I'm just saying that if I never read this discussion, then later read `dict1 + dict2` in code, that's what I'd expect.
And I'm just explaining why that's not what I expect. My code at the moment looks more like lookup_dict[remote_id] = RemoteObject(stuff) so the idea of adding dict values simply doesn't come to me. -- Rhodri James *-* Kynesim Ltd
On Mon, 21 Oct 2019 at 14:55, David Mertz <mertz@gnosis.cx> wrote:
What is proposed in this PEP is to add a meaning for dct1+dct2 that would be well defined, but that would be DIFFERENT from the "one obvious meaning."
For me, what's coming out is that there *is* no "obvious meaning". People expect different things. All the attempts to persuade people that they "should" expect something other than what they do is merely symptomatic of the basic fact, that not everyone expects the same thing from this operation. Whether "in time, you'll learn the meaning that Python assigns to dictionary +", and "the proposed meaning is useful in a number of contexts" are sufficiently true to counterbalance the fact that people assume different meanings when faced with the operator for the first time, is a different question - and one that's much harder to answer. But arguing over what's "obvious" in the face of people clearly stating that they have differing initial assumptions, seems like a waste of time to me. Let's just accept that the meaning of + on dictionaries, as defined by the PEP, is something that *at least some people* will not find immediately obvious, and will have to learn. Personally, I think the utility is marginal, the "obviousness" is context dependent, the inconsistencies with set union are a concern, and the fact that there are already (admittedly more convoluted) ways of doing this makes the proposal a minor win at best. So I'm probably -0 on the PEP. Paul PS Yes, every programming language construct is unfamiliar to someone completely new to programming. Yes, even keywords and method names are unfamiliar to people who are not native English speakers. Both of those points support my argument, so no-one should feel any particular need to make them here ;-)
Agreed -- Anyone who uses numpy a lot might make the same assumption, because numpy vectorizes. Anybody who doesn't might expect an extension-equivalent:
>>> import numpy as np
>>> a1 = np.arange(2,6)
>>> a2 = np.arange(4,0,-1)
>>> a2
array([4, 3, 2, 1])
>>> a1
array([2, 3, 4, 5])
>>> a1 + a2
array([6, 6, 6, 6])
>>> l1 = list(range(2,6))
>>> l2 = list(range(4,0,-1))
>>> l1 + l2
[2, 3, 4, 5, 4, 3, 2, 1]
On Sat, Oct 19, 2019 at 02:02:43PM -0400, David Mertz wrote:
The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect
{'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5}
Outside of Counter when would this behaviour be useful? I expect that this feels natural to you because you're thinking about simple (dare I say "toy"?) examples like the above, rather than practical use-cases like "merging multiple preferences": prefs = defaults + system_prefs + user_prefs # or if you prefer the alternative syntax prefs = defaults | system_prefs | user_prefs (Note that in this case, the lack of commutativity is a good thing: we want the last seen value to win.) Dicts are a key:value store, not a multiset, and outside of specialised subclasses like Counter, we can't expect that adding the values is meaningful or even possible. "Adding the values" is too specialised and not general enough for dicts, as a slightly less toy example might show: d = ({'customerID': 12932063, 'purchaseHistory': <Purchases object at 0xb7ce14d0>, 'name': 'Joe Consumer', 'rewardsID': 391187} + {'name': 'Jane Consumer', 'rewardsID': 445137} ) Having d['name'] to be 'Joe ConsumerJane Consumer' and d['rewardsID'] to be 836324 would be the very opposite of useful behaviour. -- Steven
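The preferences use case can be tried today with dict unpacking, which has the same last-wins behaviour the proposal gives to "+". This is only a sketch; the preference keys and values below are invented for illustration.

```python
# Hypothetical preference layers, from least to most specific.
defaults = {'theme': 'light', 'font_size': 10, 'autosave': True}
system_prefs = {'font_size': 12}
user_prefs = {'theme': 'dark'}

# Later mappings win, so user settings override system ones, which override defaults.
prefs = {**defaults, **system_prefs, **user_prefs}
print(prefs)  # {'theme': 'dark', 'font_size': 12, 'autosave': True}

# Reversing the order changes the result: the merge is not commutative.
reversed_prefs = {**user_prefs, **system_prefs, **defaults}
print(reversed_prefs == defaults)  # True
```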
On 10/21/19 9:51 AM, Steven D'Aprano wrote:
Dicts are a key:value store, not a multiset, and outside of specialised subclasses like Counter, we can't expect that adding the values is meaningful or even possible. "Adding the values" is too specialised and not general enough for dicts ...
Iterables are ordered collections of values, and outside of specialized subclasses, we can't expect that adding the values is meaningful or even possible. "Adding the values" is too specialized and not general enough for iterables. And yet the builtin function sum exists and works the way it does. Lately, I tend to reduce or fold my collections rather than to use symbolic operators, and to write specialized functions when it comes to use cases like preferences. I'd *almost* vote for | operating on the keys (because keys are like a set) and + for operating on values (because adding one key to another is more meaningless than adding arbitrary values), but that doesn't seem quite right, either.
On Mon, Oct 21, 2019 at 10:25:57AM -0500, Dan Sommers wrote:
Iterables are ordered collections of values, and outside of specialized subclasses, we can't expect that adding the values is meaningful or even possible.
Correct. That's why lists, tuples, strings, bytes and even arrays all define addition as concatenation rather than element-by-element addition. We leave it to specialised libraries and data types, like numpy, to implement element-by-element addition.
"Adding the values" is too specialized and not general enough for iterables.
Correct. That's why join is a specialised string method, rather than a method on more general lists.
And yet the builtin function sum exists and works the way it does.
Yes, because sum is a specialised function whose job is to sum the values of an iterable. What did you think it was, if it wasn't a specialised "sum these values" function? -- Steven
On Oct 21, 2019, at 09:29, Steven D'Aprano <steve@pearwood.info> wrote:
What did you think it was, if it wasn't a specialised "sum these values" function?
For what it’s worth, I initially thought it was a general fold function, using the magic of optional parameters and operator polymorphism to default to the most common reduction operation for a whole bunch of types. But it only took a few seconds to realize I was wrong. (Although I think this was added not long after new-style classes and the full dunder protocols, so I probably did have fun building a wrapper type that I could pass to sum and use it efficiently as a generic fold function against its will, it was pretty obvious that’s not what it was for.)
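As a rough illustration of the difference between `sum` and a general fold: `sum` accepts anything that supports "+" with the start value but deliberately refuses strings, while `functools.reduce` folds with any binary function. The sample data is made up.

```python
from functools import reduce
import operator

total = sum([1, 2, 3])          # numbers: the intended use
flat = sum([[1], [2, 3]], [])   # works for lists, given an explicit start value

# sum() rejects a str start value and points users to ''.join() instead.
refused = None
try:
    sum(['a', 'b'], '')
except TypeError as exc:
    refused = str(exc)

# reduce() is the general fold: any binary function, any start value.
joined = reduce(operator.add, ['a', 'b', 'c'], '')
merged = reduce(lambda acc, d: {**acc, **d}, [{'a': 1}, {'a': 2, 'b': 3}], {})

print(total, flat, joined, merged)  # 6 [1, 2, 3] abc {'a': 2, 'b': 3}
```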
Steven D'Aprano wrote:
On Sat, Oct 19, 2019 at 02:02:43PM -0400, David Mertz wrote:
The plus operation on two dictionaries feels far more natural as a vectorised merge, were it to mean anything. E.g., I'd expect {'a': 5, 'b': 4} + {'a': 3, 'b': 1} {'a': 8, 'b': 5} Outside of Counter when would this behaviour be useful?
For example one could use dicts to represent data tables, with the keys being either indices or column names and the values being lists (rows or columns). Then for joining two such tables it would be desirable if values are added, because then you could simply do `joint_table = table1 + table2`. Or having a list of records from different sources: purchases_online = {'item1': [datetime1, datetime2, ...], 'item2': ...} purchases_store = {'item1': [datetime3, datetime4, ...], ...} purchases_overall = purchases_online + purchases_store # Records should be concatenated. # Then doing some analysis on the overall purchases. `pandas.Series` also behaves dict-like (almost) and does add the values on "+".
I expect that this feels natural to you because you're thinking about simple (dare I say "toy"?) examples like the above, rather than practical use-cases like "merging multiple preferences": prefs = defaults + system_prefs + user_prefs
# or if you prefer the alternative syntax prefs = defaults | system_prefs | user_prefs
(Note that in this case, the lack of commutativity is a good thing: we want the last seen value to win.)
In this case you'd have to infer the order of precedence from the variable names, not the "+" syntax itself. I.e. if you had spelled it `a + b + c` I would have no idea whether `a` or `c` has highest precedence. Compare that with a "directed" operator symbol (again, I'm not particularly arguing for "<<"): prefs = defaults << system_prefs << user_prefs Here it becomes immediately clear that `system_prefs` supersedes `defaults` and `user_prefs` supersedes the other two. A drawback of "+" here is that you can't infer this information from the syntax itself. Also I'm not sure if this is a good example, since in case something in `system_prefs` changes you'd have to recompute the whole thing (`prefs`), since you can't tell whether that setting was overwritten by `user_prefs`. I think in such a case it would be better to use `collections.ChainMap` for providing a hierarchy of preferences, which lets you easily update each level.
Dicts are a key:value store, not a multiset, and outside of specialised subclasses like Counter, we can't expect that adding the values is meaningful or even possible. "Adding the values" is too specialised and not general enough for dicts, as a slightly less toy example might show: d = ({'customerID': 12932063, 'purchaseHistory': <Purchases object at 0xb7ce14d0>, 'name': 'Joe Consumer', 'rewardsID': 391187} + {'name': 'Jane Consumer', 'rewardsID': 445137} )
Having d['name'] to be 'Joe ConsumerJane Consumer' and d['rewardsID'] to be 836324 would be the very opposite of useful behaviour.
I agree that adding the values doesn't make sense for that example but neither does updating the values. Why would you want to take a record corresponding to "Joe Consumer" and partially update it with data from another consumer ("Jane Consumer")? Actually I couldn't tell what the result of that example should be.
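The `collections.ChainMap` alternative mentioned above might look like the following sketch. The preference keys are invented for illustration. Note that `ChainMap` searches its mappings from first to last, so the highest-priority layer goes first.

```python
from collections import ChainMap

# Hypothetical preference layers.
defaults = {'theme': 'light', 'font_size': 10}
system_prefs = {'font_size': 12}
user_prefs = {'theme': 'dark'}

# Lookups try user_prefs, then system_prefs, then defaults.
prefs = ChainMap(user_prefs, system_prefs, defaults)
before = prefs['font_size']      # 12, found in system_prefs

# Changing one layer is visible immediately; nothing has to be recomputed.
system_prefs['font_size'] = 14
after = prefs['font_size']       # 14

# A user-level setting still takes precedence over the system layer.
user_prefs['font_size'] = 16
overridden = prefs['font_size']  # 16

print(before, after, overridden, prefs['theme'])  # 12 14 16 dark
```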
Definitely want this functionality. I'd ultimately be happy with any of the top operator spellings, with tradeoffs:
+ Is more obvious, especially to newcomers, consistent w. seq
| Is more accurate (None or '') --> '', consistent w. set
<< Could learn it easily and says "clobber" to me
The lossy-ness argument as a problem may be slightly overstated. With the current + only the resulting copy might lose information. With += it is potentially lossy on the original, where it is expected that the operation will mutate somehow. I believe, as the PEP alludes, that most folks would expect a last-wins clobber (the most common use case), and for the folks that don't, one only has to try it once at the REPL to figure it out. Personally, I'm leaning towards the spelling of "|" but would much rather have "+" than nothing. -Mike
On Fri, Oct 18, 2019 at 01:32:55PM -0700, Ethan Furman wrote:
On 10/18/2019 10:25 AM, Steven D'Aprano wrote:
On Fri, Oct 18, 2019 at 09:17:54AM -0700, Ethan Furman wrote:
That result just doesn't match up with the '+' operator.
Why not?
Pretty sure I answered that question in my OP. Here it is again since you conveniently stripped it out:
I stripped it out because it doesn't answer the question. Restating it word for word doesn't help. Given the large number of meanings that we give the + symbol, what is it specifically about merging two dicts (mappings) that doesn't match the plus symbol? Which of the many uses of plus are you referring to? Just stating, as you did, that + doesn't match dict merging is begging the question: it doesn't match because we haven't defined the + operator to mean merge. If we define the + operator to mean merge, then it will match, by definition. For example, in Groovy and Kotlin, we can say that dict merging matches + because that's the symbol they already use for dict merging. Likewise, we can say the same thing about Python: Counter already uses + to merge Counters. If you want to argue against plus, you need a better argument than "it doesn't match" since plus has many different uses and it's not even clear what "doesn't match" means. Of course there are differences between dict merging and numeric addition, just as there are differences between numeric addition and concatenation. But there are also similarities, which is why so many people immediately think of merging two dicts as an obvious kind of addition, just as they think of concatenating two strings/lists as a kind of addition. [...]
Before answering, please check the PEP to see if your objection has already been raised.
What do you know, it had been!
Gosh, anyone would think that I had spent many, many hours crawling through Python-Ideas threads gathering arguments for and against the proposal before writing the PEP. *wink*
PEP 584: ------- Dict addition is lossy Dict addition can lose data (values may disappear); no other form of addition is lossy.
Response:
It isn't clear why the first part of this argument is a problem. dict.update() may throw away values, but not keys; that is expected behavior, and will remain expected behavior regardless of whether it is spelled as update() or +.
It's a problem because the method is named "update", not "add", "join", or "combine".
Sorry, I still don't see why this is a *problem*. What badness do you see happening from it? In the context of dicts, "update" is an obvious synonym for "add, join, combine, merge" etc. If you update a set of records, you add the new records to the old records, with the new records taking priority. Depending on the implementation, an update might even be a pure concatenation, with the old records still there but inaccessible.
The dict "+" operator is being turned into a deduplication operator - as if, for example:
--> [1, 2, 3] + [3, 4, 5] [1, 2, 3, 4, 5] # not what happens
Right, because list addition is concatenation, not set union. Just like this is not a problem: 1234 + 5678 --> 12345678 # not what happens Numeric addition is not concatenation (except in unary (base 1) number systems). Nor is it set union.
The second definition of "add" in Webster's dictionary is "to join or unite", which matches dict merging very well.
No, it doesn't -- not in the case of duplicate keys. If we had two people named "Steve" and joined them into a group, would you expect one of them to just disappear?
Not people, no, since people are compared by identity, not personal name. But we're not talking about adding *people*. If you add the bitsets 0b1001 and 0b101, the least significant bit just disappears, giving 0b1110. Did you expect that bits (or decimal digits) are conserved by addition? Of course you don't. So why do you expect dict addition to conserve values? (By the way, dict addition will conserve keys.)
Even better, if we had two engineers (key) named Anita and Carolyn (values) and combined them into a group, do you expect one of them to vanish?
You're obviously very privileged to have never experienced a merger between two companies or two departments where a whole lot of people are made redundant due to their positions now being duplicated. If you use the position "engineer" as key, you are *requiring* that there is only a single engineer at a time. If you want two engineers, you need to use some other key. [...]
As I'm sure you are aware, natural language is inherently ambiguous. Python is not a natural language. It is my contention that the operation of combining two data structures together that results in data loss should not be called "+". As it happens, the Python set() agrees:
Set union isn't lossy. It never throws away an element. It might not be the same *object* as the original, but it will still be equal. The same occurs with dict merging: no key will be thrown away. I wasn't paying attention back in 2.3 or so when sets were introduced, but I expect that the main driving force for using & and | for set intersection and union over * and + was familiarity with bitwise intersection and union from C. In mathematics, the most common notation for sets is ∪ and ∩ but you do occasionally find people using + and ⋅ (the dot operator) due to the close connection between set operations and Boolean algebra operations, where union and intersection are frequently spelled as "addition" and "multiplication". Sometimes they use + to mean disjoint union, which is even further away from the naive "addition as plussing numbers" than regular old union. But as far as it goes, using | instead of + for dicts is a viable choice, especially if you want to argue for dicts to offer the full set API of intersection, difference and symmetric difference as well. -- Steven
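The claims in this message can be checked directly. The sketch below uses made-up sample values and the existing `{**a, **b}` spelling as a stand-in for the proposed operator.

```python
# Integer addition does not conserve bits: 9 + 5 == 14.
bits = 0b1001 + 0b101
print(bin(bits))  # 0b1110

# Set union never drops an element; equal elements are simply stored once.
union = {1, 2} | {2.0, 3}
print(len(union), union == {1, 2, 3})  # 3 True

# Dict merging keeps every key; only the older value for 'b' is replaced.
left = {'a': 1, 'b': 2}
right = {'b': 20, 'c': 3}
merged = {**left, **right}
print(merged)  # {'a': 1, 'b': 20, 'c': 3}
print(set(merged) == set(left) | set(right))  # True
```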
On 2019-10-21 01:18, Steven D'Aprano wrote:
On Fri, Oct 18, 2019 at 01:32:55PM -0700, Ethan Furman wrote:
On 10/18/2019 10:25 AM, Steven D'Aprano wrote:
On Fri, Oct 18, 2019 at 09:17:54AM -0700, Ethan Furman wrote:
That result just doesn't match up with the '+' operator.
Why not?
Pretty sure I answered that question in my OP. Here it is again since you conveniently stripped it out:
I stripped it out because it doesn't answer the question. Restating it word for word doesn't help.
Given the large number of meanings that we give the + symbol, what is it specifically about merging two dicts (mappings) that doesn't match the plus symbol? Which of the many uses of plus are you referring to?
Just stating, as you did, that + doesn't match dict merging is begging the question: it doesn't match because we haven't defined the + operator to mean merge. If we define the + operator to mean merge, then it will match, by definition. For example, in Groovy and Kotlin, we can say that dict merging matches + because that's the symbol they already use for dict merging. Likewise, we can say the same thing about Python: Counter already uses + to merge Counters.
[snip] In the case of Counter, it supports both + and |. The proposed operator will replace where there are matching keys. Which operator of Counter does that? Neither. BTW, my preference is for |.
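A quick sketch of the point about Counter, with invented counts: "+" adds counts, "|" keeps the maximum, and neither replaces the left value with the right one the way the proposed dict operator would.

```python
from collections import Counter

x = Counter(a=3, b=1)
y = Counter(a=1, c=2)

added = x + y          # counts are summed
maxed = x | y          # the larger count is kept
replaced = {**x, **y}  # plain-dict merge: the right-hand value wins

print(dict(added))  # {'a': 4, 'b': 1, 'c': 2}
print(dict(maxed))  # {'a': 3, 'b': 1, 'c': 2}
print(replaced)     # {'a': 1, 'b': 1, 'c': 2}
```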
I don't understand the obsession with Counter here. IMO it's a prime example of sharing implementation through inheritance, an antipattern. It implements the Mapping (i.e. read-only) interface fine, but while it has all the methods of the MutableMapping interface, the behavior is sufficiently different that you shouldn't pass it to code that was written with a MutableMapping or dict in mind. It would have made more sense to design an API specifically for Counter, and have the implementation use a dict internally to hold the values (the "composition over inheritance" pattern). -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
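To illustrate the "composition over inheritance" idea in general terms, here is a minimal sketch of a counter-like class that holds a dict internally rather than subclassing it. The class name `Tally` and its methods are hypothetical; this is not how `collections.Counter` is implemented, nor an API anyone in the thread proposed.

```python
class Tally:
    """Hypothetical counter that wraps a dict instead of inheriting from it."""

    def __init__(self, items=()):
        self._counts = {}  # internal storage, not exposed as a mapping API
        for item in items:
            self.add(item)

    def add(self, item, n=1):
        self._counts[item] = self._counts.get(item, 0) + n

    def count(self, item):
        return self._counts.get(item, 0)

    def __add__(self, other):
        if not isinstance(other, Tally):
            return NotImplemented
        result = Tally()
        for source in (self, other):
            for item, n in source._counts.items():
                result.add(item, n)
        return result


t = Tally('aab') + Tally('bc')
print(t.count('a'), t.count('b'), t.count('c'), t.count('z'))  # 2 2 1 0
print(isinstance(t, dict))  # False
```

Because `Tally` is not a dict, code written for a MutableMapping cannot receive it by accident, which is the design concern raised above.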
On Mon, Oct 21, 2019 at 02:02:07AM +0100, MRAB wrote:
In the case of Counter, it supports both + and |.
The proposed operator will replace where there are matching keys. Which operator of Counter does that? Neither.
That's okay. That's what subclasses are for: to support specialised behaviour. -- Steven
On 2019-10-21 02:29, Steven D'Aprano wrote:
On Mon, Oct 21, 2019 at 02:02:07AM +0100, MRAB wrote:
In the case of Counter, it supports both + and |.
The proposed operator will replace where there are matching keys. Which operator of Counter does that? Neither.
That's okay. That's what subclasses are for: to support specialised behaviour.
I've just realised why I prefer | over +: | is used with sets, which might overlap, so len(x | y) <= len(x) + len(y) + is used with lists and strings, which don't overlap, so len(x + y) == len(x) + len(y) On that basis, dicts are more like sets, IMHO.
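The length relations described above can be checked with a few made-up values; the dict merge is spelled `{**x, **y}` here as a stand-in for the proposed operator.

```python
x_set, y_set = {1, 2, 3}, {3, 4}
x_list, y_list = [1, 2, 3], [3, 4]
x_dict, y_dict = {'a': 1, 'b': 2}, {'b': 20, 'c': 3}

set_union = x_set | y_set
list_concat = x_list + y_list
dict_merge = {**x_dict, **y_dict}

# Sets may overlap, so the union can be shorter than the sum of the lengths.
print(len(set_union), len(x_set) + len(y_set))      # 4 5

# Lists never overlap, so the lengths always add up exactly.
print(len(list_concat), len(x_list) + len(y_list))  # 5 5

# Dict keys may overlap, so a merged dict behaves like the set case.
print(len(dict_merge), len(x_dict) + len(y_dict))   # 3 4
```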
I like the proposal of adding an operator but I dislike the usage of "+". I'd expect this to do a recursive merge on the dict values for duplicate keys (i.e. adding the values), even more so since `Counter` (being a subclass of dict) already has that behavior. I understand that "+" is meant as a shorthand for `update` and this is what `Counter` does but what sticks more to the mind is the resulting behavior. Furthermore, since this operation is potentially lossy, I think it would be helpful if the associated operator is not a symmetric symbol but instead is explicit about which operand takes precedence for conflicting keys. The lshift "<<" operator, for example, does have this property. It would be pretty clear what this means `a << b`: take the items of "b" and put them into "a" (or a copy thereof, overwriting what's already there) in order to create the result. The PEP mentions lack of interest in this operator though, as well as:
The "cuteness" value of abusing the operator to indicate information flow got old shortly after C++ did it.
I think a clear advantage of "<<" over "+" is that it indicates the direction (or precedence) which is important if items are potentially to be overwritten. I'd say "old but gold".
In the section about [Dict addition is lossy](https://www.python.org/dev/peps/pep-0584/#dict-addition-is-lossy) you write that "no other form of addition is lossy". This is true for the builtin types (except for floating point accuracy) but as part of the stdlib we have `collections.deque` which supports "+" and can be lossy if it specifies `maxlen`. For example:
>>> d1 = deque([1, 2], maxlen=3)
>>> d2 = deque([3, 4])
>>> d1 + d2
deque([2, 3, 4], maxlen=3)
I think this is unfortunate especially since as a double ended queue it supports both `extend` and `extendleft`, so it's not clear whether this extends d1 by d2 or left-extends d2 by d1 (though the latter would probably be ambiguous about the order of items appended). Usage of `d1 << d2` on the other hand would be explicit and clear about the direction of data flow. Although a bit different for dicts, it would as well indicate which of the operands takes precedence over the other.
On Sat, Oct 19, 2019 at 3:14 PM Dominik Vilsmeier <dominik.vilsmeier@gmx.de> wrote:
I like the proposal of adding an operator but I dislike the usage of "+". I'd expect this to do a recursive merge on the dict values for duplicate keys (i.e. adding the values), even more so since `Counter` (being a subclass of dict) already has that behavior.
I think that's actually a counter argument (ha!) -- since there IS a special "counter" type, why would anyone expect the regular dict to act that way? Also, that behavior only makes sense for particular dicts -- it really is a special case, perfect for a dict subclass (you know, maybe call it Counter), but not for generic dict behavior.
I think it would be helpful if the associated operator is not a symmetric symbol but instead is explicit about which operand takes precedence for conflicting keys. The lshift "<<" operator, for example, does have this property. It would be pretty clear what this means `a << b`:
Well, maybe. But I think there are two ways of thinking about "intuitive":
1) If someone sees this code, will they be right in knowing what it means? (Readability)
2) If someone wants to do something, will they think to try this? (Discoverability)
So << might be more intuitive from a readability perspective, but less discoverable. Note in this discussion (particularly the previous long one) that apparently newbies often expect to be able to add dicts. That being said, << or | is a lot better than adding yet another operator.
take the items of "b" and put them into "a" (or a copy thereof, overwriting what's already there) in order to create the result. The PEP mentions lack of interest in this operator though, as well as:
The "cuteness" value of abusing the operator to indicate information flow got old shortly after C++ did it.
I think a clear advantage of "<<" over "+" is that it indicates the direction (or precedence) which is important if items are potentially to be overwritten. I'd say "old but gold".
In the section about [Dict addition is lossy]( https://www.python.org/dev/peps/pep-0584/#dict-addition-is-lossy) you write that "no other form of addition is lossy". This is true for the builtin types (except for floating point accuracy) but as part of the stdlib we have `collections.deque` which supports "+" and can be lossy if it specifies `maxlen`. For example:
>>> d1 = deque([1, 2], maxlen=3)
>>> d2 = deque([3, 4])
>>> d1 + d2
deque([2, 3, 4], maxlen=3)
I think this is unfortunate especially since as a double ended queue it supports both `extend` and `extendleft`, so it's not clear whether this extends d1 by d2 or left-extends d2 by d1 (though the latter would probably be ambiguous about the order of items appended). Usage of `d1 << d2` on the other hand would be explicit and clear about the direction of data flow. Although a bit different for dicts, it would as well indicate which of the operands takes precedence over the other. _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6A3DW3... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
We're not going to introduce a brand new operator for this purpose, nor are we going to use a different existing operator.
The asymmetry of the operation (in case there are matching keys with conflicting values) doesn't bother me, nor does the behavior of Counter affect how I feel about this. The += or |= operator will have to behave identical to d1.update(d2) when it comes to matching keys.
I'm not sure whether += or |= needs to be an exact alias for dict.update. For lists, += and .extend() behave identically: both accept arbitrary sequences as right argument. But for sets, |= requires the right argument to be a set, while set.update() does not. (The not-in-place operators always require matching types: l1 + l2 requires l2 to be a list, s1 | s2 requires s2 to be a set.) But this is only a second-order consistency issue -- we should probably just follow the operator we're choosing in the end, either + or |.
IMO the reason this is such a tough choice is that Python learners are typically introduced to list and dict early on, while sets are introduced later. However, the tutorial on docs.python.org covers sets before dicts -- but lists are covered much earlier, and dicts make some cameo appearances in the section on control flow. Perhaps more typical, the tutorial at https://www.tutorialspoint.com/python/ discusses data types in this order: numbers, strings, lists, tuples, dictionary, date&time -- it doesn't mention sets at all. This matches Python's historical development (sets weren't added until Python 2.3).
So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists.
(I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.) In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
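The list/set asymmetry described in the message above can be checked with a short sketch. The helper `raises_type_error` and the sample values are made up for illustration; only the built-in list and set operations are being exercised.

```python
def raises_type_error(func):
    """Return True if calling func() raises TypeError (hypothetical helper)."""
    try:
        func()
    except TypeError:
        return True
    return False


lst = [1, 2]
lst += (3, 4)        # in-place list add accepts any iterable, like extend()

s = {1, 2}
s.update([3, 4])     # set.update() accepts any iterable


def ior_set_with_list():
    t = {1, 2}
    t |= [3, 4]      # in-place set union requires a set on the right


inplace_or_rejected = raises_type_error(ior_set_with_list)
list_plus_tuple_rejected = raises_type_error(lambda: [1, 2] + (3, 4))
set_or_list_rejected = raises_type_error(lambda: {1, 2} | [3, 4])

print(lst, s)
print(inplace_or_rejected, list_plus_tuple_rejected, set_or_list_rejected)
```

So a dict operator modelled on list `+=` would accept a wider range of right-hand operands than one modelled on set `|=`, which is the second-order consistency question raised above.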
On 10/20/2019 03:06 PM, Guido van Rossum wrote:
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
One of the things I really enjoy about Python is its consistency: - dicts and sets are both hash tables - dicts and sets both disallow duplicates - dicts and sets both use .update() Adding '|' to dict to be consistent with '|' on sets seems a reasonable, and not a foolish, consistency. +1 on '|' and '|=' -- ~Ethan~
Guido van Rossum wrote:
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
We're not going to introduce a brand new operator for this purpose, nor are we going to use a different existing operator.
I didn't mean to argue for another operator, but rather to point out that i.m.o. "+" is not a good choice, for similar reasons why I think `collections.deque` shouldn't support "+" (regarding ambiguity of precedence and potential "loss" of data). Besides for `dict` even more interpretations of the meaning of "+" are plausible.
Regarding "|" operator, I think a drawback is the resemblance with "or" (after all it's associated with "__or__") so people might assume behavior similar to `x or y` where `x` takes precedence (for truthy values of `x`). So when reading `d1 | d2` one could falsely assume that values in `d1` take precedence over the ones in `d2` for conflicting keys. And this is also the existing `set` behavior (though it's not really relevant in this case):
>>> class Test:
...     def __init__(self, x):
...         self.x = x
...     def __hash__(self):
...         return 0
...     def __eq__(self, other):
...         return True
...
>>> s = {Test(1)} | {Test(2)}
>>> s.pop().x  # leftmost wins.
1
The asymmetry of the operation (in case there are matching keys with conflicting values) doesn't bother me, nor does the behavior of Counter affect how I feel about this. The += or |= operator will have to behave identical to d1.update(d2) when it comes to matching keys. I'm not sure whether += or |= needs to be an exact alias for dict.update. For lists, += and .extend() behave identically: both accept arbitrary sequences as right argument. But for sets, |= requires the right argument to be a set, while set.update() does not. (The not-in-place operators always require matching types: l1 + l2 requires l2 to be a list, s1 | s2 requires s2 to be a set.) But this is only a second-order consistency issue -- we should probably just follow the operator we're choosing in the end, either + or |. IMO the reason this is such a tough choice is that Python learners are typically introduced to list and dict early on, while sets are introduced later. However, the tutorial on docs.python.org covers sets before dicts -- but lists are covered much earlier, and dicts make some cameo appearances in the section on control flow. Perhaps more typical, the tutorial at https://www.tutorialspoint.com/python/ discusses data types in this order: numbers, strings, lists, tuples, dictionary, date&time -- it doesn't mention sets at all. This matches Python's historical development (sets weren't added until Python 2.3). So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists. (I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.) In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
On Sun, Oct 20, 2019 at 11:48:10PM -0000, Dominik Vilsmeier wrote:
Regarding "|" operator, I think a drawback is the resemblance with "or" (after all it's associated with "__or__") so people might assume behavior similar to `x or y` where `x` takes precedence (for truthy values of `x`). So when reading `d1 | d2` one could falsely assume that values in `d1` take precedence over the ones in `d2` for conflicting keys. And this is also the existing `set` behavior (though it's not really relevant in this case):
There's a much easier way to demonstrate what you did:
>>> {1} | {1.0}
{1}
In any case, dict.update already has this behaviour:
>>> d = {1: 'a'}
>>> d.update({1.0: 'A'})
>>> d
{1: 'A'}
The existing key is kept, only the value is changed. The PEP gives a proposed implementation, which if I remember correctly is:
# d1 | d2
d = d1.copy()
d.update(d2)
so it will keep the current dict behaviour:
- keys are stable (first key seen wins)
- values are updated (last value seen wins)
I think that, strictly speaking, this "keys are stable" behaviour is not guaranteed by the language reference. But it's probably so deeply built into the implementation of dicts that it is unlikely to ever change. (I think Guido mentioned something about it being a side-effect of the way dict `__setitem__` works?) -- Steven
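The copy-then-update recipe quoted above can be run as a standalone sketch. It relies on `1 == 1.0` and `hash(1) == hash(1.0)`; the variable names are illustrative, and the "first key wins" result is the behaviour observed in CPython rather than something the language reference is claimed to guarantee.

```python
d1 = {1: "a"}
d2 = {1.0: "A"}

# Recipe quoted from the PEP discussion: copy the left operand, update with the right.
merged = d1.copy()
merged.update(d2)

key = next(iter(merged))
print(merged)     # {1: 'A'}
print(type(key))  # <class 'int'>: the key object from d1 is kept
print(d1)         # {1: 'a'}: the left operand is not modified
```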
Steven D'Aprano wrote:
On Sun, Oct 20, 2019 at 11:48:10PM -0000, Dominik Vilsmeier wrote:
Regarding "|" operator, I think a drawback is the resemblance with "or" (after all it's associated with "__or__") so people might assume behavior similar to x or y where x takes precedence (for truthy values of x). So when reading d1 | d2 one could falsely assume that values in d1 take precedence over the ones in d2 for conflicting keys. And this is also the existing set behavior (though it's not really relevant in this case):
There's a much easier way to demonstrate what you did:
>>> {1} | {1.0}
{1}
In any case, dict.update already has this behaviour:
>>> d = {1: 'a'}
>>> d.update({1.0: 'A'})
>>> d
{1: 'A'}
The existing key is kept, only the value is changed. The PEP gives a proposed implementation, which if I remember correctly is:
# d1 | d2
d = d1.copy()
d.update(d2)
so it will keep the current dict behaviour:
- keys are stable (first key seen wins)
- values are updated (last value seen wins)
Exactly, so the dict "+" behavior would match the set "|" behavior, preserving the keys. But how many users will be concerned about whether the keys are going to be preserved? I guess almost everybody will want to know what happens with the values, and that question remains unanswered by just looking at the "+" or "|" syntax. It's reasonable to assume that values are preserved as well, i.e. `d1 + d2` adds the missing keys from `d2` to `d1`. Of course, once you know that "+" is actually similar to "update" you can infer that the last value wins. But "+" simply doesn't read "update". So in order to know you'll have to look it up, but following that argument you could basically settle on any operator symbol for the update operation. A drawback of "+" is that different interpretations are plausible, and this fact cannot be denied as can be seen from the ongoing discussion. Of course one can blame the programmer, if they didn't check the documentation carefully enough, also since "in the face of ambiguity, refuse the temptation to guess". But in the end the language should assist the programmer and it's better not to introduce ambiguity in the first place.
I think that, strictly speaking, this "keys are stable" behaviour is not guaranteed by the language reference. But it's probably so deeply built into the implementation of dicts that it is unlikely to ever change. (I think Guido mentioned something about it being a side-effect of the way dict __setitem__ works?)
On 21/10/2019 21:14, Dominik Vilsmeier wrote:
Exactly, so the dict "+" behavior would match the set "|" behavior, preserving the keys. But how many users will be concerned about whether the keys are going to be preserved? I guess almost everybody will want to know what happens with the values, and that question remains unanswered by just looking at the "+" or "|" syntax. It's reasonable to assume that values are preserved as well, i.e. `d1 + d2` adds the missing keys from `d2` to `d1`. Of course, once you know that "+" is actually similar to "update" you can infer that the last value wins.
There's one reason for + which I feel is being missed (though I think someone may have briefly mentioned it last time this topic was brought up): If we look at the behaviour of dict literals, adding two dicts actually behaves like concatenation in the sense that
{"key1": "val1", "key2": "val2", "key1": "val3"} == {"key1": "val3", "key2": "val2"}
which is exactly what we would get by adding {"key1": "val1", "key2": "val2"} and {"key1": "val3"}
so using + we would actually have
{"key1": "val1", "key2": "val2"} + {"key1": "val3"} == {"key1": "val1", "key2": "val2", "key1": "val3"}
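The literal equivalence above can be checked today without the proposed operator. This sketch uses `{**left, **right}` as a stand-in for `left + right` (an assumption for illustration); the dict-literal behaviour itself is standard Python.

```python
# A dict display with a repeated key keeps the last value given for that key.
literal = {"key1": "val1", "key2": "val2", "key1": "val3"}

left = {"key1": "val1", "key2": "val2"}
right = {"key1": "val3"}
merged = {**left, **right}  # stand-in for the proposed left + right

print(literal)            # {'key1': 'val3', 'key2': 'val2'}
print(merged == literal)  # True
print(list(literal))      # ['key1', 'key2']: key1 keeps its first position
```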
On Tue, Oct 22, 2019 at 12:03:02AM +0200, Jan Greis wrote:
There's one reason for + which I feel is being missed (though I think someone may have briefly mentioned it last time this topic was brought up): If we look at the behaviour of dict literals, adding two dicts actually behaves like concatenation in the sense that
{"key1": "val1", "key2": "val2", "key1": "val3"} == {"key1": "val3", "key2": "val2"}
That's in the PEP. https://www.python.org/dev/peps/pep-0584/#id26 -- Steven
On Tue, Oct 22, 2019 at 12:05 AM Jan Greis <jan.r.greis@gmail.com> wrote:
On 21/10/2019 21:14, Dominik Vilsmeier wrote:
Exactly, so the dict "+" behavior would match the set "|" behavior, preserving the keys. But how many users will be concerned about whether the keys are going to be preserved? I guess almost everybody will want to know what happens with the values, and that question remains unanswered by just looking at the "+" or "|" syntax. It's reasonable to assume that values are preserved as well, i.e. `d1 + d2` adds the missing keys from `d2` to `d1`. Of course, once you know that "+" is actually similar to "update" you can infer that the last value wins.
There's one reason for + which I feel is being missed (though I think someone may have briefly mentioned it last time this topic was brought up): If we look at the behaviour of dict literals, adding two dicts actually behaves like concatenation in the sense that
{"key1": "val1", "key2": "val2", "key1": "val3"} == {"key1": "val3", "key2": "val2"}
which is exactly what we would get by adding {"key1": "val1", "key2": "val2"} and {"key1": "val3"}
It is not a "concatenation" though, because you lost {"key1": "val1"} in the process. The concatenation is not _just_ "writing something after something", you can do it with anything, but the actual operation, producing the result. Richard
so using + we would actually have
{"key1": "val1", "key2": "val2"} + {"key1": "val3"} == {"key1": "val1", "key2": "val2", "key1": "val3"}
On 22/10/2019 06:43, Richard Musil wrote:
It is not a "concatenation" though, because you lost {"key1": "val1"} in the process. The concatenation is not _just_ "writing something after something", you can do it with anything, but the actual operation, producing the result.
My point is that if I saw {"key1": "val1", "key2": "val2"} + {"key1": "val3"}, I would expect that it would be equivalent to {"key1": "val1", "key2": "val2", "key1": "val3"}.
Similarly, I would expect that
deque([1, 2, 3], maxlen=4) + deque([4, 5]) == deque([1, 2, 3, 4, 5], maxlen=4) == deque([2, 3, 4, 5], maxlen=4)
which indeed is true.
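For anyone who wants to confirm the deque claim, a minimal runnable sketch follows. The variable names are invented; the calls are the standard `collections.deque` API.

```python
from collections import deque

total = deque([1, 2, 3], maxlen=4) + deque([4, 5])
bounded = deque([1, 2, 3, 4, 5], maxlen=4)

print(total)    # deque([2, 3, 4, 5], maxlen=4): the oldest item was dropped
print(bounded)  # deque([2, 3, 4, 5], maxlen=4)
print(total == bounded == deque([2, 3, 4, 5], maxlen=4))  # True
```

The result keeps the left operand's `maxlen`, so the "concatenate, then apply the container's rule" reading holds here just as it does for the dict literal.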
Jan Greis wrote:
On 22/10/2019 06:43, Richard Musil wrote:
It is not a "concatenation" though, because you lost {"key1": "val1"} in the process. The concatenation is not _just_ "writing something after something", you can do it with anything, but the actual operation, producing the result. My point is that if I saw {"key1": "val1", "key2": "val2"} + {"key1": "val3"}, I would expect that it would be equivalent to {"key1": "val1", "key2": "val2", "key1": "val3"}.
But that reasoning only works with literals. And chances are that you're not going to see something like this in real code. Because why would you add two dict literals? Instead you're going to see something like this: `d1 + d2`. And if one has to infer the details of that operation by coming up with some hypothetical example involving literals, that doesn't speak in favor of the syntax. As mentioned, here it is up to the variable names to be clear about what happens. E.g.
default_preferences + user_preferences
For that example it's pretty clear that `user_preferences` is meant to supersede `default_preferences`. But variable names might not always be completely clear or even if they are, they might not allow the reader to infer any precedence. And then, "in the face of [that] ambiguity", one has to "refuse the temptation to guess". Maybe it's better not to introduce that ambiguity in the first place.
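To make the precedence question concrete, here is a sketch of the preferences example using the copy-then-update semantics discussed earlier in the thread. The keys and values are invented for illustration.

```python
default_preferences = {"theme": "light", "font_size": 12}
user_preferences = {"font_size": 14}

# Equivalent of default_preferences + user_preferences under the proposal:
effective = default_preferences.copy()
effective.update(user_preferences)

# Swapping the operands silently changes which value survives:
reversed_order = user_preferences.copy()
reversed_order.update(default_preferences)

print(effective)       # {'theme': 'light', 'font_size': 14}
print(reversed_order)  # {'font_size': 12, 'theme': 'light'}
```

The two results contain the same keys but different values for `font_size`, which is the order dependence a symmetric-looking operator does not advertise.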
Similarly, I would expect that deque([1, 2, 3], maxlen=4) + deque([4, 5]) == deque([1, 2, 3, 4, 5], maxlen=4) == deque([2, 3, 4, 5], maxlen=4) which indeed is true.
On Sun, Oct 20, 2019, 6:09 PM Guido van Rossum
In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
While my "vote" is and should be much less significant than Guido's, I've explained why my initial expectation for dict+dict would NOT be the proposed behavior, but rather something "vectorized" similar to Counter. Overall, I am +0.5 on do nothing, +0 on |, but -1 on +.
On Sun, Oct 20, 2019 at 03:06:16PM -0700, Guido van Rossum wrote:
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
Are you saying a method is a non-starter? A method won't satisfy those who prefer an operator, but it otherwise has a number of advantages, and few (that I can see) disadvantages. I think your analysis here:
IMO the reason this is such a tough choice is that Python learners are typically introduced to list and dict early on, while sets are introduced later. [...] So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists. (I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.)
is excellent, and I think I shall steal it for the PEP :-) -- Steven
On Sun, Oct 20, 2019 at 6:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Oct 20, 2019 at 03:06:16PM -0700, Guido van Rossum wrote:
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
Are you saying a method is a non-starter?
A method won't satisfy those who prefer an operator, but it otherwise has a number of advantages, and few (that I can see) disadvantages.
It would be like the Judgment of Solomon though. A method is not enough to change the dict API -- I'd rank it below "do nothing".
I think your analysis here:
IMO the reason this is such a tough choice is that Python learners are typically introduced to list and dict early on, while sets are introduced later. [...] So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists. (I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.)
is excellent, and I think I shall steal it for the PEP :-)
You're welcome. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Mon, Oct 21, 2019 at 7:07 AM Guido van Rossum <guido@python.org> wrote:
So the choice is really only three way.
So if we want to cater to what most beginners will know, + and += would be the best choice. But if we want to be more future-proof and consistent, | and |= are best -- after all dicts are closer to sets (both are hash tables) than to lists. (I know you can argue that dicts are closer to lists because both support __getitem__ -- but I find that similarity shallower than the hash table nature.)
In the end I'm +0.5 on | and |=, +0 on + and +=, and -0 on doing nothing.
If we choose `+`, `+` is now "merging two containers", not just "concatenate two sequences". So it looks very inconsistent that set uses `|` instead of `+`. This inconsistency looks very ugly to me. How do you feel about this? I think we should add + to set too. Regards, -- Inada Naoki <songofacandy@gmail.com>
On 21 Oct 2019, at 00:08, Guido van Rossum <guido@python.org> wrote:
So the choice is really only three way.
1) Add d1 + d2 and d1 += d2 (using similarity with list + and +=)
2) Add d1 | d2 and d1 |= d2 (similar to set | and |=)
3) Do nothing
Isn't there 4) add .merged()? / Anders
Christopher Barker wrote: > On Sat, Oct 19, 2019 at 3:14 PM Dominik Vilsmeier dominik.vilsmeier@gmx.de > wrote: > > I like the proposal of adding an operator but I > > dislike the usage of "+". > > I'd expect this to do a recursive merge on the dict values for duplicate > > keys (i.e. adding the values), even more so since Counter (being a > > subclass of dict) already has that behavior. > > I think that's actually a counter argument (ha!) -- since there IS a > special "counter" type, why would anyone expect the regular dict to act > that way? The question is, why would someone who has experience with adding counters but never felt the need to add dicts, assume that this behavior is specialized in `Counter` and not inherited by `dict`. Maybe at some point they'll encounter a scenario where they need to recursive-merge (the `Counter` style) two dicts and then they might assume that they just need to add the dicts since they're familiar with this behavior from `Counter` and `Counter` subclasses `dict` so it's reasonable to assume this behavior is inherited. > Also, that behavior only makes sense for particular dicts -- it really is a > special case, perfect for a dict subclass (you know, maybe call it > Counter), but not for generic dict behavor. Maybe from a language design point of view, but a user might not be aware that this behavior is too specialized for generic dict. Besides `Counter`, `pandas` is another prominent example that uses the recursive merge strategy for mapping-like types (not necessarily in the collections.abc sense but exposing a similar interface): >>> s1 = pd.Series([[0, 1], 2]) >>> s2 = pd.Series([[3, 4], 5]) >>> s1 + s2 0 [0, 1, 3, 4] 1 7 Someone who is familiar with these types is probably used to that behavior and so it's easy to assume that it originates from dict. 
And even if they think it's too specialized, so `dict` must be doing something else, how obvious is the conclusion that dict performs a shallow merge and resolves conflicting keys by giving precedence to the r.h.s. operand?

> > I think it would be helpful if the associated operator is not a symmetric
> > symbol but instead is explicit about which operand takes precedence for
> > conflicting keys. The lshift "<<" operator, for example, does have this
> > property. It would be pretty clear what this means a << b:
>
> well, maybe. but I think there are two ways of thinking about "intuitive"
> 1) if someone sees this code, will they be right in knowing what it means? (Readability)
> 2) if someone want to do something? Will they think to try this? (Discoverability)
> So << might be more intuitive from a readability perspective, but less discoverable.
> Note in this discussion (particularly the previous long one) that apparently newbies often expect to be able to add dicts.
> That being said, << or | is a lot better than adding yet another operator.

I wasn't arguing particularly for the "<<" operator, I wanted to point out why, i.m.o., the "+" operator, as a symmetric symbol, isn't an ideal choice by comparing it to a non-symmetric operator symbol. I agree that "+" is likely more discoverable. Regarding intuition, as you pointed out, it's a two-way relationship: <operator> <--> <action>. So if someone wants to perform <action> it should be intuitive to think of <operator> (discoverability) and if someone reads <operator> it should be intuitive to associate it with <action> (readability, interpretability); I think "+" isn't very good at the latter.

> > take the items of "b" and put them into "a" (or a copy thereof,
> > overwriting what's already there) in order to create the result. The PEP
> > mentions lack of interest in this operator though, as well as:
> >
> > The "cuteness" value of abusing the operator to indicate information
> > flow got old shortly after C++ did it.
> >
> > I think a clear advantage of "<<" over "+" is that it indicates the
> > direction (or precedence) which is important if items are potentially to be
> > overwritten. I'd say "old but gold".
> >
> > In the section about Dict addition is lossy you write that "no other form
> > of addition is lossy". This is true for the builtin types (except for
> > floating point accuracy) but as part of the stdlib we have
> > collections.deque which supports "+" and can be lossy if it specifies
> > maxlen. For example:
> >
> > >>> d1 = deque([1, 2], maxlen=3)
> > >>> d2 = deque([3, 4])
> > >>> d1 + d2
> > deque([2, 3, 4], maxlen=3)
> >
> > I think this is unfortunate especially since as a double ended queue it
> > supports both extend and extendleft, so it's not clear whether this
> > extends d1 by d2 or left-extends d2 by d1 (though the latter would probably
> > be ambiguous about the order of items appended). Usage of d1 << d2 on the
> > other hand would be explicit and clear about the direction of data flow.
> > Although a bit different for dicts, it would as well indicate which of the
> > operands takes precedence over the other.
>
> --
> Christopher Barker, PhD
> Python Language Consulting
> Teaching
> Scientific Software Development
> Desktop GUI and Web Development
> wxPython, numpy, scipy, Cython
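The lossy deque addition described in this message can be reproduced with a short sketch. The variable names `combined`, `right`, and `left` are illustrative and not from the thread; the sketch only assumes the standard `collections.deque` behaviour, and it contrasts `+` with the two explicit in-place spellings, which drop different items.

```python
from collections import deque

d1 = deque([1, 2], maxlen=3)
d2 = deque([3, 4])

# "+" copies the left operand (keeping its maxlen) and extends it on the right,
# so the oldest item of d1 is silently discarded.
combined = d1 + d2
print(combined)  # deque([2, 3, 4], maxlen=3)

# Explicit right-extension: same result as "+", but the direction is spelled out.
right = deque([1, 2], maxlen=3)
right.extend(d2)

# Explicit left-extension: items are pushed onto the left one at a time,
# so here it is the right-hand end that overflows instead.
left = deque([3, 4], maxlen=3)
left.extendleft([2, 1])
print(left)  # deque([1, 2, 3], maxlen=3)
```

Which item is lost depends on the direction of the extension, and that is exactly the information a symmetric `+` does not convey.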
On Sun, Oct 20, 2019 at 11:29:54PM -0000, Dominik Vilsmeier wrote:
The question is, why would someone who has experience with adding counters but never felt the need to add dicts, assume that this behavior is specialized in `Counter` and not inherited by `dict`.
I think you might mean inherited *from* dict? dict doesn't inherit Counter's behaviour because Counter is the subclass and dict the parent class.
Maybe at some point they'll encounter a scenario where they need to recursive-merge (the `Counter` style) two dicts and then they might assume that they just need to add the dicts
Okay. So what? If they do this, it will be a mistake. Programmers make mistakes thousands of times a day, it is neither our responsibility nor within our power to prevent them all. Programmer error is not a good reason to reject a feature.
since they're familiar with this behavior from `Counter` and `Counter` subclasses `dict` so it's reasonable to assume this behavior is inherited.
No it isn't reasonable. Counters are designed to *count*. Their values are supposed to be ints, usually positive ints. dicts are general key:value stores where the values can be any kind of object at all, not just numbers or even strings. Most objects don't support addition. It is totally unreasonable to assume that dict addition will add values by default when by default, objects cannot be added.
how obvious is the conclusion that dict performs a shallow merge and resolves conflicting keys by giving precedence to the r.h.s. operand?
About as obvious as the fact that update performs a shallow merge and resolves duplicate keys by giving precedence to the last seen value.

-- Steven
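To make the two behaviours under discussion concrete, here is a minimal sketch contrasting `Counter` addition with the `dict.update` semantics. The sample keys and values are made up for illustration; only standard-library behaviour is assumed.

```python
from collections import Counter

# Counter "+" adds the values of keys present in both operands.
c1 = Counter({"a": 1, "b": 2})
c2 = Counter({"b": 10, "c": 20})
counter_sum = c1 + c2
print(counter_sum["b"])  # 12

# dict.update is a shallow merge: for a duplicate key the last seen value wins,
# and nested containers are replaced wholesale rather than merged.
d1 = {"a": 1, "b": 2, "nested": {"x": 1}}
d2 = {"b": 10, "nested": {"y": 2}}
merged = d1.copy()
merged.update(d2)
print(merged)  # {'a': 1, 'b': 10, 'nested': {'y': 2}}
```

Note that `merged["nested"]` is the very same object as `d2["nested"]`, which is what "shallow" means here.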
Steven D'Aprano wrote:
On Sun, Oct 20, 2019 at 11:29:54PM -0000, Dominik Vilsmeier wrote:
> > The question is, why would someone who has experience with adding counters but never felt the need to add dicts, assume that this behavior is specialized in Counter and not inherited by dict.
> I think you might mean inherited from dict? dict doesn't inherit Counter's behaviour because Counter is the subclass and dict the parent class.
Yes, sorry for the confusion, I meant "inherited from". One of the occasional non-native speaker issues :-)
> > Maybe at some point they'll encounter a scenario where they need to recursive-merge (the Counter style) two dicts and then they might assume that they just need to add the dicts
> Okay. So what? If they do this, it will be a mistake. Programmers make mistakes thousands of times a day, it is neither our responsibility nor within our power to prevent them all. Programmer error is not a good reason to reject a feature.
But it is our responsibility to assist programmers and help them make as few errors as possible by providing a clear and unambiguous syntax. If a specific syntax feature is ambiguous in its meaning, it's more likely to attract errors.
> > since they're familiar with this behavior from Counter and Counter subclasses dict so it's reasonable to assume this behavior is inherited.
> No it isn't reasonable. Counters are designed to count. Their values are supposed to be ints, usually positive ints. dicts are general key:value stores where the values can be any kind of object at all, not just numbers or even strings. Most objects don't support addition. It is totally unreasonable to assume that dict addition will add values by default when by default, objects cannot be added.
I fully agree with that. But someone working with dicts which store floats or lists might be tempted to assume that `d1 + d2` means "add the values" (especially if they're reading the code). If in that specific context it's perfectly fine (and maybe even reasonable) to add dict values, it is more difficult to dismiss that assumption (unless they're already familiar with the syntax). Yes, it's the programmer's responsibility to be aware of what specific syntax does, but the language should assist as much as possible.
> > how obvious is the conclusion that dict performs a shallow merge and resolves conflicting keys by giving precedence to the r.h.s. operand?
> About as obvious that update performs a shallow merge and resolves duplicate keys by giving precedence to the last seen value.
Only if you know that "+" means "update" in that specific context. Otherwise one could even reason that, since there are already ways to copy-merge two dicts, nobody would introduce new syntax for that, so "+" must mean something else (possibly the complement, preserving l.h.s. values).
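As an illustration of the two readings mentioned here, the following sketch shows existing ways to copy-merge two dicts, once with the right-hand side taking precedence and once with the left-hand side preserved. The names `rhs_wins`, `lhs_wins`, and `lhs_wins_chain` are illustrative only; the spellings themselves (dict unpacking and `collections.ChainMap`) are standard Python.

```python
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}

# Copy-merge where the right-hand operand wins on conflicting keys
# (the behaviour of copy() followed by update()).
rhs_wins = {**d1, **d2}

# The complementary reading: values from the left-hand operand are preserved.
lhs_wins = {**d2, **d1}

# ChainMap looks keys up in its first mapping first, so d1 also wins here.
lhs_wins_chain = dict(ChainMap(d1, d2))

print(rhs_wins["b"], lhs_wins["b"], lhs_wins_chain["b"])  # 20 2 2
```

Both readings produce the same set of keys and differ only in which value survives for `"b"`.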
I think this PEP leaves out one big disadvantage of the + operator. If we use + for dict merging, then set not supporting + looks strange and inconsistent. And if we add + to set too, set has three ways (method, |, and +) to do merging. Then all builtin container classes support +, and + looks like the common way to merge two containers in some way. Shouldn't we add it to abc? So I think we shouldn't focus on just adding `+` to dict. It comes with a huge side effect. We should think about the general API design of the (esp. builtin) containers.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
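The inconsistency raised here can be seen in a small sketch of how the builtin containers currently combine. Variable names are illustrative; the sketch only relies on existing builtin behaviour for list, tuple, str, and set.

```python
# Sequences concatenate with "+".
items = [1, 2] + [3]
pair = (1, 2) + (3,)
text = "ab" + "c"

# Sets already have two spellings for merging: the "|" operator and a method.
s1, s2 = {1, 2}, {2, 3}
union_op = s1 | s2
union_method = s1.union(s2)

# "+" is not defined for sets, so this raises TypeError.
try:
    s1 + s2
except TypeError as exc:
    set_plus_error = str(exc)
else:
    set_plus_error = None

print(set_plus_error)
```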
On 20/10/2019 05:52, Inada Naoki wrote:
I think this PEP doesn't include one big disadvantage of the + operator.
If we use + for dict merging, then set not supporting + looks strange and inconsistent.
As I've said before, set already looks strange and inconsistent to me. I have some hopes that after this discussion I will remember that set union is spelt "|", but thus far I've had to look it up every time. -- Rhodri James *-* Kynesim Ltd
https://www.python.org/dev/peps/pep-0584/#use-a-merged-method-instead-of-an-...

In this section, the unbound method form is introduced first. But the unbound method is just an option and is not so important. The method form is the key part of this proposal. So please introduce the (bound) method first, or even remove the unbound method from the PEP.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
I have lots of code of this kind
e = d1.copy(); e.update(d2)
so the addition operator would be most welcome. Congratulations on this PEP!

In fact, I would have wished, and in my early programming had assumed, that the `update` method would return the updated dict rather than modifying the existing one, more along the philosophy of debugging-friendly functional programming. (Some NumPy methods are also not quite consistently semantically transparent to me in that regard.) A `dict.updated` method would have been a good explicit alternative, but differing by just the extra `d` at the end may be too error-prone.

Yes, overloading can be confusing unless one realises that it is just a short form of calling the `__add__` method in this case. The operation is non-commutative, which can be a source of confusion for novices (the same as multiplication for matrices, operators in QM, quaternions, ...). More trivially, it is similar to addition of ordered structures such as strings and lists, where not being commutative is obvious; for unordered structures it is just less obvious at first, and dictionaries are different again.

Overall: +1

On Thu, 17 Oct 2019 at 16:37, Brandt Bucher <brandtbucher@gmail.com> wrote:
At long last, Steven D'Aprano and I have pushed a second draft of PEP 584 (dictionary addition):
https://www.python.org/dev/peps/pep-0584/
The accompanying reference implementation is on GitHub:
https://github.com/brandtbucher/cpython/tree/addiction
This new draft incorporates much of the feedback that we received during the first round of debate here on python-ideas. Most notably, the difference operators (-/-=) have been dropped from the proposal, and the implementations have been updated to use "new = self.copy(); new.update(other)" semantics, rather than "new = type(self)(); new.update(self); new.update(other)" as proposed before. It also includes more background information and summaries of major objections (with rebuttals).
Please let us know what you think – we'd love to hear any *new* feedback that hasn't yet been addressed in the PEP or the related discussions it links to! We plan on updating the PEP at least once more before review.
Thanks!
Brandt
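The difference between the two semantics named in the quoted announcement can be sketched with plain helper functions. The helpers `add_via_copy` and `add_via_type` are hypothetical names for illustration and are not the reference implementation; `defaultdict` is used only because its `copy()` carries extra state.

```python
from collections import defaultdict


def add_via_copy(left, right):
    """Second-draft semantics: new = self.copy(); new.update(other)."""
    new = left.copy()
    new.update(right)
    return new


def add_via_type(left, right):
    """First-draft semantics: new = type(self)(); update with self, then other."""
    new = type(left)()
    new.update(left)
    new.update(right)
    return new


counts = defaultdict(int, {"a": 1})
extra = {"a": 5, "b": 2}

via_copy = add_via_copy(counts, extra)
via_type = add_via_type(counts, extra)

# Both results hold the same items, with the right-hand value winning for "a".
print(dict(via_copy), dict(via_type))

# Only copy() carries over the default factory; type(left)() starts without one.
print(via_copy.default_factory, via_type.default_factory)
```

With the copy-based version, `via_copy["missing"]` still yields `0`, while the same lookup on `via_type` raises `KeyError`.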
participants (29)
- Alexander Heger
- Anders Hovmöller
- Andrew Barnert
- Brandt Bucher
- Brian Skinn
- brian.skinn@gmail.com
- Chris Angelico
- Christopher Barker
- Dan Sommers
- David Mertz
- Dominik Vilsmeier
- Ethan Furman
- Guido van Rossum
- Inada Naoki
- Jan Greis
- Josh Rosenberg
- Meitham Jamaa
- Mike Miller
- MRAB
- Neil Girdhar
- Paul Moore
- Rhodri James
- Richard Musil
- Ricky Teachey
- Rob Cliffe
- Sebastian Kreft
- Stephen J. Turnbull
- Steve Jorgensen
- Steven D'Aprano