dict.merge(d1, d2, ...) (Counter proposal for PEP 584)
I think some people in favor of PEP 584 just want a single expression for merging dicts without an in-place update.
But I feel it's an abuse of operator overloading. I think functions and methods are better than operators unless the operator has a good math metaphor, or is used very frequently, like `+` for concatenating strings.
This is why functions and methods are better:
* Easy to search.
* A name can describe its behavior better than an abused operator.
* Simpler lookup behavior. (e.g. subclass and __iadd__)
So I propose a `dict.merge` method. It is the out-of-place version of `dict.update`, but accepts multiple dicts. (dict.update() could also be changed to accept multiple dicts, but that is out of scope here.)
* d = d1.merge(d2)  # d = d1.copy(); d.update(d2)
* d = d1.merge(d2, d3)  # d = d1.copy(); d.update(d2); d.update(d3)
* d = d1.merge(iter_of_pairs)
* d = d1.merge(key=value)
## Merits of dict.merge() over operator +
* Easy to Google (e.g. "python dict merge").
* Easy to help(dict.merge). (or dict.merge? in IPython)
* No inefficiency of d1+d2+d3+...+dN, or sum(list_of_many_dicts)
* The type of the returned value is always the same as d1.copy(). No issubclass, no __iadd__.
## Why not dict.updated()?
sorted() is a function, so it looks different from L.sort(). But d.updated() looks very similar to d.update() to human eyes.
## How about d1 - d2?
If it is really useful, it can be implemented as a method too.
dict.discard(sequence_of_keys)
Regards,
-- INADA Naoki <songofacandy@gmail.com>
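The proposed semantics can be sketched as a free function, since the built-in `dict` has no `merge` method. The function name and signature below follow the proposal; accepting each positional argument the same way `dict.update` does is an assumption of this sketch, not something the post spells out.

```python
def merge(d1, *others, **kwargs):
    """Return a copy of d1 updated with each argument in turn.

    Hypothetical stand-in for the proposed ``d1.merge(...)`` method.
    """
    result = d1.copy()          # d1 itself is never modified
    for other in others:        # a mapping or an iterable of (key, value) pairs
        result.update(other)
    result.update(kwargs)       # keyword arguments are applied last
    return result


d1 = {"a": 1, "b": 2}
d2 = {"b": 20, "c": 30}
d3 = [("c", 300), ("d", 400)]   # iterable of pairs

merged = merge(d1, d2, d3, e=5)
print(merged)   # {'a': 1, 'b': 20, 'c': 300, 'd': 400, 'e': 5}
print(d1)       # {'a': 1, 'b': 2}
```

Because all arguments are folded into a single copy, only one new dict is created regardless of how many inputs are passed, which is the point made about `d1+d2+d3+...+dN`.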
On Tue, Mar 5, 2019 at 6:40 PM INADA Naoki <songofacandy@gmail.com> wrote:
This is why function and methods are better:
* Easy to search.
## Merits of dict.merge() over operator +
* Easy to Google (e.g. "python dict merge").
This keeps getting thrown around. It's simply not true.
https://www.google.com/search?q=%7B**d1%2C+**d2%7D
First hit when I do that search is Stack Overflow:
https://stackoverflow.com/questions/2255878/what-does-mean-in-the-expression...
which, while it's not specifically about that exact syntax, does mention it in the comments on the question. Symbols ARE searchable. In fact, adding the word "python" to the beginning of that search produces a number of very useful hits, including a Reddit thread on combining dictionaries, and PEP 584 itself.
Please can people actually test these lines of argument before reiterating them?
ChrisA
On Tue, Mar 5, 2019 at 5:23 PM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Mar 5, 2019 at 6:40 PM INADA Naoki <songofacandy@gmail.com> wrote:
This is why function and methods are better:
* Easy to search.
## Merits of dict.merge() over operator +
* Easy to Google (e.g. "python dict merge").
This keeps getting thrown around. It's simply not true.
https://www.google.com/search?q=%7B**d1%2C+**d2%7D
First hit when I do that search is Stack Overflow:
https://stackoverflow.com/questions/2255878/what-does-mean-in-the-expression...
which, while it's not specifically about that exact syntax, does mention it in the comments on the question. Symbols ARE searchable. In fact, adding the word "python" to the beginning of that search produces a number of very useful hits, including a Reddit thread on combining dictionaries, and PEP 584 itself.
Please can people actually test these lines of argument before reiterating them?
ChrisA
I'm surprised that {**d1, **d2} is searchable. But in my proposal, I was comparing against the one-character operator `+`.
I switched my browser to English and Googled "python str +":
https://www.google.com/search?q=python+str+%2B&oq=python+str+%2B
As far as I can see, the top result is https://docs.python.org/2/library/string.html
When I search for "+" in that page, it's difficult to find string concatenation.
I also tried Googling "python set union" and "python set |". With "union" it is much easier to reach the answer.
So I don't think "a name is easier to Google than a symbol" is fake or FUD.
Regards,
-- INADA Naoki <songofacandy@gmail.com>
On Mon, Mar 4, 2019 at 11:41 PM INADA Naoki <songofacandy@gmail.com> wrote:
Then, I propose `dict.merge` method. It is outer-place version of `dict.update`, but accepts multiple dicts. (dict.update() can be updated to accept multiple dicts, but it's not out of scope).
* d = d1.merge(d2)  # d = d1.copy(); d.update(d2)
* d = d1.merge(d2, d3)  # d = d1.copy(); d.update(d2); d.update(d3)
* d = d1.merge(iter_of_pairs)
* d = d1.merge(key=value)
Another similar option would be to extend the dict constructor to allow: d = dict(d1, d2, d3, ...)
-n
-- Nathaniel J. Smith -- https://vorpus.org
On Tue, Mar 5, 2019 at 5:50 PM Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Mar 4, 2019 at 11:41 PM INADA Naoki <songofacandy@gmail.com> wrote:
Then, I propose `dict.merge` method. It is outer-place version of `dict.update`, but accepts multiple dicts. (dict.update() can be updated to accept multiple dicts, but it's not out of scope).
* d = d1.merge(d2)  # d = d1.copy(); d.update(d2)
* d = d1.merge(d2, d3)  # d = d1.copy(); d.update(d2); d.update(d3)
* d = d1.merge(iter_of_pairs)
* d = d1.merge(key=value)
Another similar option would be to extend the dict constructor to allow: d = dict(d1, d2, d3, ...)
-n
-- Nathaniel J. Smith -- https://vorpus.org
Yes, it's an option too. One obvious merit of d.merge(...) is that it returns the same type as d. `type(d1)(d1, d2)` looks ugly.
But people just want a dict instead of some subtype of dict, so this merit is not so important.
I'm a bit nervous about adding more overloads to the constructor. That's the main reason why I proposed a method instead of the constructor.
Regards,
-- INADA Naoki <songofacandy@gmail.com>
On Tue, Mar 05, 2019 at 06:04:40PM +0900, INADA Naoki wrote: [...]
One obvious merit of d.merge(...) is it returns same type of d. `type(d1)(d1, d2)` looks ugly.
But people just want dict instead of some subtype of dict. This merit is not so important.
Not to me! It *is* important to me.
I want builtins to honour their subclasses. It is probably too late to change existing behaviour, but my proposal specifies that subclasses are honoured.
-- Steven
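To make the subclass question concrete, here is a small sketch using only existing behaviour. It compares the dict display with the copy-then-update pattern for an `OrderedDict`, and for a bare subclass (the class name `Plain` is made up for illustration) that does not override `copy()`.

```python
from collections import OrderedDict

d1 = OrderedDict(a=1, b=2)
d2 = {"b": 20, "c": 30}

# Dict display: always builds a plain dict, whatever the operand types.
via_display = {**d1, **d2}

# Copy-then-update: the result type is whatever d1.copy() returns.
via_copy = d1.copy()
via_copy.update(d2)

print(type(via_display).__name__)  # dict
print(type(via_copy).__name__)     # OrderedDict


class Plain(dict):
    """A subclass that inherits dict.copy() unchanged."""


print(type(Plain(a=1).copy()).__name__)  # dict
```

The last line shows why "same type as `d1.copy()`" and "same type as `d1`" are not the same promise: the inherited `dict.copy()` returns a plain `dict` unless the subclass overrides it.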
On Tue, Mar 5, 2019 at 7:59 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Mar 05, 2019 at 06:04:40PM +0900, INADA Naoki wrote: [...]
One obvious merit of d.merge(...) is it returns same type of d. `type(d1)(d1, d2)` looks ugly.
But people just want dict instead of some subtype of dict. This merit is not so important.
Not to me! It *is* important to me.
I'm sorry, I missed "most".
I want builtins to honour their subclasses. It is probably too late to change existing behaviour, but my proposal specifies that subclasses are honoured.
Then my proposal `d1.merge(d2)` is much better for you than the alternative dict(d1, d2).
-- Inada Naoki <songofacandy@gmail.com>
I agree with you so strongly that I was about to start a thread on this myself if you hadn't.
I would also propose a small modification to make it more general: add an argument `how` (name to be discussed) that tells `merge` how to combine the dicts, since many have pointed out that there are different ways to merge dicts.
So things would look like
    def addition_merge(key, values, exists):
        """
        :param key: the key to merge
        :param values: values of dicts to merge indexed at `key`
        :param exists: whether each dict contains `key`
        """
        if any(exists):
            return True, sum([value for exist, value in zip(exists, values) if exist])
        else:
            return False, None
    d1.merge(d2, d3, ..., how=addition_merge)
We could even have
    def discard(key, values, exists):
        return not any(exists[1:]), values[0]
    d1.merge(d2, how=discard)
which does the same thing as the proposed `d1-d2`.
This would mean that things like
    d = d1.merge(iter_of_pairs)
    d = d1.merge(key=value)
no longer work, but people could easily wrap the iterator or the key-value arguments in `dict()` at no real cost in complexity.
At 2019-03-05 15:39:40, "INADA Naoki" <songofacandy@gmail.com> wrote:
I think some people in favor of PEP 584 just want single expression for merging dicts without in-place update.
But I feel it's abuse of operator overload. I think functions and methods are better than operator unless the operator has good math metaphor, or very frequently used as concatenate strings.
This is why function and methods are better:
* Easy to search.
* Name can describe its behavior better than abused operator.
* Simpler lookup behavior. (e.g. subclass and __iadd__)
Then, I propose `dict.merge` method. It is outer-place version of `dict.update`, but accepts multiple dicts. (dict.update() can be updated to accept multiple dicts, but it's not out of scope).
* d = d1.merge(d2)  # d = d1.copy(); d.update(d2)
* d = d1.merge(d2, d3)  # d = d1.copy(); d.update(d2); d.update(d3)
* d = d1.merge(iter_of_pairs)
* d = d1.merge(key=value)
## Merits of dict.merge() over operator +
* Easy to Google (e.g. "python dict merge").
* Easy to help(dict.merge). (or dict.merge? in IPython)
* No inefficiency of d1+d2+d3+...+dN, or sum(list_of_many_dicts)
* Type of returned value is always same to d1.copy(). No issubclass, no __iadd__.
## Why not dict.updated()?
sorted() is a function so it looks different from L.sort() But d.updated() is very similar to d.update() for human eyes.
## How about d1 - d2?
If it is really useful, it can be implemented as method too.
dict.discard(sequence_of_keys)
Regards,
-- INADA Naoki <songofacandy@gmail.com>
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
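The `how` callback suggested at the top of this message could be sketched roughly as follows. The free function `merge_with` and its argument order are hypothetical; passing `None` in `values` where a dict lacks the key is also an assumption, since the message does not say what goes in those slots.

```python
def merge_with(how, *dicts):
    """Merge dicts key by key, delegating each decision to `how`.

    `how(key, values, exists)` returns a `(keep, value)` pair; the key is
    stored in the result only when `keep` is true.
    """
    result = {}
    # Visit every key once, in first-seen order.
    all_keys = dict.fromkeys(k for d in dicts for k in d)
    for key in all_keys:
        exists = [key in d for d in dicts]
        values = [d.get(key) for d in dicts]   # None where the key is absent
        keep, value = how(key, values, exists)
        if keep:
            result[key] = value
    return result


def addition_merge(key, values, exists):
    # Sum the values from the dicts that actually contain the key.
    present = [v for v, e in zip(values, exists) if e]
    return bool(present), sum(present)


def discard(key, values, exists):
    # Keep the first dict's entry unless a later dict also has the key.
    return not any(exists[1:]), values[0]


d1 = {"a": 1, "b": 2}
d2 = {"b": 10, "c": 20}

print(merge_with(addition_merge, d1, d2))  # {'a': 1, 'b': 12, 'c': 20}
print(merge_with(discard, d1, d2))         # {'a': 1}
```

With `discard`, a key that appears only in a later dict is dropped as well, so the result contains exactly the keys of `d1` that no other argument has, matching the proposed `d1-d2`.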
The Python C API has a PyDict_Merge function (https://docs.python.org/3/c-api/dict.html#c.PyDict_Merge) whose behavior differs from the proposed Python-level method: it doesn't copy, but merges in place.
This is a red flag for me.
On Tue, Mar 5, 2019 at 12:24 PM fhsxfhsx <fhsxfhsx@126.com> wrote:
I agree with you so strongly that I was about to start a thread on this myself if you hadn't.
I would also propose a small modification to make it more general: add an argument `how` (name to be discussed) that tells `merge` how to combine the dicts, since many have pointed out that there are different ways to merge dicts.
So things would look like
    def addition_merge(key, values, exists):
        """
        :param key: the key to merge
        :param values: values of dicts to merge indexed at `key`
        :param exists: whether each dict contains `key`
        """
        if any(exists):
            return True, sum([value for exist, value in zip(exists, values) if exist])
        else:
            return False, None
    d1.merge(d2, d3, ..., how=addition_merge)
We could even have
    def discard(key, values, exists):
        return not any(exists[1:]), values[0]
    d1.merge(d2, how=discard)
which does the same thing as the proposed `d1-d2`.
This would mean that things like
    d = d1.merge(iter_of_pairs)
    d = d1.merge(key=value)
no longer work, but people could easily wrap the iterator or the key-value arguments in `dict()` at no real cost in complexity.
-- Thanks, Andrew Svetlov
On Tue, 5 Mar 2019 16:39:40 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
I think some people in favor of PEP 584 just want single expression for merging dicts without in-place update.
But I feel it's abuse of operator overload. I think functions and methods are better than operator unless the operator has good math metaphor, or very frequently used as concatenate strings.
This is why function and methods are better:
* Easy to search.
* Name can describe its behavior better than abused operator.
* Simpler lookup behavior. (e.g. subclass and __iadd__)
Then, I propose `dict.merge` method. It is outer-place version of `dict.update`, but accepts multiple dicts. (dict.update() can be updated to accept multiple dicts, but it's not out of scope).
* d = d1.merge(d2) # d = d1.copy(); d.update(d2)
One should also be able to write `d = dict.merge(d1, d2, ...)`
If dict merging is important enough to get a new spelling, then I think this proposal is the best: explicit, unambiguous, immediately understandable and easy to remember.
Regards
Antoine.
On Thu, Mar 21, 2019 at 7:45 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
On Tue, 5 Mar 2019 16:39:40 +0900 INADA Naoki <songofacandy@gmail.com> wrote:
I think some people in favor of PEP 584 just want single expression for merging dicts without in-place update.
But I feel it's abuse of operator overload. I think functions and methods are better than operator unless the operator has good math metaphor, or very frequently used as concatenate strings.
This is why function and methods are better:
* Easy to search.
* Name can describe its behavior better than abused operator.
* Simpler lookup behavior. (e.g. subclass and __iadd__)
Then, I propose `dict.merge` method. It is outer-place version of `dict.update`, but accepts multiple dicts. (dict.update() can be updated to accept multiple dicts, but it's not out of scope).
* d = d1.merge(d2) # d = d1.copy(); d.update(d2)
One should also be able to write `d = dict.merge(d1, d2, ...)`
If dict merging is important enough to get a new spelling, then I think this proposal is the best: explicit, unambiguous, immediately understandable and easy to remember.
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying more strongly? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
--Guido van Rossum (python.org/~guido)
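The sorting precedent mentioned here, next to today's `dict.update`, looks like this. The snippet uses only existing built-ins; the variable names are illustrative.

```python
numbers = [3, 1, 2]

copy_sorted = sorted(numbers)    # new list; numbers is untouched
print(copy_sorted, numbers)      # [1, 2, 3] [3, 1, 2]

print(numbers.sort())            # None: sorts in place
print(numbers)                   # [1, 2, 3]

d1 = {"a": 1}
print(d1.update({"b": 2}))       # None: update also mutates in place
print(d1)                        # {'a': 1, 'b': 2}
```

In both mutating cases the `None` return value is the only signal at the call site that the object itself was changed.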
On Thu, Mar 21, 2019 at 09:11:18AM -0700, Guido van Rossum <guido@python.org> wrote:
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying stronger? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
Then shouldn't it be a function (not a method)? dictutils.merge()?
--Guido van Rossum (python.org/~guido)
Oleg.
-- Oleg Broytman https://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
On Thu, Mar 21, 2019 at 09:11:18AM -0700, Guido van Rossum wrote:
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying stronger? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
How about dict.merged(*args, **kw)? Or dict.updated()?
That would eliminate some of the difficulties with an operator, such as the difference between +, which requires both operands to be dicts, and +=, which can take any mapping or iterable of (key, value) pairs.
-- Steven
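The same kind of asymmetry between `+` and `+=` already exists for lists, which may help picture the difficulty described here. This is an analogy using existing list behaviour, not the PEP's dict semantics.

```python
items = [1, 2]

try:
    items + (3, 4)               # binary + insists on another list
except TypeError as exc:
    print("TypeError:", exc)

items += (3, 4)                  # += accepts any iterable, like extend()
print(items)                     # [1, 2, 3, 4]
```

A single method that accepts any mapping or iterable of pairs would avoid having two spellings with different argument rules.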
Steven D'Aprano wrote on 21.03.19 at 17:21:
On Thu, Mar 21, 2019 at 09:11:18AM -0700, Guido van Rossum wrote:
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying stronger? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
How about dict.merged(*args, **kw)? Or dict.updated()?
And then users would accidentally type
d.updated(items)
and lack the tests to detect that this didn't do anything (except wasting some time and memory).
Stefan
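The pitfall can be illustrated with a stand-in function, since no `updated` method exists on `dict`. The helper below is hypothetical and only mimics what a non-mutating `updated` would plausibly do.

```python
def updated(d, *others):
    """Hypothetical non-mutating counterpart of dict.update()."""
    result = d.copy()
    for other in others:
        result.update(other)
    return result


config = {"debug": False}

updated(config, {"debug": True})   # result discarded: no error, no effect
print(config)                      # {'debug': False}

config.update({"debug": True})     # the call the author probably meant
print(config)                      # {'debug': True}
```

Nothing fails in the first call, so the mistake only shows up if a test checks `config` afterwards.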
On Thu, Mar 21, 2019 at 1:54 PM Stefan Behnel <stefan_ml@behnel.de> wrote:
Steven D'Aprano wrote on 21.03.19 at 17:21:
On Thu, Mar 21, 2019 at 09:11:18AM -0700, Guido van Rossum wrote:
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying stronger? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
How about dict.merged(*args, **kw)? Or dict.updated()?
And then users would accidentally type
d.updated(items)
and lack the tests to detect that this didn't do anything (except wasting some time and memory).
Stefan
Generally when I call a method named with a verb on an instance of something mutable, I expect it to do something on that instance and return None. So merged() or updated() feels more like a built-in or a function to import from somewhere, akin to sorted().
Perhaps dict.union(d2) could be considered? Three points in favor:
1) Not a verb, therefore makes it clearer that it returns something new.
2) Not confusable with existing dict methods.
3) It matches the name and behavior of set.union (modulo value conflicts), so will be easier to grok.
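For reference, this is how the existing `set.union` behaves; a `dict.union` modelled on it would additionally need a rule for conflicting values, which sets never face.

```python
a = {1, 2}
b = {2, 3}

c = a.union(b, [4, 5])   # accepts any number of iterables
print(sorted(c))         # [1, 2, 3, 4, 5]
print(sorted(a))         # [1, 2]
```

`a` is left untouched and a new set is returned, which is the non-mutating behaviour the name is meant to signal.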
On Fri, Mar 22, 2019 at 1:21 AM Steven D'Aprano <steve@pearwood.info> wrote:
How about dict.merged(*args, **kw)? Or dict.updated()?
+1 on "merged". I feel the word "update" indicates mutation, and it's difficult to distinguish between "update" and "updated".
That would eliminate some of the difficulties with an operator, such as the difference between + which requires both operands to be a dict but += which can take any mapping or (key,value) iterable.
-- Steven
-- Inada Naoki <songofacandy@gmail.com>
On 3/21/2019 12:11 PM, Guido van Rossum wrote:
On Thu, Mar 21, 2019 at 7:45 AM Antoine Pitrou
One should also be able to write `d = dict.merge(d1, d2, ...)`
If dict merging is important enough to get a new spelling, then I think this proposal is the best: explicit, unambiguous, immediately understandable and easy to remember.
I don't find it easy to understand or remember that d1.update(d2) modifies d1 in place, while d1.merge(d2) first copies d1.
Maybe the name can indicate the copying stronger? Like we did with sorting: l.sort() sorts in-place, while sorted(l) returns a sorted copy.
I counted what I believe to be 10 instances of copy-update in the top level of /lib. Do either of you consider this to be enough that any addition would be worthwhile?
There are 3 in idlelib that I plan to replace with {**a, **b} and be done with the issue. I did not check any other packages.
-- Terry Jan Reedy
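The replacement described here swaps the two-statement copy-update pattern for a dict display. A small sketch with made-up data shows that both give the same result for plain dicts.

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

# The two-statement copy-update pattern.
merged_copy = a.copy()
merged_copy.update(b)

# The single-expression dict display, available since Python 3.5.
merged_display = {**a, **b}

print(merged_copy == merged_display)  # True
print(merged_display)                 # {'x': 1, 'y': 20, 'z': 30}
```

In both forms the value from `b` wins for the shared key `"y"`, and neither input is modified.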
On Thu, Mar 21, 2019 at 09:36:20PM -0400, Terry Reedy wrote:
I counted what I believe to be 10 instances of copy-update in the top level of /lib. Do either of you consider this to be enough that any addition would be worthwhile.
I think you're referring to Guido and Antoine? But for what it's worth, I think that's a good indication that there are uses for a merge operator.
There are 3 in idlelib that I plan to replace with {**a, **b} and be done with the issue. I did not check any other packages.
If a+b already worked for dicts, would you still prefer {**a, **b}?
How about if it were spelled a|b?
-- Steven
On 3/22/2019 12:53 AM, Steven D'Aprano wrote:
On Thu, Mar 21, 2019 at 09:36:20PM -0400, Terry Reedy wrote:
I counted what I believe to be 10 instances of copy-update in the top level of /lib. Do either of you consider this to be enough that any addition would be worthwhile.
I think you're referring to Guido and Antoine?
Yes, those were the two (core devs) I quoted, who had perhaps missed my post, while you already thanked me for collecting some data.
But for what it's worth, I think that's a good indication that there are uses for a merge operator.
Some, yes. Enough for new syntax? What is a reasonable standard? Are there existing syntax features so sparsely used? What is the bar for something that adds no new function, but saves 6 chars and is easier to understand for at least some?
In the past, 'Would this be used in the stdlib?' has been asked of feature proposals. But I never paid attention past == 0 or > 0. When Guido approved ':=', what threshold of usefulness did he use? How many uses of ':=' does he anticipate, or consider enough to justify the addition?
There are 3 in idlelib that I plan to replace with {**a, **b} and be done with the issue. I did not check any other packages.
If a+b already worked for dicts, would you still prefer {**a, **b}?
Example: {**sys.modules, **globals()}
Aside from the fact that I can patch *and* backport to 3.7 *now*, I think so. The latter clearly (to me) maps mappings to a dict.
How about if it were spelled a|b?
As in sys.modules | globals() or (sys.modules | globals())? Closer.
-- Terry Jan Reedy
On Fri, 22 Mar 2019 at 07:46, Terry Reedy <tjreedy@udel.edu> wrote:
On 3/22/2019 12:53 AM, Steven D'Aprano wrote:
If a+b already worked for dicts, would you still prefer {**a, **b}?
Example: {**sys.modules, **globals()}
Aside from the fact that I can patch *and* backport to 3.7 *now*, I think so. The latter clearly (to me) maps mappings to a dict.
How about if it were spelled a|b?
As in sys.modules | globals() or (sys.modules | globals())? Closer.
Adding a comment here because it's new information (to me, about my subjective preferences, at least). I accept that it's "just" more comment on the whole point about what people subjectively prefer, but at some point the *amount* of subjective preference has to be considered, not everything can be decided purely on objective grounds, so hopefully it's still a useful data point.
This is probably the first example of "real world" code written using {**d1, **d2} notation alongside d1+d2 and d1|d2 notation that has caught my attention (I've been skimming, I may have missed some). And I have to say that I find {**d1, **d2} (when used with real values rather than d1 and d2) *far* more obvious in context than either of the operator notations. I wouldn't have expected that - my intuition was that {**d1, **d2} is too punctuation-heavy and "perlish". But surprisingly that's not the case at all.
If I ever needed side effect free dictionary merging as an expression, I'd now definitely prefer {**d1, **d2} to any operator form.
Paul
I think that's a good indication that there are uses for a merge operator.
Some, yes. Enough for new syntax?
Let’s be clear here: this would not be new syntax. The operator(s) already exist and are commonly used and overloaded already. This would be a minor change to the dictionary class (and maybe the Mapping ABC), not a change to the language.
Are there existing syntax features so sparsely used?
I wonder how often + is used with lists in the stdlib...
What is the bar for something that adds no new function, but saves 6 chars and is easier to understand for at least some?
The “height of the bar” depends not just on how it would be used, but on how disruptive it is. As this is not nearly as disruptive as, say, :=, I think the bar is pretty low. But others seem to think it would add great confusion, which would raise the bar a lot.
By the way, if it isn’t used much, that also means it wouldn’t be very disruptive. :-)
I’m coming down on the side of “not worth the argument”.
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
23.03.19 18:24, Christopher Barker wrote:
I wonder how often + is used with lists in the stdlib...
Searching for "+ [" shows that even concatenation with list displays and comprehensions is several times more common than merging dicts. And there should be cases not covered by this simple search, as well as concatenation of tuples and other sequences. Also, using + for sequences is a generalization of using it for strings and bytes objects, which are even more common.
participants (17)
- Andrew Svetlov
- Antoine Pitrou
- Chris Angelico
- Christopher Barker
- fhsxfhsx
- Guido van Rossum
- Inada Naoki
- INADA Naoki
- Jeroen Demeyer
- Jonathan Goble
- Nathaniel Smith
- Oleg Broytman
- Paul Moore
- Serhiy Storchaka
- Stefan Behnel
- Steven D'Aprano
- Terry Reedy