Dict joining using + and +=
On 27/02/2019 16:25, João Matos wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b) -- Rhodri James *-* Kynesim Ltd
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b)
This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`. -- --Guido van Rossum (python.org/~guido)
On Wed, Feb 27, 2019 at 09:05:20AM -0800, Guido van Rossum <guido@python.org> wrote:
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b)
This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
That is, ``d1 + d2`` is::

    d = d1.copy()
    d.update(d2)
    return d
-- --Guido van Rossum (python.org/~guido)
Oleg. -- Oleg Broytman https://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
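The copy-then-update equivalence above can be tried out with a small function. This is only a sketch: `merge` is a hypothetical stand-in for the proposed `d1 + d2`, and the sample dicts are made up for illustration.

```python
def merge(d1, d2):
    """Hypothetical stand-in for the proposed ``d1 + d2``."""
    d = d1.copy()
    d.update(d2)  # on overlapping keys, the value from d2 wins
    return d


site = {"theme": "light", "lang": "en"}
user = {"theme": "dark"}

merged = merge(site, user)          # user wins on "theme"
reversed_merge = merge(user, site)  # site wins on "theme"

print(merged)          # {'theme': 'dark', 'lang': 'en'}
print(reversed_merge)  # {'theme': 'light', 'lang': 'en'}
```

Neither input is modified, which is the difference from calling `site.update(user)` directly.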
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
~ George
On Wed, Feb 27, 2019 at 10:06 AM Guido van Rossum <guido@python.org> wrote:
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b)
This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
-- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Wed, Feb 27, 2019 at 9:34 AM George Castillo <gmcastil@gmail.com> wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Yes there are. 'a' + 'b' is not the same as 'b' + 'a'. For non-numbers we only require + to be associative, i.e. a + b + c == (a + b) + c == a + (b + c). That is satisfied for this proposal. -- --Guido van Rossum (python.org/~guido)
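A quick way to convince oneself of the associativity claim is to model the proposed operator with a helper and compare both groupings. This is a sketch, not a proof: `merge` is a hypothetical stand-in for `d1 + d2`, and the three dicts are chosen so that every pair overlaps.

```python
def merge(d1, d2):
    """Hypothetical stand-in for the proposed ``d1 + d2`` (right operand wins)."""
    d = d1.copy()
    d.update(d2)
    return d


a = {"x": 1, "y": 1}
b = {"y": 2, "z": 2}
c = {"z": 3, "x": 3}

left_grouped = merge(merge(a, b), c)   # (a + b) + c
right_grouped = merge(a, merge(b, c))  # a + (b + c)

print(left_grouped == right_grouped)   # True: grouping does not matter
print(merge(a, b) == merge(b, a))      # False: operand order does matter
```

For every key, the result holds the value from the rightmost operand that contains it, regardless of grouping.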
On 2019-02-27 17:37, Guido van Rossum wrote:
On Wed, Feb 27, 2019 at 9:34 AM George Castillo <gmcastil@gmail.com> wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Yes there are. 'a' + 'b' is not the same as 'b' + 'a'.
For non-numbers we only require + to be associative, i.e. a + b + c == (a + b) + c == a + (b + c).
That is satisfied for this proposal.
Are there any advantages of using '+' over '|'?
On 2019-03-02 22:02, francismb wrote:
On 2/27/19 7:14 PM, MRAB wrote:
Are there any advantages of using '+' over '|'?
or, e.g., '<=' (d1 <= d2) over '+' (d1 + d2)
'<=' is for comparison, less-than-or-equal (in the case of sets, subset, which is sort of the same kind of thing). Using it for anything else in Python would be too confusing.
On 3/2/19 11:11 PM, MRAB wrote:
'<=' is for comparison, less-than-or-equal (in the case of sets, subset, which is sort of the same kind of thing). Using it for anything else in Python would be too confusing.
Understandable, so the proposed (meaning) overloading for <= is also too much/unclear.
On 2/27/19 7:14 PM, MRAB wrote:
Are there any advantages of using '+' over '|'?
or '<-' (d1 <- d2), meaning merge priority (the overriding policy for equal keys) on the right dict, and maybe '->' (d1 -> d2), meaning merge priority on the left dict, over '+' (d1 + d2)? E.g.:

    >>> d1 = {'a':1, 'b':1 }
    >>> d2 = {'a':2 }
    >>> d3 = d1 -> d2
    >>> d3
    {'a':1, 'b':1 }

    >>> d1 = {'a':1, 'b':1 }
    >>> d2 = {'a':2 }
    >>> d3 = d1 <- d2
    >>> d3
    {'a':2, 'b':1 }
Regards, --francis
On Wed, Feb 27, 2019 at 6:35 PM George Castillo <gmcastil@gmail.com> wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Sure:
    >>> a = "A"
    >>> b = "B"
    >>> a + b == b + a
    False
On Wed, Feb 27, 2019 at 10:06 AM Guido van Rossum <guido@python.org> wrote:
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b)
This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
-- --Guido van Rossum (python.org/~guido)
"foo" + "bar" != "bar" + "foo"
On Wed, Feb 27, 2019, 12:35 PM George Castillo <gmcastil@gmail.com> wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
~ George
On Wed, Feb 27, 2019 at 10:06 AM Guido van Rossum <guido@python.org> wrote:
On Wed, Feb 27, 2019 at 8:50 AM Rhodri James <rhodri@kynesim.co.uk> wrote:
On 27/02/2019 16:25, João Matos wrote:
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
While I don't object to the idea of concatenating dictionaries, I feel obliged to point out that this last is currently spelled dict_a.update(dict_b)
This is likely to be controversial. But I like the idea. After all, we have `list.extend(x)` ~~ `list += x`. The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
-- --Guido van Rossum (python.org/~guido)
On Wed, Feb 27, 2019 at 10:34:43AM -0700, George Castillo wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Strings, bytes, lists, tuples. In this case, I wouldn't call it dict addition, I would call it a union operator. That suggests that maybe we match sets and use | for union. That also suggests d1 & d2 for the intersection between two dicts, but which value should win? More useful than intersection is, I think, dict subtraction: d1 - d2 being a new dict with the keys/values from d1 which aren't in d2. -- Steven
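The subtraction and intersection ideas can be sketched with plain functions. Both helpers are hypothetical, and they assume that "in d2" means "has a key in d2". The `prefer_right` flag is an invented parameter that only exists to make the open question about intersection explicit.

```python
def dict_sub(d1, d2):
    """Hypothetical ``d1 - d2``: items of d1 whose keys are not in d2."""
    return {k: v for k, v in d1.items() if k not in d2}


def dict_and(d1, d2, prefer_right=True):
    """Hypothetical ``d1 & d2``: keys present in both dicts.

    Which side supplies the value is the unresolved question, so it is
    left as a parameter here rather than decided.
    """
    return {k: (d2[k] if prefer_right else d1[k]) for k in d1 if k in d2}


d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "d": 40}

print(dict_sub(d1, d2))                      # {'a': 1, 'c': 3}
print(dict_and(d1, d2))                      # {'b': 20}
print(dict_and(d1, d2, prefer_right=False))  # {'b': 2}
```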
OK, you're it. Please write a PEP for this.
On Wed, Feb 27, 2019 at 3:53 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 27, 2019 at 10:34:43AM -0700, George Castillo wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Strings, bytes, lists, tuples.
In this case, I wouldn't call it dict addition, I would call it a union operator. That suggests that maybe we match sets and use | for union.
That also suggests d1 & d2 for the intersection between two dicts, but which value should win?
More useful than intersection is, I think, dict subtraction: d1 - d2 being a new dict with the keys/values from d1 which aren't in d2.
-- Steven
-- --Guido van Rossum (python.org/~guido)
Here is a working implementation of dictionary addition, for consideration with the PEP: https://bugs.python.org/issue36144 Brandt
On Feb 27, 2019, at 16:07, Guido van Rossum <guido@python.org> wrote:
OK, you're it. Please write a PEP for this.
On Wed, Feb 27, 2019 at 3:53 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 27, 2019 at 10:34:43AM -0700, George Castillo wrote:
The key conundrum that needs to be solved is what to do for `d1 + d2` when there are overlapping keys. I propose to make d2 win in this case, which is what happens in `d1.update(d2)` anyways. If you want it the other way, simply write `d2 + d1`.
This would mean that addition, at least in this particular instance, is not a commutative operation. Are there other places in Python where this is the case?
Strings, bytes, lists, tuples.
In this case, I wouldn't call it dict addition, I would call it a union operator. That suggests that maybe we match sets and use | for union.
That also suggests d1 & d2 for the intersection between two dicts, but which value should win?
More useful than intersection is, I think, dict subtraction: d1 - d2 being a new dict with the keys/values from d1 which aren't in d2.
-- Steven
-- --Guido van Rossum (python.org/~guido)
I dislike the asymmetry with sets:
    >>> {1} | {2}
    {1, 2}
To me it makes sense that if + works for dict then it should for set too. / Anders
On 27 Feb 2019, at 17:25, João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
Best regards,
João Matos
On Wed, Feb 27, 2019 at 10:22 AM Anders Hovmöller <boxed@killingar.net> wrote:
I dislike the asymmetry with sets:
    >>> {1} | {2}
    {1, 2}
To me it makes sense that if + works for dict then it should for set too.
/ Anders
On 27 Feb 2019, at 17:25, João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d  # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})

At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mike@selik.org> wrote:
On Wed, Feb 27, 2019 at 10:22 AM Anders Hovmöller <boxed@killingar.net> wrote:
I dislike the asymmetry with sets:
    >>> {1} | {2}
    {1, 2}
To me it makes sense that if + works for dict then it should for set too.
/ Anders
On 27 Feb 2019, at 17:25, João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.
    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d  # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here. -- --Guido van Rossum (python.org/~guido)
On Wed, Feb 27, 2019 at 11:06 AM João Matos <jcrmatos@gmail.com> wrote:
Great. Because I don't program in any other language except Python, I can't make the PR (with the C code). Maybe someone who programs in C can help?
First we need a PEP, and for a PEP you need a core dev interested in sponsoring the PEP. And I'm not it. Is there a core dev who is interested in sponsoring or co-authoring this PEP? -- --Guido van Rossum (python.org/~guido)
On 2/27/2019 2:08 PM, Guido van Rossum wrote:
On Wed, Feb 27, 2019 at 11:06 AM João Matos <jcrmatos@gmail.com> wrote:
Great. Because I don't program in any other language except Python, I can't make the PR (with the C code). Maybe someone who programs in C can help?
First we need a PEP, and for a PEP you need a core dev interested in sponsoring the PEP. And I'm not it. Is there a core dev who is interested in sponsoring or co-authoring this PEP?
I'd help out. Eric
I’d like to try my hand at implementing this, if nobody else is interested. I should be able to have something up today. Brandt
On Feb 27, 2019, at 11:05, João Matos <jcrmatos@gmail.com> wrote:
Hello,
Great. Because I don't program in any other language except Python, I can't make the PR (with the C code). Maybe someone who programs in C can help?
Best regards,
João Matos
On 27-02-2019 18:48, Guido van Rossum wrote:
On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mike@selik.org> wrote:
On Wed, Feb 27, 2019 at 10:22 AM Anders Hovmöller <boxed@killingar.net> wrote:
I dislike the asymmetry with sets:

    >>> {1} | {2}
    {1, 2}
To me it makes sense that if + works for dict then it should for set too.
/ Anders
On 27 Feb 2019, at 17:25, João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.
    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d  # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here.
-- --Guido van Rossum (python.org/~guido)
27.02.19 20:48, Guido van Rossum пише:
On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mike@selik.org> wrote:
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.
    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d  # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})
At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here.
Counter uses + for a *different* behavior!
    >>> Counter(a=2) + Counter(a=3)
    Counter({'a': 5})
I do not understand why we are discussing a new syntax for dict merging when we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
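The difference between Counter's `+` and the overwrite-style merge can be seen side by side. A minimal sketch using only `collections.Counter` from the standard library; the variable names are illustrative.

```python
from collections import Counter

c1 = Counter(a=2)
c2 = Counter(a=3)

# Counter's + adds the counts for matching keys.
added = c1 + c2

# Unpacking treats the counters as plain mappings: the right value wins,
# and the result is an ordinary dict, not a Counter.
unpacked = {**c1, **c2}

# Counter.update() also adds counts, unlike dict.update().
updated = Counter(a=2)
updated.update(c2)

print(added)     # Counter({'a': 5})
print(unpacked)  # {'a': 3}
print(updated)   # Counter({'a': 5})
```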
I agree with Storchaka here. The advantage of existing dict merge syntax is that it will cause an error if the object is not a dict or dict-like object, thus preventing people from doing bad things.
On Feb 28, 2019, at 2:16 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
27.02.19 20:48, Guido van Rossum пише:
On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mike@selik.org> wrote:
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.

    >>> c = Counter(a=3, b=1)
    >>> d = Counter(a=1, b=2)
    >>> c + d  # add two counters together: c[x] + d[x]
    Counter({'a': 4, 'b': 3})

At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here.
Counter uses + for a *different* behavior!
    >>> Counter(a=2) + Counter(a=3)
    Counter({'a': 5})
I do not understand why we are discussing a new syntax for dict merging when we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
On Thu, Feb 28, 2019 at 07:40:25AM -0500, James Lu wrote:
I agree with Storchaka here. The advantage of existing dict merge syntax is that it will cause an error if the object is not a dict or dict-like object, thus preventing people from doing bad things.
What sort of "bad things" are you afraid of? -- Steven
Serhiy Storchaka wrote:
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings).
But that always returns a dict. A '+' operator could be implemented by other mapping types to return a mapping of the same type. -- Greg
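The type-loss point can be illustrated as follows. `TypedMap` is a hypothetical dict subclass written for this sketch, and its `__add__` assumes the class accepts an existing mapping as its only constructor argument, which is not true of every mapping type.

```python
from collections import OrderedDict

d1 = OrderedDict(a=1)
d2 = OrderedDict(b=2)

# The unpacking syntax always builds a plain dict, whatever the inputs are.
unpacked = {**d1, **d2}
print(type(unpacked).__name__)  # dict


class TypedMap(dict):
    """Hypothetical mapping whose + returns an instance of its own type."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)  # assumption: the constructor can copy a mapping
        new.update(other)       # right operand wins on overlapping keys
        return new


summed = TypedMap(a=1) + {"a": 10, "b": 2}
print(type(summed).__name__, dict(summed))  # TypedMap {'a': 10, 'b': 2}
```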
28.02.19 23:19, Greg Ewing пише:
Serhiy Storchaka wrote:
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings).
But that always returns a dict. A '+' operator could be implemented by other mapping types to return a mapping of the same type.
And this opens a non-easy problem: how to create a mapping of the same type? Not all mappings, and even not all dict subclasses have a copying constructor.
On Thu, Feb 28, 2019 at 10:30 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
28.02.19 23:19, Greg Ewing пише:
Serhiy Storchaka wrote:
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings).
But that always returns a dict. A '+' operator could be implemented by other mapping types to return a mapping of the same type.
And this opens a non-easy problem: how to create a mapping of the same type? Not all mappings, and even not all dict subclasses have a copying constructor.
A compromise solution is possible here. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).

Looking at the code for Counter, its __iadd__ and __add__ behave subtly differently from Counter.update(): __iadd__ and __add__ drop values that are <= 0, while update() does not. That's all fine -- Counter is not bound by the exact same semantics as dict (starting with its update() method, which adds values rather than overwriting).

Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter is highly non-obvious except if you've already encountered that pattern before, while d1+d2 is what anybody familiar with other Python collection types would guess or propose. And the default semantics for subclasses of dict that don't override these are settled with the "d = d1.copy(); d.update(d2)" equivalence.

-- --Guido van Rossum (python.org/~guido)
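What a default `__iadd__` on MutableMapping might look like can be sketched with a mixin. `IAddMixin` and `LowerDict` are hypothetical names invented for this example; nothing here is part of `collections.abc`. The second half shows the existing Counter behaviour with non-positive values.

```python
from collections import Counter
from collections.abc import MutableMapping


class IAddMixin:
    """Hypothetical default: ``m += other`` delegates to ``m.update(other)``."""

    def __iadd__(self, other):
        self.update(other)
        return self


class LowerDict(IAddMixin, MutableMapping):
    """Toy mapping that lower-cases its string keys."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = LowerDict()
m["A"] = 1
m += {"a": 2, "B": 3}  # goes through MutableMapping.update()
print(dict(m))         # {'a': 2, 'b': 3}

# Counter: += drops non-positive results, update() keeps them.
via_iadd = Counter(a=1)
via_iadd += Counter(a=-5)
via_update = Counter(a=1)
via_update.update(Counter(a=-5))
print(via_iadd)    # Counter()
print(via_update)  # Counter({'a': -4})
```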
On Mar 1, 2019, at 11:31 AM, Guido van Rossum <guido@python.org> wrote:
A compromise solution is possible here. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).
Usually, it's easy to add methods to classes without creating disruption, but ABCs are more problematic. If MutableMapping grows an __iadd__() method, what would that mean for existing classes that register as MutableMapping but don't already implement __iadd__? When "isinstance(m, MutableMapping)" returns True, is it a promise that the API is fully implemented? Is this something that mypy could, would, or should complain about?
Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter is highly non-obvious except if you've already encountered that pattern before
I concur. The latter is also an eyesore and almost certain to be a stumbling block when reading code.

That said, I'm not sure we actually need a short-cut for "d=e.copy(); d.update(f)". Code like this comes up for me perhaps once a year. Having a plus operator on dicts would likely save me five seconds per year.

If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); d.update(h)", converting it to "d = e + f + g + h" would be a tempting but algorithmically poor thing to do (because the behavior is quadratic). Most likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result without incurring quadratic costs. Both of those are short and clear.

Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic. With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents of the left operand get thrown away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").

Raymond
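The cost concern about chained merging can be made concrete by comparing two ways of combining a sequence of dicts. Both function names are hypothetical, and `{**acc, **d}` is used here only to model what each step of "e + f + g + h" would do.

```python
from functools import reduce


def merge_pairwise(dicts):
    """Models ``e + f + g + h``: every step copies the growing intermediate."""
    return reduce(lambda acc, d: {**acc, **d}, dicts, {})


def merge_in_place(dicts):
    """One result dict, updated once per input: no intermediate copies."""
    result = {}
    for d in dicts:
        result.update(d)
    return result


layers = [{"a": 1, "b": 1}, {"b": 2}, {"c": 3}, {"a": 4}]

print(merge_pairwise(layers))  # {'a': 4, 'b': 2, 'c': 3}
print(merge_in_place(layers))  # {'a': 4, 'b': 2, 'c': 3}
```

Both produce the same mapping; the difference is only in how much copying happens along the way, which matters when there are many inputs.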
On 3/2/19 8:14 PM, Raymond Hettinger wrote:
Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic. With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents for the left operand get thrown-away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").
+1, it's a good point. IMHO the proposed (meaning) overloading for + and += is too much/unclear. If the idea is to 'join' dicts, why not use "d.join(...here the other dicts ...)"?
Regards, --francis
On Sat, 2 Mar 2019 at 19:15, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Mar 1, 2019, at 11:31 AM, Guido van Rossum <guido@python.org> wrote:
A compromise solution is possible here. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).
Usually, it's easy to add methods to classes without creating disruption, but ABCs are more problematic. If MutableMapping grows an __iadd__() method, what would that mean for existing classes that register as MutableMapping but don't already implement __iadd__? When "isinstance(m, MutableMapping)" returns True, is it a promise that the API is fully implemented? Is this something that mypy could, would, or should complain about?
Just to clarify the situation, currently Mapping and MutableMapping are not protocols from either the runtime or the mypy point of view. I.e. they don't have the structural __subclasshook__() (as e.g. Iterable does), and are not declared as Protocol in typeshed. So to implement these (and be considered a subtype by mypy) one needs to explicitly subclass them (register() isn't supported by mypy). This means that adding a new method will not cause any problems here, since the new method will be non-abstract with a default implementation that calls update() (the same way as for MutableSequence).

The only potential for confusion I see is if there is a class that de facto implements the current MutableMapping API and was made a subclass (at runtime) of MutableMapping using register(). Then after we add __iadd__, users of that class might expect that __iadd__ is implemented, while it might not be. This is however OK I think, since register() is already not type safe.

Also there is a simple way to find out whether there are any subclasses of MutableMapping in typeshed that don't have __iadd__: one can *try* declaring MutableMapping.__iadd__ as abstract, and mypy will error on all such subclasses.

-- Ivan
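The register() caveat can be demonstrated with the mixin methods MutableMapping already provides. `DuckMapping` is a hypothetical class written for this sketch; it implements the five abstract methods by hand and is then registered rather than subclassed.

```python
from collections.abc import MutableMapping


class DuckMapping:
    """Implements the abstract MutableMapping methods without inheriting."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


# Virtual subclassing: affects isinstance()/issubclass() only.
MutableMapping.register(DuckMapping)

m = DuckMapping()
print(isinstance(m, MutableMapping))  # True
print(hasattr(m, "update"))           # False: mixin methods are not inherited
```

A newly added default `__iadd__` would be missing from such a class in the same way `update()` is missing here.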
On Sat, Mar 02, 2019 at 11:14:18AM -0800, Raymond Hettinger wrote:
If the existing code were in the form of "d=e.copy(); d.update(f); d.update(g); d.update(h)", converting it to "d = e + f + g + h" would be a tempting but algorithmically poor thing to do (because the behavior is quadratic).
I mention this in the PEP. Unlike strings, but like lists and tuples, I don't expect that this will be a problem in practice:

- it's easy to put repeated string concatenation in a tight loop; it is harder to think of circumstances where one needs to concatenate lists or tuples, or merge dicts, in a tight loop;
- it's easy to have situations where one is concatenating thousands of strings; it's harder to imagine circumstances where one would be merging more than three or four dicts;
- concatenation s1 + s2 + ... for strings, lists or tuples results in a new object of length equal to the sum of the lengths of each of the inputs, so the output is constantly growing; but merging dicts d1 + d2 + ... typically results in a smaller object of length equal to the number of unique keys.
Most likely, the right thing to do would be "d = ChainMap(e, f, g, h)" for a zero-copy solution or "d = dict(ChainMap(e, f, g, h))" to flatten the result without incurring quadratic costs. Both of those are short and clear.
And both result in the opposite behaviour of what you probably intended if you were trying to match e + f + g + h. Dict merging/updating operates on "last seen wins", but ChainMap is "first seen wins". To get the same behaviour, we have to write the dicts in the opposite order compared to update, from most to least specific:

    # least specific to most specific
    prefs = site_defaults + user_defaults + document_prefs

    # most specific to least
    prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))

To me, the latter feels backwards: I'm applying document prefs first, and then trusting that the ChainMap doesn't overwrite them with the defaults. I know that's guaranteed behaviour, but every time I read it I'll feel the need to check :-)
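The ordering difference is easy to verify with today's tools. In this sketch the three preference dicts hold made-up values, and repeated `update()` calls stand in for the proposed `+`.

```python
from collections import ChainMap

site_defaults = {"font": "serif", "size": 10}
user_defaults = {"size": 12}
document_prefs = {"font": "mono"}

# "Last seen wins": apply layers from least to most specific.
by_update = {}
for layer in (site_defaults, user_defaults, document_prefs):
    by_update.update(layer)

# "First seen wins": ChainMap needs the most specific mapping first.
by_chainmap = dict(ChainMap(document_prefs, user_defaults, site_defaults))

# Same argument order as the update loop gives the opposite priority.
wrong_order = dict(ChainMap(site_defaults, user_defaults, document_prefs))

print(by_update == by_chainmap)  # True
print(wrong_order)               # site defaults win for both keys
```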
Lastly, I'm still bugged by use of the + operator for replace-logic instead of additive-logic. With numbers and lists and Counters, the plus operator creates a new object where all the contents of each operand contribute to the result. With dicts, some of the contents for the left operand get thrown-away. This doesn't seem like addition to me (IIRC that is also why sets have "|" instead of "+").
I'm on the fence here. Addition seems to be the most popular operator (it often gets requested) but you might be right that this is more like a union operation than concatenation or addition operation. MRAB also suggested this earlier. One point in its favour is that + goes nicely with - but on the other hand, sets have | and - with no + and that isn't a problem. -- Steven
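The "additive versus replace" distinction can be seen side by side in a short sketch. The fruit counts are invented, and `{**left, **right}` again stands in for the proposed merge.

```python
from collections import Counter

left = {"apples": 2, "pears": 1}
right = {"apples": 5, "plums": 4}

# Counter addition: values for shared keys are summed.
added = Counter(left) + Counter(right)

# Dict merging: the right operand's value replaces the left one.
merged = {**left, **right}

# Set union, for comparison: there are no values to lose or replace.
keys_union = set(left) | set(right)
```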
On 01.03.19 21:31, Guido van Rossum wrote:
On Thu, Feb 28, 2019 at 10:30 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
And this opens a non-easy problem: how to create a mapping of the same type? Not all mappings, and even not all dict subclasses have a copying constructor.
There's a possible compromise solution for this. We already do this for Sequence and MutableSequence: Sequence does *not* define __add__, but MutableSequence *does* define __iadd__, and the default implementation just calls self.extend(other). I propose the same for Mapping (do nothing) and MutableMapping: make the default __iadd__ implementation call self.update(other).
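As a rough sketch of what that default could look like, here is a hand-written mapping that defines `__iadd__` in terms of `update()`. The class `AddableMapping` is hypothetical and exists only for illustration; `collections.abc.MutableMapping` itself does not define `__iadd__`.

```python
from collections.abc import MutableMapping


class AddableMapping(MutableMapping):
    """Hypothetical mapping used only to illustrate the proposed default."""

    def __init__(self, *args, **kwargs):
        self._data = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __iadd__(self, other):
        # In-place merge: delegate to update() and return the same object.
        self.update(other)
        return self


m = AddableMapping(a=1, b=2)
alias = m
m += {"b": 20, "c": 3}
```

Because `__iadd__` returns `self`, `alias` still refers to the updated object, mirroring how `list += x` behaves.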
This LGTM for mappings. But the problem with dict subclasses still exists. If we use the copy() method for creating a copy, d1 + d2 will always return a dict (unless the plus operator or copy() are redefined in a subclass). If we use the constructor of the left argument's type, there will be problems with subclasses with non-compatible constructors (e.g. defaultdict).
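Both halves of this concern can be checked with existing behaviour. The sketch below uses a throwaway subclass `MyDict` (hypothetical, for illustration only) alongside `defaultdict`.

```python
from collections import defaultdict


class MyDict(dict):
    """Plain subclass that does not override copy()."""


# copy() on a plain subclass falls back to dict.copy() and returns a dict.
plain_copy_type = type(MyDict(a=1).copy())

d1 = defaultdict(list, {"a": [1]})
d2 = {"b": [2]}

# defaultdict overrides copy(), so the factory survives copy-then-update.
via_copy = d1.copy()
via_copy.update(d2)

# Treating the type as a copying constructor fails for defaultdict,
# because its first positional argument must be a factory (or None).
try:
    type(d1)(d1)
except TypeError as exc:
    ctor_error = exc
else:
    ctor_error = None
```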
Anyways, the main reason to prefer d1+d2 over {**d1, **d2} is that the latter is highly non-obvious except if you've already encountered that pattern before, while d1+d2 is what anybody familiar with other Python collection types would guess or propose. And the default semantics for subclasses of dict that don't override these are settled with the "d = d1.copy(); d.update(d2)" equivalence.
Dicts are not like lists or deques, or even sets. Iterating dicts produces keys, but not values. The "in" operator tests a key, but not a value. It is not that I like to add an operator for dict merging, but dicts are more like sets than sequences: they can not contain duplicated keys and the size of the result of merging two dicts can be less than the sum of their sizes. Using "|" looks more natural to me than using "+". We should look at discussions for using the "|" operator for sets, if the alternative of using "+" was considered, I think the same arguments for preferring "|" for sets are applicable now for dicts. But is merging two dicts a common enough problem that needs introducing an operator to solve it? I need to merge dicts maybe not more than one or two times by year, and I am fine with using the update() method. Perhaps {**d1, **d2} can be more appropriate in some cases, but I did not encounter such cases yet.
On Mon, Mar 4, 2019 at 10:29 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
It is not that I like to add an operator for dict merging, but dicts are more like sets than sequences: they can not contain duplicated keys and the size of the result of merging two dicts can be less than the sum of their sizes. Using "|" looks more natural to me than using "+". We should look at discussions for using the "|" operator for sets, if the alternative of using "+" was considered, I think the same arguments for preferring "|" for sets are applicable now for dicts.
I concur with Serhiy. While I don't like adding operator to dict, proposed +/- looks similar to set |/- than seq +/-. If we're going to add such set-like operations, operators can be: * dict & dict_or_set * dict - dict_or_set * dict | dict Especially, dict - set can be more useful than proposed dict - dict.
But is merging two dicts a common enough problem that needs introducing an operator to solve it? I need to merge dicts maybe not more than one or two times by year, and I am fine with using the update() method.
+1. Adding new method to builtin should have a high bar. Adding new operator to builtin should have a higher bar. Adding new syntax should have a highest bar. -- INADA Naoki <songofacandy@gmail.com>
On Mon, Mar 4, 2019, 8:30 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
But is merging two dicts a common enough problem that needs introducing an operator to solve it? I need to merge dicts maybe not more than one or two times by year, and I am fine with using the update() method. Perhaps {**d1, **d2} can be more appropriate in some cases, but I did not encounter such cases yet.
Like other folks in the thread, I also want to merge dicts three times per year. And every one of those times, collections.ChainMap is the right way to do that non-destructively, and without copying.
On Mon, Mar 04, 2019 at 09:42:53AM -0500, David Mertz wrote:
On Mon, Mar 4, 2019, 8:30 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
But is merging two dicts a common enough problem that needs introducing an operator to solve it? I need to merge dicts maybe not more than one or two times by year, and I am fine with using the update() method. Perhaps {**d1, **d2} can be more appropriate in some cases, but I did not encounter such cases yet.
Like other folks in the thread, I also want to merge dicts three times per year.
I'm impressed that you have counted it with that level of accuracy. Is it on the same three days each year, or do they move about? *wink*
And every one of those times, collections.ChainMap is the right way to do that non-destructively, and without copying.
Can you elaborate on why ChainMap is the right way to merge multiple dicts into a single, new dict? ChainMap also seems to implement the opposite behaviour to that usually desired: first value seen wins, instead of last:
    py> from collections import ChainMap
    py> cm = ChainMap({'a': 1}, {'b': 2}, {'a': 999})
    py> cm
    ChainMap({'a': 1}, {'b': 2}, {'a': 999})
    py> dict(cm)
    {'a': 1, 'b': 2}
If you know ahead of time which order you want, you can simply reverse it:
    # prefs = site_defaults + user_defaults + document_prefs
    prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
but that seems a little awkward to me, and reads backwards. I'm used to thinking reading left-to-right, not right-to-left. ChainMap seems, to me, to be ideal for implementing "first wins" mappings, such as emulating nested scopes, but not so ideal for update/merge operations. -- Steven
On 3/4/19 10:44 AM, Steven D'Aprano wrote:
If you know ahead of time which order you want, you can simply reverse it:
# prefs = site_defaults + user_defaults + document_prefs prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
but that seems a little awkward to me, and reads backwards. I'm used to thinking reading left-to-right, not right-to-left.
I read that as use document preferences first, then user defaults, then site defaults, exactly as I'd explain the functionality to someone else. So maybe we're agreeing: if you think in terms of updating a dictionary of preferences, then maybe it reads backwards, but if you think of implementing features, then adding dictionaries of preferences reads backwards.
On Mon, Mar 04, 2019 at 11:56:54AM -0600, Dan Sommers wrote:
On 3/4/19 10:44 AM, Steven D'Aprano wrote:
If you know ahead of time which order you want, you can simply reverse it:
# prefs = site_defaults + user_defaults + document_prefs prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
but that seems a little awkward to me, and reads backwards. I'm used to thinking reading left-to-right, not right-to-left.
I read that as use document preferences first, then user defaults, then site defaults, exactly as I'd explain the functionality to someone else.
If you explained it to me like that, with the term "use", I'd think that the same feature would be done three times: once with document prefs, then with user defaults, then site defaults. Clearly that's not what you mean, so I'd then have to guess what you meant by "use", since you don't actually mean use. That would leave me trying to guess whether you meant that *site defaults* overrode document prefs or the other way. I don't like guessing, so I'd probably explicitly ask: "Wait, I'm confused, which wins? It sounds like site defaults wins, surely that's not what you meant."
So maybe we're agreeing: if you think in terms of updating a dictionary of preferences, then maybe it reads backwards, but if you think of implementing features, then adding dictionaries of preferences reads backwards.
Do you think "last seen wins" is backwards for dict.update() or for command line options? -- Steven
On 3/4/19 5:11 PM, Steven D'Aprano wrote:
On Mon, Mar 04, 2019 at 11:56:54AM -0600, Dan Sommers wrote:
On 3/4/19 10:44 AM, Steven D'Aprano wrote:
If you know ahead of time which order you want, you can simply reverse it:
# prefs = site_defaults + user_defaults + document_prefs prefs = dict(ChainMap(document_prefs, user_defaults, site_defaults))
but that seems a little awkward to me, and reads backwards. I'm used to thinking reading left-to-right, not right-to-left.
I read that as use document preferences first, then user defaults, then site defaults, exactly as I'd explain the functionality to someone else.
If you explained it to me like that, with the term "use", I'd think that the same feature would be done three times: once with document prefs, then with user defaults, then site defaults.
Clearly that's not what you mean, so I'd then have to guess what you meant by "use", since you don't actually mean use. That would leave me trying to guess whether you meant that *site defaults* overrode document prefs or the other way.
I don't like guessing, so I'd probably explicitly ask: "Wait, I'm confused, which wins? It sounds like site defaults wins, surely that's not what you meant."
You're right: "use" is the wrong word. Perhaps "prefer" is more appropriate. To answer the question of which wins: the first one in the list [document, user, site] that contains a given preference in question. Users don't see dictionary updates; they see collections of preferences in order of priority. Documentation is hard. :-) Sorry.
So maybe we're agreeing: if you think in terms of updating a dictionary of preferences, then maybe it reads backwards, but if you think of implementing features, then adding dictionaries of preferences reads backwards.
Do you think "last seen wins" is backwards for dict.update() or for command line options?
As a user, "last seen wins" is clearly superior for command line options. As a programmer, because object methods operate on their underlying object, it's pretty obvious that d1.update(d2) starts with d1 and applies the changes expressed in d2, which is effectively "last seen wins." If I resist the temptation to guess in the face of ambiguity, though, I don't think that d1 + d2 is any less ambiguous than a hypothetical dict_update(d1, d2) function. When I see a + operator, I certainly don't think of one operand or the other winning.
On Mon, Mar 4, 2019, 11:45 AM Steven D'Aprano <steve@pearwood.info> wrote:
Like other folks in the thread, I also want to merge dicts three times per year.
I'm impressed that you have counted it with that level of accuracy. Is it on the same three days each year, or do they move about? *wink*
To be respectful, I always merge dicts on Eid al-Fitr, Diwali, and Lent. I was speaking approximately, since those do not appear to line up with the same Gregorian year.
And every one of those times, collections.ChainMap is the right way to do that non-destructively, and without copying.
Can you elaborate on why ChainMap is the right way to merge multiple dicts into a single, new dict?
Zero-copy.
ChainMap also seems to implement the opposite behaviour to that usually desired: first value seen wins, instead of last:
True, the semantics are different, but equivalent, to the proposed dict addition. I put the key I want to "win" first rather than last.
If you know ahead of time which order you want, you can simply reverse it:
This seems nonsensical. If I write, at some future time, 'dict1+dict2+dict3' I need exactly as much to know "ahead of time" which keys I intend to win.
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
* Regarding how to construct the new set in __add__, I now think this should be done like this:
    class dict:
        <other methods>
        def __add__(self, other):
            <checks that other makes sense, else return NotImplemented>
            new = self.copy()  # A subclass may or may not choose to override
            new.update(other)
            return new
  AFAICT this will give the expected result for defaultdict -- it keeps the default factory from the left operand (i.e., self).
* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
* Regarding possible anti-patterns that this might encourage, I'm not aware of problems around list + list, so this seems an unwarranted worry to me.
-- --Guido van Rossum (python.org/~guido)
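The `__add__` sketch above cannot be pasted onto the builtin, but it can be tried on a subclass. The class `AddDict` below is hypothetical, and the `isinstance` check is only one assumed reading of "checks that other makes sense".

```python
class AddDict(dict):
    """Hypothetical dict subclass carrying the sketched __add__."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = self.copy()  # dict.copy() here, since AddDict does not override it
        new.update(other)
        return new


d1 = AddDict(a=1, b=2)
d2 = {"b": 20, "c": 3}
total = d1 + d2

# A right operand that is not a dict falls through to a TypeError.
try:
    d1 + 5
except TypeError:
    rejected = True
else:
    rejected = False
```

The right operand wins on the shared key, and neither input is modified. Since `AddDict` does not override `copy()`, the result is a plain `dict`, which is the subclass behaviour discussed earlier in the thread.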
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <guido@python.org> wrote:
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
I feel like dict should be treated like sets with the |, &, and - operators since in mathematics a mapping is sometimes represented as a set of pairs with unique first elements. Therefore, I think the set metaphor is stronger.
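For reference, a short sketch of where current Python already stands on both sides of this argument: key views support the set operators, while the ordering operators are not defined between dicts. The sample dicts are invented.

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 3}

# Key views already behave like sets.
common_keys = a.keys() & b.keys()
all_keys = a.keys() | b.keys()
only_in_a = a.keys() - b.keys()

# Sets use <= for subset tests...
subset = {"x"} <= {"x", "y"}

# ...but the same operator is not defined between two dicts.
try:
    a <= b
except TypeError:
    dict_ordering_supported = False
else:
    dict_ordering_supported = True
```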
* Regarding how to construct the new set in __add__, I now think this should be done like this:
class dict: <other methods> def __add__(self, other): <checks that other makes sense, else return NotImplemented> new = self.copy() # A subclass may or may not choose to override new.update(other) return new
I like that, but it would be inefficient to do that for __sub__ since it would create elements that it might later delete.
    def __sub__(self, other):
        new = self.copy()
        for k in other:
            del new[k]
        return new
is less efficient than
    def __sub__(self, other):
        return type(self)({k: v for k, v in self.items() if k not in other})
when copying v is expensive. Also, users would probably not expect values that don't end up being returned to be copied.
AFAICT this will give the expected result for defaultdict -- it keeps the default factory from the left operand (i.e., self).
* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
* Regarding possible anti-patterns that this might encourage, I'm not aware of problems around list + list, so this seems an unwarranted worry to me.
I agree with these points. Best, Neil
-- --Guido van Rossum (python.org/~guido)
--
--- You received this message because you are subscribed to a topic in the Google Groups "python-ideas" group. To unsubscribe from this topic, visit https://groups.google.com/d/topic/python-ideas/zfHYRHMIAdM/unsubscribe. To unsubscribe from this group and all its topics, send an email to python-ideas+unsubscribe@googlegroups.com. For more options, visit https://groups.google.com/d/optout. _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Mon, Mar 4, 2019 at 12:12 PM Neil Girdhar <mistersheik@gmail.com> wrote:
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <guido@python.org> wrote:
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
I feel like dict should be treated like sets with the |, &, and - operators since in mathematics a mapping is sometimes represented as a set of pairs with unique first elements. Therefore, I think the set metaphor is stronger.
That ship has long sailed.
* Regarding how to construct the new set in __add__, I now think this should be done like this:
class dict: <other methods> def __add__(self, other): <checks that other makes sense, else return NotImplemented> new = self.copy() # A subclass may or may not choose to override new.update(other) return new
I like that, but it would be inefficient to do that for __sub__ since it would create elements that it might later delete.
def __sub__(self, other): new = self.copy() for k in other: del new[k] return new
is less efficient than
def __sub__(self, other): return type(self)({k: v for k, v in self.items() if k not in other})
when copying v is expensive. Also, users would probably not expect values that don't end up being returned to be copied.
No, the values won't be copied -- it is a shallow copy that only increfs the keys and values. -- --Guido van Rossum (python.org/~guido)
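The shallow-copy point is easy to confirm. In the sketch below the two helper functions are hypothetical stand-ins for the two `__sub__` variants; the copy-based one uses `pop(k, None)` rather than `del`, on the assumption that keys missing from the left operand should be ignored instead of raising KeyError.

```python
def difference_by_copy(d, other):
    """Copy first, then drop keys found in `other`."""
    new = d.copy()
    for k in other:
        new.pop(k, None)  # `del new[k]` would raise KeyError for absent keys
    return new


def difference_by_filter(d, other):
    """Build the result directly from the surviving items."""
    return {k: v for k, v in d.items() if k not in other}


payload = list(range(1000))
d = {"keep": payload, "drop": "x"}
other = {"drop": None, "missing": None}

by_copy = difference_by_copy(d, other)
by_filter = difference_by_filter(d, other)
```

In both results the value under `"keep"` is the very same list object as `payload`, so neither approach duplicates the values themselves.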
On Mon, Mar 4, 2019 at 3:22 PM Guido van Rossum <guido@python.org> wrote:
On Mon, Mar 4, 2019 at 12:12 PM Neil Girdhar <mistersheik@gmail.com> wrote:
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <guido@python.org> wrote:
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
I feel like dict should be treated like sets with the |, &, and - operators since in mathematics a mapping is sometimes represented as a set of pairs with unique first elements. Therefore, I think the set metaphor is stronger.
That ship has long sailed.
Maybe, but reading through the various replies, it seems that if you are adding "-" to be analogous to set difference, then the combination operator should be analogous to set union "|". And it also opens an opportunity to add set intersection "&". After all, how do you filter a dictionary to a set of keys?
    d = {'some': 5, 'extra': 10, 'things': 55}
    d &= {'some', 'allowed', 'options'}
    d
    {'some': 5}
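For comparison, the same filtering can be spelled today with a comprehension. This is only a sketch of the existing idiom, not of the proposed `&=` operator.

```python
d = {"some": 5, "extra": 10, "things": 55}
allowed = {"some", "allowed", "options"}

# Keep only the items whose key appears in the allowed set.
filtered = {k: v for k, v in d.items() if k in allowed}
```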
* Regarding how to construct the new set in __add__, I now think this should be done like this:
class dict: <other methods> def __add__(self, other): <checks that other makes sense, else return NotImplemented> new = self.copy() # A subclass may or may not choose to override new.update(other) return new
I like that, but it would be inefficient to do that for __sub__ since it would create elements that it might later delete.
def __sub__(self, other): new = self.copy() for k in other: del new[k] return new
is less efficient than
def __sub__(self, other): return type(self)({k: v for k, v in self.items() if k not in other})
when copying v is expensive. Also, users would probably not expect values that don't end up being returned to be copied.
No, the values won't be copied -- it is a shallow copy that only increfs the keys and values.
Oh right, good point. Then your way is better since it would preserve any other data stored by the dict subclass.
-- --Guido van Rossum (python.org/~guido)
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
On Mon, Mar 4, 2019 at 12:33 PM Neil Girdhar <mistersheik@gmail.com> wrote:
On Mon, Mar 4, 2019 at 3:22 PM Guido van Rossum <guido@python.org> wrote:
On Mon, Mar 4, 2019 at 12:12 PM Neil Girdhar <mistersheik@gmail.com> wrote:
On Mon, Mar 4, 2019 at 2:26 PM Guido van Rossum <guido@python.org> wrote:
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
I feel like dict should be treated like sets with the |, &, and - operators since in mathematics a mapping is sometimes represented as a set of pairs with unique first elements. Therefore, I think the set metaphor is stronger.
That ship has long sailed.
Maybe, but reading through the various replies, it seems that if you are adding "-" to be analogous to set difference, then the combination operator should be analogous to set union "|". And it also opens an opportunity to add set intersection "&". After all, how do you filter a dictionary to a set of keys?
d = {'some': 5, 'extra': 10, 'things': 55} d &= {'some', 'allowed', 'options'} d {'some': 5}
* Regarding how to construct the new set in __add__, I now think this should be done like this:
    class dict:
        <other methods>
        def __add__(self, other):
            <checks that other makes sense, else return NotImplemented>
            new = self.copy()  # A subclass may or may not choose to override
            new.update(other)
            return new
I like that, but it would be inefficient to do that for __sub__ since it would create elements that it might later delete.
def __sub__(self, other): new = self.copy() for k in other: del new[k] return new
is less efficient than
def __sub__(self, other): return type(self)({k: v for k, v in self.items() if k not in other})
when copying v is expensive. Also, users would probably not expect values that don't end up being returned to be copied.
No, the values won't be copied -- it is a shallow copy that only increfs the keys and values.
Oh right, good point. Then your way is better since it would preserve any other data stored by the dict subclass.
-- --Guido van Rossum (python.org/~guido)
-- --Guido van Rossum (python.org/~guido)
On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <guido@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
+1 I think the "dicts are like more-featured" sets is a math-geek perspective, and unlikely to make things more clear for the bulk of users. And may make it less clear. We need to be careful -- there are a lot more math geeks on this list than in the general Python coding population. Simply adding "+" is a non-critical nice to have, but it seems unlikely to really confuse anyone. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Mar 4, 2019 at 3:58 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <guido@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
I think that's unfortunate.
+1
I think the "dicts are like more-featured" sets is a math-geek perspective, and unlikely to make things more clear for the bulk of users. And may make it less clear.
I'd say reddit has some pretty "common users", and they're having a discussion of this right now (https://www.reddit.com/r/Python/comments/ax4zzb/pep_584_add_and_operators_to...). The most popular comment is how it should be |. Anyway, I think that following the mathematical metaphors tends to make things more intuitive in the long run. Python is an adventure. You learn it for years and then it all makes sense. If dict uses +, yes, new users might find that sooner than |. However, when they learn set union, I think they will wonder why it's not consistent with dict union. The PEP's main justification for + is that it matches Counter, but counter is adding the values whereas | doesn't touch the values. I think it would be good to at least make a list of pros and cons of each proposed syntax.
We need to be careful -- there are a lot more math geeks on this list than in the general Python coding population.
Simply adding "+" is a non-critical nice to have, but it seems unlikely to really confuse anyone.
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Mar 4, 2019 at 1:29 PM Neil Girdhar <mistersheik@gmail.com> wrote:
On Mon, Mar 4, 2019 at 3:58 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Mon, Mar 4, 2019 at 12:41 PM Guido van Rossum <guido@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
I think that's unfortunate.
+1
I think the "dicts are like more-featured" sets is a math-geek perspective, and unlikely to make things more clear for the bulk of users. And may make it less clear.
I'd say reddit has some pretty "common users", and they're having a discussion of this right now ( https://www.reddit.com/r/Python/comments/ax4zzb/pep_584_add_and_operators_to... ). The most popular comment is how it should be |.
Anyway, I think that following the mathematical metaphors tends to make things more intuitive in the long run.
Only if you know the mathematical metaphors. ;)
Python is an adventure. You learn it for years and then it all makes sense. If dict uses +, yes, new users might find that sooner than |. However, when they learn set union, I think they will wonder why it's not consistent with dict union.
Not to me. I barely remember that | is supported for sets, but I sure know about + and lists (and strings, etc.) and I'm willing to bet the vast majority of folks are the same; addition is much more widely known than set theory.
The PEP's main justification for + is that it matches Counter, but counter is adding the values whereas | doesn't touch the values. I think it would be good to at least make a list of pros and cons of each proposed syntax.
I suspect Steven will add more details to a Rejected Ideas section.
We need to be careful -- there are a lot more math geeks on this list than in the general Python coding population.
Simply adding "+" is a non-critical nice to have, but it seems unlikely to really confuse anyone.
I agree with Chris. -Brett
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, 4 Mar 2019 at 20:42, Guido van Rossum <guido@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
I'm neutral on dict addition, but dict subtraction seemed an odd extension to the proposal. Using b in a - b solely for its keys, and ignoring its values, seems weird to me. Even if dict1 - dict2 were added to the language, I think I'd steer clear of it as being too obscure. I'm not going to get sucked into this debate, but I'd be happy to see the subtraction operator part of the proposal withdrawn. Paul
On Mon, Mar 04, 2019 at 09:34:34PM +0000, Paul Moore wrote:
On Mon, 4 Mar 2019 at 20:42, Guido van Rossum <guido@python.org> wrote:
Honestly I would rather withdraw the subtraction operators than reopen the discussion about making dict more like set.
As some people have repeatedly pointed out, we already have four ways to spell dict merging:
- in-place dict.update;
- copy, followed by update;
- use a ChainMap;
- the obscure new {**d1, ...} syntax.
But if there's a good way to get dict difference apart from a manual loop or comprehension, I don't know it. So from my perspective, even though most of the attention has been on the merge operator, I'd rather keep the difference operator. As far as making dicts "more like set", I'm certainly not proposing that. The furthest I'd go is bow to the consensus if it happened to decide that | is a better choice than + (but that seems unlikely).
I'm neutral on dict addition, but dict subtraction seemed an odd extension to the proposal. Using b in a - b solely for its keys, and ignoring its values, seems weird to me.
The PEP currently says that dict subtraction requires the right-hand operand to be a dict. That's the conservative choice that follows the example of list addition (it requires a list, not just any iterable) and avoids breaking changes to code that uses operator-overloading:
    mydict - some_object
works if some_object overloads __rsub__. If dict.__sub__ was greedy in what it accepted, it could break such code. Better (in my opinion) to be less greedy by only allowing dicts. dict -= on the other hand can take any iterable of keys, as the right-hand operand isn't called.
Oh, another thing the PEP should gain... a use-case for dict subtraction. Here's a few:
(1) You have a pair of dicts D and E, and you want to update D with only the new keys from E:
    D.update(E - D)
which I think is nicer than writing a manual loop:
    D.update({k:E[k] for k in (E.keys() - D.keys())})
    # or
    D.update({k:v for k,v in E.items() if k not in D})
(This is a form of update with "first seen wins" instead of the usual "last seen wins".)
(2) You have a dict D, and you want to unconditionally remove keys from a blacklist, e.g.:
    all_users = {'username': user, ...}
    allowed_users = all_users - banned_users
(3) You have a dict, and you want to ensure there's no keys that you didn't expect:
    if (d := actual-expected):
        print('unexpected key:value pairs', d)
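The three use cases can be tried today with a small helper in place of the proposed operator. The function `dict_difference` is hypothetical, as is all the sample data; it keeps the items of the left dict whose keys are absent from the right one, matching the comprehension spelling quoted above.

```python
def dict_difference(left, right):
    """Items of `left` whose keys do not appear in `right` (hypothetical helper)."""
    return {k: v for k, v in left.items() if k not in right}


# (1) Update D with only the new keys from E ("first seen wins").
D = {"a": 1, "b": 2}
E = {"b": 99, "c": 3}
D.update(dict_difference(E, D))

# (2) Remove banned users; only the keys of banned_users matter.
all_users = {"alice": 1, "bob": 2, "mallory": 3}
banned_users = {"mallory": "spam"}
allowed_users = dict_difference(all_users, banned_users)

# (3) Collect any keys that were not expected.
expected = {"host": None, "port": None}
actual = {"host": "example.org", "port": 80, "debug": True}
unexpected = dict_difference(actual, expected)
```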
Even if dict1 - dict2 were added to the language, I think I'd steer clear of it as being too obscure.
Everything is obscure until people learn it and get used to it. -- Steven
Indeed the "obscure" argument should be thrown away. The `|` operator in sets seems to be evident for every one on this list but I would be curious to know how many people first got a TypeError doing set1 + set2 and then found set1 | set2 in the doc. Except for math geek the `|` is always something obscure.
Even if dict1 - dict2 were added to the language, I think I'd steer clear of it as being too obscure.
Everything is obscure until people learn it and get used to it.
On Tue, Mar 5, 2019 at 6:42 PM Jimmy Girardet <ijkl@netc.fr> wrote:
Indeed the "obscure" argument should be thrown away.
The `|` operator in sets seems to be evident for every one on this list but I would be curious to know how many people first got a TypeError doing set1 + set2 and then found set1 | set2 in the doc.
Except for math geek the `|` is always something obscure.
Interesting point. In Japan, we learn sets in high school, not in university. And I think it's a good idea that people using the `set` type learn about sets in math. So I don't think "union" is only for math geeks. But we use "A ∪ B" in math. `|` is borrowed from "bitwise OR" in C. And "bitwise" operators are for "geeks". Although I'm not in favor of adding `+` to set, it would be worthwhile to add `+` to set too, for consistency, if it is added to dict. FWIW, Scala uses `++` to join all containers. Kotlin uses `+` to join all containers. (ref https://discuss.python.org/t/pep-584-survey-of-other-languages-operator-over...) Regards, -- Inada Naoki <songofacandy@gmail.com>
On 05/03/2019 09:42, Jimmy Girardet wrote:
Indeed the "obscure" argument should be thrown away.
The `|` operator in sets seems to be evident for every one on this list but I would be curious to know how many people first got a TypeError doing set1 + set2 and then found set1 | set2 in the doc.
Every. Single. Time. I don't use sets a lot (purely by happenstance rather than choice), and every time I do I have to go and look in the documentation because I expect the union operator to be '+'.
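The experience described here is easy to reproduce. The following sketch shows that `+` is not defined for sets, alongside the operators that are.

```python
set1 = {1, 2}
set2 = {2, 3}

# set + set is not defined and raises TypeError.
try:
    set1 + set2
except TypeError:
    plus_supported = False
else:
    plus_supported = True

# The operators that sets do support.
union = set1 | set2
intersection = set1 & set2
difference = set1 - set2
```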
Except for math geek the `|` is always something obscure.
Two thirds of my degree is in maths, and '|' is still something I don't associate with sets. It would be unreasonable to expect '∩' and '∪' as the operators, but reasoning from '-' for set difference I always expect '+' and '*' as the union and intersection operators. Alas my hopes are always cruelly crushed :-) -- Rhodri James *-* Kynesim Ltd
Rhodri James wrote:
I have to go and look in the documentation because I expect the union operator to be '+'.
Anyone raised on Pascal is likely to find + and * more natural. Pascal doesn't have bitwise operators, so it re-uses + and * for set operations. I like the economy of this arrangement -- it's not as if there's any other obvious meaning that + and * could have for sets. -- Greg
On Mar 5, 2019, at 2:13 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Rhodri James wrote:
I have to go and look in the documentation because I expect the union operator to be '+'.
Anyone raised on Pascal is likely to find + and * more natural. Pascal doesn't have bitwise operators, so it re-uses + and * for set operations. I like the economy of this arrangement -- it's not as if there's any other obvious meaning that + and * could have for sets.
The language SETL (the language of sets) also uses + and * for set operations.¹ For us though, the decision to use | and & is set in stone. The time for debating the decision was 19 years ago.² Raymond ¹ https://www.linuxjournal.com/article/6805 ² https://www.python.org/dev/peps/pep-0218/
On Tue, Mar 5, 2019 at 6:07 PM Raymond Hettinger < raymond.hettinger@gmail.com> wrote:
On Mar 5, 2019, at 2:13 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Rhodri James wrote:
I have to go and look in the documentation because I expect the union operator to be '+'.
Anyone raised on Pascal is likely to find + and * more natural. Pascal doesn't have bitwise operators, so it re-uses + and * for set operations. I like the economy of this arrangement -- it's not as if there's any other obvious meaning that + and * could have for sets.
The language SETL (the language of sets) also uses + and * for set operations.¹
So the secret is out: Python inherits a lot from SETL, through ABC -- ABC was heavily influenced by SETL.
¹ https://www.linuxjournal.com/article/6805 ² https://www.python.org/dev/peps/pep-0218/
-- --Guido van Rossum (python.org/~guido)
On Mon, Mar 04, 2019 at 03:33:36PM -0500, Neil Girdhar wrote:
Maybe, but reading through the various replies, it seems that if you are adding "-" to be analogous to set difference, then the combination operator should be analogous to set union "|".
That's the purpose of this discussion, to decide whether dict merging is more like addition/concatenation or union :-)
And it also opens an opportunity to add set intersection "&".
What should intersection do in the case of matching keys? I see the merge + operator as a kind of update, whether it makes a copy or does it in place, so to me it is obvious that "last seen wins" should apply just as it does for the update method. But dict *intersection* is a more abstract operation than merge/update. And that leads to the problem, what do you do with the values? {key: "spam"} & {key: "eggs"} # could result in any of: {key: "spam"} {key: "eggs"} {key: ("spam", "eggs")} {key: "spameggs"} an exception something else? Unlike "update", I don't have any good use-cases to prefer any one of those over the others.
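To make the ambiguity concrete, here is a minimal sketch. The helper name `intersect` and its `combine` parameter are hypothetical and not part of any proposal in this thread; each policy simply reproduces one of the candidate results listed above.

```python
def intersect(d1, d2, combine):
    """Keep only the keys present in both dicts; `combine` picks the value."""
    return {k: combine(d1[k], d2[k]) for k in d1 if k in d2}

a = {"key": "spam", "only_a": 1}
b = {"key": "eggs", "only_b": 2}

first_wins = intersect(a, b, lambda x, y: x)         # value from the left operand
last_wins = intersect(a, b, lambda x, y: y)          # value from the right operand
as_pair = intersect(a, b, lambda x, y: (x, y))       # both values, as a tuple
concatenated = intersect(a, b, lambda x, y: x + y)   # values combined with +
```

All four are defensible, which is the point: nothing in the operation itself says which one `&` ought to mean.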
After all, how do you filter a dictionary to a set of keys?
>>> d = {'some': 5, 'extra': 10, 'things': 55}
>>> d &= {'some', 'allowed', 'options'}
>>> d
{'some': 5}
new = d - (d - allowed)
{k: v for (k, v) in d.items() if k in allowed}
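For comparison, a small sketch of how the same filtering can be written without any new operators. The variable names are illustrative only.

```python
d = {'some': 5, 'extra': 10, 'things': 55}
allowed = {'some', 'allowed', 'options'}

# Comprehension over items(); keeps d's insertion order.
filtered = {k: v for k, v in d.items() if k in allowed}

# Keys views are set-like, so they already support `&` with a set.
common_keys = d.keys() & allowed
also_filtered = {k: d[k] for k in common_keys}
```

Both spellings give the same mapping; the second goes through a set, so it does not promise any particular key order.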
* Regarding how to construct the new set in __add__, I now think this should be done like this:
class dict:
    <other methods>
    def __add__(self, other):
        <checks that other makes sense, else return NotImplemented>
        new = self.copy()  # A subclass may or may not choose to override
        new.update(other)
        return new
I like that, but it would be inefficient to do that for __sub__ since it would create elements that it might later delete.
def __sub__(self, other):
    new = self.copy()
    for k in other:
        del new[k]
    return new
is less efficient than
def __sub__(self, other):
    return type(self)({k: v for k, v in self.items() if k not in other})
I don't think you should be claiming what is more or less efficient unless you've actually profiled them for speed and memory use. Often, but not always, the two are in opposition: we make things faster by using more memory, and save memory at the cost of speed. Your version of __sub__ creates a temporary dict, which then has to be copied in order to preserve the type. It's not obvious to me that that's faster or more memory efficient than building a dict then deleting keys. (Remember that dicts aren't lists, and deleting keys is an O(1) operation.) -- Steven
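One way to settle this kind of question is to measure. The following is only a rough sketch using `timeit`: the function names are made up for illustration, the copy-then-delete variant uses `pop(k, None)` (an assumption, so that keys missing from the left operand are ignored rather than raising KeyError), and the numbers will vary with dict sizes, the amount of key overlap, and the interpreter.

```python
import timeit

def sub_copy_delete(d, other):
    # Copy everything, then remove the unwanted keys.
    new = d.copy()
    for k in other:
        new.pop(k, None)  # assumption: silently skip keys absent from d
    return new

def sub_comprehension(d, other):
    # Build a temporary dict, then pass it through type(d) to keep the type.
    return type(d)({k: v for k, v in d.items() if k not in other})

d = {i: i for i in range(10_000)}
other = {i: None for i in range(0, 10_000, 2)}

same_result = sub_copy_delete(d, other) == sub_comprehension(d, other)

for fn in (sub_copy_delete, sub_comprehension):
    seconds = timeit.timeit(lambda: fn(d, other), number=50)
    print(f"{fn.__name__}: {seconds:.4f}s for 50 runs")
```

This only measures speed; memory use would need a separate tool such as `tracemalloc`.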
On Mar 4, 2019, at 11:24 AM, Guido van Rossum <guido@python.org> wrote:
* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
I'm not sure that conclusion follows from the premise :-) Some ideas get proposed routinely because they are obvious things to propose, not because people actually need them. One hint is that the proposals always have generic variable names, "d = d1 + d2", and another is that they are almost never accompanied by actual use cases or real code that would be made better. I haven't seen anyone in this thread say they would use this more than once a year or that their existing code was unclear or inefficient in any way. The lack of dict addition support in other languages (Java, for example) is another indicator that there isn't a real need -- afaict there is nothing about Python that would cause us to have a unique requirement that other languages don't have.
FWIW, there are some downsides to the proposal -- it diminishes some of the unifying ideas about Python that I typically present on the first day of class:
* One notion is that the APIs nudge users toward good code. The "copy.copy()" function has to be imported -- that minor nuisance is a subtle hint that copying isn't good for you. Likewise for dicts, writing "e=d.copy(); e.update(f)" is a minor nuisance that either serves to dissuade people from unnecessary copying or at least will make very clear what is happening. The original motivating use case for ChainMap() was to make a copy free replacement for excessively slow dict additions in ConfigParser. Giving a plus-operator to mappings is an invitation to writing code that doesn't scale well.
* Another unifying notion is that the star-operator represents repeat addition across multiple data types. It is a nice demo to show that "a * 5 == a + a + a + a + a" where "a" is an int, float, complex, str, bytes, tuple, or list. Giving __add__() to dicts breaks this pattern.
* When teaching dunder methods, the usual advice regarding operators is to use them only when their meaning is unequivocal; otherwise, have a preference for named methods where the method name clarifies what is being done -- don't use train+car to mean train.shunt_to_middle(car). For dicts that would mean not having the plus-operator implement something that isn't inherently additive (it applies replace/overwrite logic instead), that isn't commutative, and that isn't linear when applied in succession (d1+d2+d3).
* In the advanced class where C extensions are covered, the organization of the slots is shown as a guide to which methods make sense together: tp_as_number, tp_as_sequence, and tp_as_mapping. For dicts to gain the requisite methods, they will have to become numbers (in the sense of filling out the tp_as_number slots). That will slow down the abstract methods that search the slot groups, skipping over groups marked as NULL. It also exposes method groups that don't typically appear together, blurring their distinction.
* Lastly, there is a vague piece of zen-style advice, "if many things in the language have to change to implement idea X, it stops being worth it". In this case, it means that every dict-like API and the related abstract methods and typing equivalents would need to grow support for addition in mappings (would it even make sense to add to shelve objects or os.environ objects together?)
That's my two cents worth. I'm ducking out now (nothing more to offer on the subject). Guido's participation in the thread has given it an air of inevitability so this post will likely not make a difference. Raymond
Adding the + operator for dictionaries feels like it would be a mistake in that it offers at most sugar-y benefits, but introduces the significant drawback of making it easier to introduce unintended errors. This would be the first instance of "addition" where the result can potentially lose/overwrite data (lists and strings both preserve the full extent of each operand; Counters include the full value from each operand, etc). Combining dictionaries is fundamentally an operation that requires more than one piece of information, because there's no single well-defined way to combine a pair of them. Off the top of my head, I can think of at least 2 different common options (replacement aka .update(), combination of values a la Counter). Neither of these is really a more valid "addition" of dictionaries. For specific dict-like subclasses, addition may make sense - Counter is a great example of this, because the additional context adds definition to the most logical method via which two instances would be combined. If anything, this seems like an argument to avoid implementing __add__ on dict itself, to leave the possibility space open for interpretation by more specific classes. *From: *Raymond Hettinger <raymond.hettinger@gmail.com> *Date: *Mon, Mar 4, 2019 at 9:53 PM *To: *Guido van Rossum *Cc: *python-ideas
On Mar 4, 2019, at 11:24 AM, Guido van Rossum <guido@python.org> wrote:
* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
I'm not sure that conclusion follows from the premise :-) Some ideas get proposed routinely because they are obvious things to propose, not because people actually need them. One hint is that the proposals always have generic variable names, "d = d1 + d2", and another is that they are almost never accompanied by actual use cases or real code that would be made better. I haven't seen anyone in this thread say they would use this more than once a year or that their existing code was unclear or inefficient in any way. The lack of dict addition support in other languages (Java, for example) is another indicator that there isn't a real need -- afaict there is nothing about Python that would cause us to have a unique requirement that other languages don't have.
FWIW, there are some downsides to the proposal -- it diminishes some of the unifying ideas about Python that I typically present on the first day of class:
* One notion is that the APIs nudge users toward good code. The "copy.copy()" function has to be imported -- that minor nuisance is a subtle hint that copying isn't good for you. Likewise for dicts, writing "e=d.copy(); e.update(f)" is a minor nuisance that either serves to dissuade people from unnecessary copying or at least will make very clear what is happening. The original motivating use case for ChainMap() was to make a copy free replacement for excessively slow dict additions in ConfigParser. Giving a plus-operator to mappings is an invitation to writing code that doesn't scale well.
* Another unifying notion is that the star-operator represents repeat addition across multiple data types. It is a nice demo to show that "a * 5 == a + a + a + a + a" where "a" is an int, float, complex, str, bytes, tuple, or list. Giving __add__() to dicts breaks this pattern.
* When teaching dunder methods, the usual advice regarding operators is to use them only when their meaning is unequivocal; otherwise, have a preference for named methods where the method name clarifies what is being done -- don't use train+car to mean train.shunt_to_middle(car). For dicts that would mean not having the plus-operator implement something that isn't inherently additive (it applies replace/overwrite logic instead), that isn't commutative, and that isn't linear when applied in succession (d1+d2+d3).
* In the advanced class where C extensions are covered, the organization of the slots is shown as a guide to which methods make sense together: tp_as_number, tp_as_sequence, and tp_as_mapping. For dicts to gain the requisite methods, they will have to become numbers (in the sense of filling out the tp_as_number slots). That will slow down the abstract methods that search the slot groups, skipping over groups marked as NULL. It also exposes method groups that don't typically appear together, blurring their distinction.
* Lastly, there is a vague piece of zen-style advice, "if many things in the language have to change to implement idea X, it stops being worth it". In this case, it means that every dict-like API and the related abstract methods and typing equivalents would need to grow support for addition in mappings (would it even make sense to add to shelve objects or os.environ objects together?)
That's my two cents worth. I'm ducking out now (nothing more to offer on the subject). Guido's participation in the thread has given it an air of inevitability so this post will likely not make a difference.
Raymond
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Adding the + operator for dictionaries feels like it would be a mistake in that it offers at most sugar-y benefits, but introduces the significant drawback of making it easier to introduce unintended errors.
I disagree. This argument only really applies to the case "a = a + b", not "a = b + c". Making it easier and more natural to produce code that doesn't mutate in place is something that should reduce errors, not make them more common. The big mistake here was * for strings, which is unusual, would be just as well served by a method, and ensures that type errors blow up much later than they otherwise would. This type of mistake for dicts when you expected numbers is a much stronger argument against this proposal in my opinion. Let's not create another pitfall! The current syntax is a bit unwieldy but is really fine. / Anders
On Mon, Mar 04, 2019 at 10:18:13PM -0800, Amber Yust wrote:
Adding the + operator for dictionaries feels like it would be a mistake in that it offers at most sugar-y benefits, but introduces the significant drawback of making it easier to introduce unintended errors.
What sort of errors? I know that some (mis-)features are "bug magnets" that encourage people to write buggy code, but I don't see how this proposal is worse than dict.update(). In one way it is better, since D + E returns a new dict, instead of over-writing the data in D. Ask any functional programmer, and they'll tell you that we should avoid side-effects.
This would be the first instance of "addition" where the result can potentially lose/overwrite data (lists and strings both preserve the full extent of each operand; Counters include the full value from each operand, etc).
I don't see why this is relevant to addition. It doesn't even apply to numeric addition! If I give you the result of an addition: 101 say, you can't tell what the operands were. And that's not even getting into the intricacies of floating point addition, which can violate associativity: ``(a + b) + c`` is not necessarily equal to ``a + (b + c)`` and distributivity: ``x*(a + b)`` is not necessarily equal to ``x*a + x*b`` even for well-behaved, numeric floats (not NANs or INFs).
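A quick illustration of both points, as a sketch only. The specific floats are a commonly cited example chosen for this note, not values taken from the thread.

```python
# Floating point addition is not associative for these well-behaved values.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)

# Integer addition "loses" its operands too: many pairs sum to 101.
pairs = [(x, 101 - x) for x in range(102)]
print(len(pairs), "non-negative integer pairs sum to 101")
```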
Combining dictionaries is fundamentally an operation that requires more than one piece of information, because there's no single well-defined way to combine a pair of them.
Indeed. But some ways are more useful than others.
Off the top of my head, I can think of at least 2 different common options (replacement aka .update(), combination of values a la Counter). Neither of these is really a more valid "addition" of dictionaries.
That's why we have subclasses and operator overloading :-) By far the most commonly requested behaviour for this is copy-and-update (or merge, if you prefer). But subclasses are free to define it as they will, including: - add values, as Counter already does; - raise an exception if there is a duplicate key; - "first seen wins" or anything else. -- Steven
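A rough sketch of what such subclasses could look like. All three class names are hypothetical, and `+` is defined on a subclass here so that the example does not depend on the builtin dict supporting it.

```python
class MergeDict(dict):
    """Hypothetical: copy-and-update, last seen wins."""
    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)
        new.update(other)
        return new

class StrictDict(MergeDict):
    """Hypothetical: refuse to merge when a key occurs on both sides."""
    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        duplicates = self.keys() & other.keys()
        if duplicates:
            raise KeyError(f"duplicate keys: {duplicates!r}")
        return super().__add__(other)

class FirstWinsDict(MergeDict):
    """Hypothetical: on a clash, keep the value from the left operand."""
    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(other)
        new.update(self)  # left-hand values overwrite right-hand ones
        return new

last = MergeDict(a=1, b=2) + {"a": 10}
first = FirstWinsDict(a=1, b=2) + {"a": 10, "c": 3}
```

Note that `FirstWinsDict` starts from the right operand, so the key order of the result follows `other` first; a real implementation might care about that.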
04.03.19 21:24, Guido van Rossum пише:
* Dicts are not like sets because the ordering operators (<, <=, >, >=) are not defined on dicts, but they implement subset comparisons for sets. I think this is another argument pleading against | as the operator to combine two dicts.
Well, I suppose that the next proposition will be to implement the ordering operators for dicts. Because why not? Lists and numbers support them. /sarcasm/ Jokes aside, dicts have more in common with sets than with sequences. Neither can contain duplicate keys/elements. Both have constant computational complexity for the containment test. For both, the size of the merge/union can be less than the sum of the sizes of the original containers. Both have the same restriction on keys/elements (hashability).
* Regarding how to construct the new set in __add__, I now think this should be done like this:
class dict:
    <other methods>
    def __add__(self, other):
        <checks that other makes sense, else return NotImplemented>
        new = self.copy()  # A subclass may or may not choose to override
        new.update(other)
        return new
AFAICT this will give the expected result for defaultdict -- it keeps the default factory from the left operand (i.e., self).
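The defaultdict claim can be checked with the copy-then-update steps written out by hand. This is a sketch of those two steps only, not of a real `__add__` on dict.

```python
from collections import defaultdict

left = defaultdict(list, {"a": [1]})
right = defaultdict(int, {"b": 2})

# The same two steps as the proposed __add__: copy the left operand, then update.
new = left.copy()   # defaultdict.copy() keeps the default_factory
new.update(right)

new["missing"].append(3)  # uses the left operand's factory (list)
```

The copy is shallow, so `new["a"]` and `left["a"]` are the same list object.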
No builtin type that implements __add__ uses the copy() method. Dict would be the only exception to the general rule. And it would be much less efficient than {**d1, **d2}.
* Regarding how often this is needed, we know that this is proposed and discussed at length every few years, so I think this will fill a real need.
And every time this proposition was rejected. What has changed since it was rejected the last time? We now have the expression form of dict merging ({**d1, **d2}), which should decrease the need for the plus operator for dicts.
04.03.19 15:29, Serhiy Storchaka пише:
Using "|" looks more natural to me than using "+". We should look at discussions for using the "|" operator for sets, if the alternative of using "+" was considered, I think the same arguments for preferring "|" for sets are applicable now for dicts.
See the Python-Dev thread with the subject "Re: Re: PEP 218 (sets); moving set.py to Lib" starting from https://mail.python.org/pipermail/python-dev/2002-August/028104.html
I just wanted to mention this since it hasn't been brought up, but none of these work:
a.keys() + b.keys()
a.values() + b.values()
a.items() + b.items()
However, the following do work:
a.keys() | b.keys()
a.items() | b.items()
Perhaps they work by coincidence (being set types), but I think it's worth bringing up, since a naive/natural Python implementation of dict addition/union would possibly involve the |-operator. Pål
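A short sketch confirming the behaviour of the view objects. The dict contents are made up for illustration.

```python
a = {"x": 1, "y": 2}
b = {"y": 20, "z": 30}

union_keys = a.keys() | b.keys()      # keys views are set-like
union_items = a.items() | b.items()   # so are items views (pairs must be hashable)

try:
    a.keys() + b.keys()
    plus_supported = True
except TypeError:
    plus_supported = False

# values() views are not set-like, so `|` fails for them as well.
try:
    a.values() | b.values()
    values_union_supported = True
except TypeError:
    values_union_supported = False
```

Note that `union_items` keeps both `('y', 2)` and `('y', 20)`, so feeding it to `dict()` would not give a predictable winner for the duplicated key.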
Serhiy Storchaka wrote:
And this opens a non-easy problem: how to create a mapping of the same type?
That's the responsibility of the class implementing the + operator. There doesn't have to be any guarantee that a subclass of it will automatically return an instance of the subclass (many existing types provide no such guarantee, e.g. + on strings), so whatever strategy it uses doesn't have to be part of its public API. -- Greg
On Wed, Feb 27, 2019 at 11:18 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
27.02.19 20:48, Guido van Rossum пише:
On Wed, Feb 27, 2019 at 10:42 AM Michael Selik <mike@selik.org> wrote:
The dict subclass collections.Counter overrides the update method for adding values instead of overwriting values.
https://docs.python.org/3/library/collections.html#collections.Counter.updat...
Counter also uses +/__add__ for a similar behavior.
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d  # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Great, this sounds like a good argument for + over |. The other argument is that | for sets *is* symmetrical, while + is used for other collections where it's not symmetrical. So it sounds like + is a winner here.
Counter uses + for a *different* behavior!
>>> Counter(a=2) + Counter(a=3)
Counter({'a': 5})
Well, you can see this as a special case. The proposed + operator on Mappings returns a new Mapping whose keys are the union of the keys of the two arguments; the value is the single value for a key that occurs in only one of the arguments, and *somehow* combined for a key that's in both. The way of combining values is up to the type of Mapping. For dict, the second value wins (not so different from {'a': 1, 'a': 2}, which becomes {'a': 2}). But for other Mappings, the combination can be done differently -- and Counter chooses to add the two values.
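The "same keys, different way of combining values" view can be seen side by side. In this sketch the existing {**d1, **d2} spelling stands in for the proposed dict `+`.

```python
from collections import Counter

d1 = {"a": 1, "b": 2}
d2 = {"a": 10, "c": 3}

merged = {**d1, **d2}                 # dict-style: the second value wins for 'a'
counted = Counter(d1) + Counter(d2)   # Counter-style: the two values are added

same_keys = merged.keys() == counted.keys()
```

The key sets match here because every count is positive; Counter addition drops keys whose resulting count is zero or negative.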
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
But (as someone else pointed out) {**d1, **d2} always returns a dict, not the type of d1 and d2. Also, I'm sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started! If you were to ask a newbie who has learned a few things (e.g. sequence concatenation) they would much more likely guess d1+d2. The argument for + over | has been mentioned elsewhere already. @Eric Smith <eric@trueblade.com>
I'd help out.
Please do! I tried to volunteer Steven D'Aprano but I think he isn't interested in pushing through a controversial PEP. The PEP should probably also propose d1-d2. -- --Guido van Rossum (python.org/~guido)
The PEP should probably also propose d1-d2.
What would be the output of this? Does this return a new dictionary in which the keys in d2 are removed from d1, as with sets?
>>> d = dict((i, i) for i in range(5))
>>> e = dict((i, i) for i in range(4, 10))
>>> d
{0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
>>> e
{4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9}
>>> d.items() - e.items()
{(0, 0), (1, 1), (3, 3), (2, 2)}
>>> dict(d.items() - e.items())
{0: 0, 1: 1, 3: 3, 2: 2}
-- Regards, Karthikeyan S
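One caveat with the items-view spelling, shown as a sketch: it compares (key, value) pairs, so it only behaves like a key-based difference when the shared keys also carry equal values. The changed values in `e` below are an assumption made to expose the difference.

```python
d = {i: i for i in range(5)}
e = {i: i * 100 for i in range(4, 10)}   # key 4 is shared, but its value differs

# Pair-based: (4, 4) is not in e.items(), so key 4 survives.
by_items = dict(d.items() - e.items())

# Key-based: key 4 is removed regardless of its value.
by_keys = {k: v for k, v in d.items() if k not in e}
```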
Do we really need a "+" and a "-" operation on dictionaries? [dictinstance.update({k:v}) for k,v in dictinstance.items()] does handle merges already. And I'm assuming that "-" should return the difference -- set(d1.keys()) - set(d2.keys()), right? -- H
On Thu, Feb 28, 2019 at 08:59:30PM -0800, Hasan Diwan wrote:
Do we really need a "+" and a "-" operation on dictionaries? [dictinstance.update({k:v}) for k,v in dictinstance.items()] does handle merges already.
I don't think that does what you intended. That merges dictinstance with itself (a no-op!), but one item at a time, so in the slowest, most inefficient way possible. Writing a comprehension for its side-effects is an anti-pattern that should be avoided. You are creating a (potentially large) list of Nones which has to be created, then garbage collected.
And I'm assuming that "-" should return the difference -- set(d1.keys()) - set(d2.keys()), right?
No. That throws away the values associated with the keys. P.S. As per Guido's ~~command~~ request *wink* I'm writing a PEP for this. I should have a draft ready later this evening. -- Steven
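Both objections are easy to see in a few lines. This is a sketch with made-up dicts, merging d2 into a copy of d1 rather than a dict into itself.

```python
d1 = {"a": 1, "b": 2, "c": 3}
d2 = {"b": 20, "d": 40}

# Comprehension used for its side effects: it works, but builds a list of Nones.
target = d1.copy()
leftovers = [target.update({k: v}) for k, v in d2.items()]

# The direct spelling does the same merge without the throwaway list.
direct = d1.copy()
direct.update(d2)

# Set difference of the keys alone discards the values.
keys_only = set(d1.keys()) - set(d2.keys())
```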
01.03.19 06:21, Guido van Rossum пише:
On Wed, Feb 27, 2019 at 11:18 PM Serhiy Storchaka <storchaka@gmail.com> wrote:
Counter uses + for a *different* behavior!
>>> Counter(a=2) + Counter(a=3)
Counter({'a': 5})
Well, you can see this as a special case. The proposed + operator on Mappings returns a new Mapping whose keys are the union of the keys of the two arguments; the value is the single value for a key that occurs in only one of the arguments, and *somehow* combined for a key that's in both. The way of combining values is up to the type of Mapping. For dict, the second value wins (not so different from {'a': 1, 'a': 2}, which becomes {'a': 2}). But for other Mappings, the combination can be done differently -- and Counter chooses to add the two values.
Currently Counter += dict works and Counter + dict is an error. With this change Counter + dict will return a value, but it will be different from the result of the += operator. Also, if a custom dict subclass implemented the plus operator with different semantics that support addition with a dict, this change will break it, because dict + CustomDict will call dict.__add__ instead of CustomDict.__radd__. Adding support of new operators to builtin types is dangerous.
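The current asymmetry can be checked directly. This sketch assumes an interpreter where plain dict does not itself implement `+`, which is the situation described here.

```python
from collections import Counter

c = Counter(a=3, b=1)
c += {"a": 1, "c": 2}   # in-place addition accepts a plain dict

try:
    Counter(a=3) + {"a": 1}   # Counter.__add__ only accepts another Counter
    plus_works = True
except TypeError:
    plus_works = False
```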
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
But (as someone else pointed out) {**d1, **d2} always returns a dict, not the type of d1 and d2.
And this saves us from the hard problem of creating a mapping of the same type. Note that the reference implementations discussed above make d1 + d2 always return a dict. dict.copy() returns a dict.
Also, I'm sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started! If you were to ask a newbie who has learned a few things (e.g. sequence concatenation) they would much more likely guess d1+d2.
Perhaps the better solution is to update the documentation.
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
Currently Counter += dict works and Counter + dict is an error. With this change Counter + dict will return a value, but it will be different from the result of the += operator.
That's how list.__iadd__ works too: ListSubclass + list will return a value, but it might not be the same as += since that operates in place and uses a different dunder method. Why is it a problem for dicts but not a problem for lists?
Also, if the custom dict subclass implemented the plus operator with different semantic which supports the addition with a dict, this change will break it, because dict + CustomDict will call dict.__add__ instead of CustomDict.__radd__.
That's not how operators work in Python or at least that's not how they worked the last time I looked: if the behaviour has changed without discussion, that's a breaking change that should be reverted. Obviously I can't show this with dicts, but here it is with lists:
py> class MyList(list):
...     def __radd__(self, other):
...         print("called subclass first")
...         return "Something"
...
py> [1, 2, 3] + MyList()
called subclass first
'Something'
This is normal, standard behaviour for Python operators: if the right operand is a subclass of the left operand, the reflected method __r*__ is called first.
Adding support of new operators to builtin types is dangerous.
Explain what makes new operators more dangerous than old operators please.
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
But (as someone else pointed out) {**d1, **d2} always returns a dict, not the type of d1 and d2.
And this saves us from the hard problem of creating a mapping of the same type.
What's wrong with doing this?
    new = type(self)()
Or the equivalent from C code. If that doesn't work, surely that's the fault of the subclass, the subclass is broken, and it will raise an exception. I don't think it is our responsibility to do anything more than call the subclass constructor. If that's broken, then so be it. Possibly relevant: I've always been frustrated and annoyed at classes that hardcode their own type into methods. E.g. something like:
    class X:
        def spam(self, arg):
            return X(eggs)  # Wrong! Bad! Please use type(self) instead.
That means that each subclass has to override every method:
    class MySubclass(X):
        def spam(self, arg):
            # Do nothing except change the type returned.
            return type(self)( super().spam(arg) )
This gets really annoying really quickly. Try subclassing int, for example, where you have to override something like 30+ methods and do nothing but wrap calls to super. -- Steven
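The int case can be shown in a few lines. Both class names below are hypothetical, and only `__add__` is wrapped; a real subclass would have to repeat this for every arithmetic method.

```python
class MyInt(int):
    """Hypothetical int subclass that overrides nothing."""

class WrappedInt(int):
    """Hypothetical int subclass that re-wraps the result of addition."""
    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:
            return result
        return type(self)(result)

plain = MyInt(2) + MyInt(3)              # int.__add__ returns a plain int
wrapped = WrappedInt(2) + WrappedInt(3)  # the override restores the subclass type
```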
On Friday, March 1, 2019 at 5:47:06 AM UTC-5, Steven D'Aprano wrote:
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
Currently Counter += dict works and Counter + dict is an error. With this change Counter + dict will return a value, but it will be different from the result of the += operator.
That's how list.__iadd__ works too: ListSubclass + list will return a value, but it might not be the same as += since that operates in place and uses a different dunder method.
Why is it a problem for dicts but not a problem for lists?
Also, if the custom dict subclass implemented the plus operator with different semantic which supports the addition with a dict, this change will break it, because dict + CustomDict will call dict.__add__ instead of CustomDict.__radd__.
That's not how operators work in Python or at least that's not how they worked the last time I looked: if the behaviour has changed without discussion, that's a breaking change that should be reverted.
Obviously I can't show this with dicts, but here it is with lists:
py> class MyList(list):
...     def __radd__(self, other):
...         print("called subclass first")
...         return "Something"
...
py> [1, 2, 3] + MyList()
called subclass first
'Something'
This is normal, standard behaviour for Python operators: if the right operand is a subclass of the left operand, the reflected method __r*__ is called first.
Adding support of new operators to builtin types is dangerous.
Explain what makes new operators more dangerous than old operators please.
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
But (as someone else pointed out) {**d1, **d2} always returns a dict, not the type of d1 and d2.
And this saves us from the hard problem of creating a mapping of the same type.
What's wrong with doing this?
new = type(self)()
Or the equivalent from C code. If that doesn't work, surely that's the fault of the subclass, the subclass is broken, and it will raise an exception.
I don't think it is our responsibility to do anything more than call the subclass constructor. If that's broken, then so be it.
Possibly relevant: I've always been frustrated and annoyed at classes that hardcode their own type into methods. E.g. something like:
class X:
    def spam(self, arg):
        return X(eggs)  # Wrong! Bad! Please use type(self) instead.
That means that each subclass has to override every method:
class MySubclass(X):
    def spam(self, arg):
        # Do nothing except change the type returned.
        return type(self)( super().spam(arg) )
This gets really annoying really quickly. Try subclassing int, for example, where you have to override something like 30+ methods and do nothing but wrap calls to super.
I agree with you here. You might want to start a different thread with this idea and possibly come up with a PEP. There might be some pushback for efficiency's sake, so you might have to reel in your proposal to collections.abc mixin methods and UserDict methods.

Regarding the proposal, I agree with the reasoning put forward by Guido and I like it. I think there should be:

* d1 + d2
* d1 += d2
* d1 - d2
* d1 -= d2

which are roughly (ignoring Steven's point about types)

* {**d1, **d2}
* d1.update(d2)
* {k: v for k, v in d1.items() if k not in d2}
* for k in list(d1): if k in d2: del d1[k]

Seeing it like this, there should be no confusion about what the operators do.

I understand the points people made about the Zen of Python. However, I think that just like with lists, we tend to use l1+l2 when combining lists and [*l1, x, *l2, y] when combining lists and elements. Similarly, I think {**d1, **d2} should only be written when there are also key value pairs, like {**d1, k: v, **d2, k2: v2}.

Best, Neil
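The four operators listed above can be sketched as a dict subclass. This is only an illustration of the semantics under discussion, not the implementation that was proposed; AddableDict is a hypothetical name, and returning type(self) is one of the choices debated in this thread.

```python
class AddableDict(dict):
    """Hypothetical dict subclass sketching +, +=, - and -=."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = type(self)(self)  # copy the left operand
        new.update(other)       # the right operand wins on overlapping keys
        return new

    def __iadd__(self, other):
        self.update(other)      # in-place version of +
        return self

    def __sub__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # Keep only the keys that do not appear in `other`.
        return type(self)((k, v) for k, v in self.items() if k not in other)

    def __isub__(self, other):
        # list() guards against `d -= d`, which would mutate during iteration.
        for key in list(other):
            self.pop(key, None)
        return self


d1 = AddableDict(a=1, b=2)
d2 = {"b": 3, "c": 4}

merged = d1 + d2
remaining = d1 - d2
print(merged)
print(remaining)
```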
-- Steven
_______________________________________________
Python-ideas mailing list
Python...@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
01.03.19 12:44, Steven D'Aprano wrote:
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
Currently Counter += dict works and Counter + dict is an error. With this change Counter + dict will return a value, but it will be different from the result of the += operator.
That's how list.__iadd__ works too: ListSubclass + list will return a value, but it might not be the same as += since that operates in place and uses a different dunder method.
Why is it a problem for dicts but not a problem for lists?
Because the plus operator for lists predated any list subclasses.
Also, if the custom dict subclass implemented the plus operator with different semantic which supports the addition with a dict, this change will break it, because dict + CustomDict will call dict.__add__ instead of CustomDict.__radd__.
That's not how operators work in Python or at least that's not how they worked the last time I looked: if the behaviour has changed without discussion, that's a breaking change that should be reverted.
You are right.
What's wrong with doing this?
new = type(self)()
Or the equivalent from C code. If that doesn't work, surely that's the fault of the subclass, the subclass is broken, and it will raise an exception.
Try to do this with defaultdict. Note that none of builtin sequences or sets do this. For good reasons they always return an instance of the base type.
On Mon, Mar 04, 2019 at 03:43:48PM +0200, Serhiy Storchaka wrote:
01.03.19 12:44, Steven D'Aprano wrote:
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
Currently Counter += dict works and Counter + dict is an error. With this change Counter + dict will return a value, but it will be different from the result of the += operator.
That's how list.__iadd__ works too: ListSubclass + list will return a value, but it might not be the same as += since that operates in place and uses a different dunder method.
Why is it a problem for dicts but not a problem for lists?
Because the plus operator for lists predated any list subclasses.
That doesn't answer my question. Just because it is older is no explanation for why this behaviour is not a problem for lists, or a problem for dicts. [...]
What's wrong with doing this?
new = type(self)()
Or the equivalent from C code. If that doesn't work, surely that's the fault of the subclass, the subclass is broken, and it will raise an exception.
Try to do this with defaultdict.
I did. It seems to work fine with my testing:
py> defaultdict()
defaultdict(None, {})
is precisely the behaviour I would expect. If it isn't the right thing to do, then defaultdict can override __add__ and __radd__.
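For readers following the defaultdict exchange, a minimal sketch of what is at stake: calling type(self)() on a defaultdict succeeds, but the new instance has no default_factory, whereas copy() preserves it. The variable names are illustrative only.

```python
from collections import defaultdict

counts = defaultdict(int)
counts["a"] += 1

# type(self)() works, but the factory is not carried over.
fresh = type(counts)()
print(fresh.default_factory)

try:
    fresh["missing"]
    lookup_failed = False
except KeyError:
    # Without a default_factory, a missing key raises like a plain dict.
    lookup_failed = True

# copy() keeps the factory, so missing keys still get a default value.
copied = counts.copy()
print(copied.default_factory)
print(copied["missing"])
```

Whether losing the factory counts as "working fine" or as a reason to return the base type is exactly the disagreement between the two posters.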
Note that none of builtin sequences or sets do this. For good reasons they always return an instance of the base type.
What are those good reasons? -- Steven
04.03.19 15:43, Serhiy Storchaka wrote:
01.03.19 12:44, Steven D'Aprano wrote:
On Fri, Mar 01, 2019 at 08:47:36AM +0200, Serhiy Storchaka wrote:
Also, if the custom dict subclass implemented the plus operator with different semantic which supports the addition with a dict, this change will break it, because dict + CustomDict will call dict.__add__ instead of CustomDict.__radd__.
That's not how operators work in Python or at least that's not how they worked the last time I looked: if the behaviour has changed without discussion, that's a breaking change that should be reverted.
You are right.
Actually there is still a problem if the first argument is an instance of dict subclass that does not implement __add__.
On Thu, 28 Feb 2019 at 07:18, Serhiy Storchaka <storchaka@gmail.com> wrote:
[...]
I do not understand why we discuss a new syntax for dict merging if we already have a syntax for dict merging: {**d1, **d2} (which works with *all* mappings). Doesn't this contradict the Zen?
FWIW there are already three ways for lists/sequences:
[*x, *y]
x + y
x.extend(y)  # in-place version
We already have the first and third for dicts/mappings. I don't see a big problem in adding a + for dicts; also, this is not really new syntax, just implementing a couple of dunders for a builtin class. So I actually like this idea.
-- Ivan
Counter also uses +/__add__ for a similar behavior.
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d  # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
At first I worried that changing base dict would cause confusion for the subclass, but Counter seems to share the idea that update and + are synonyms.
Counter is a moot analogy. Counter's + and - operators follow the rules of numbers addition and subtraction:
>>> c = Counter({"a": 1})
>>> c + Counter({"a": 5})
Counter({'a': 6})
>>> c + Counter({"a": 5}) - Counter({"a": 4})
Counter({'a': 2})
Which also means that in most cases (c1 + c2) - c2 == c1 which is not something you would expect with the suggested "dictionary addition" operation. As a side note, this is not true in general for Counters because of how subtraction handles 0. E.g.
>>> c0 = Counter({"a": 0})
>>> c1 = Counter({"a": 1})
>>> (c0 + c1) - c1
Counter()
>>> (c0 + c1) - c1 == c0
False
---
The current intuition of how + and - work doesn't apply literally to this suggestion:

1) numeric types are their own story
2) most built-in sequences imply concatenation for + and have no subtraction
3) numpy-like arrays behave closer to numbers
4) Counters mimic numbers in some ways, and while addition is reminiscent of concatenation (but order is not relevant) they also have subtraction
5) sets have difference, which is probably the closest to what you would expect from dict subtraction, but no + operator

---
I understand the arguments against a | operator for dicts but I don't entirely agree with them. dict is obviously a different type of object than all the others I've mentioned, even mathematically, and there is no clear precedent. If sets happened to maintain insertion order, like dicts after 3.6/3.7, I would expect the union operator to also preserve the order. Before 3.6 we probably would have seen dicts as closer to sets from that point of view, and this suggested addition as closer to set union.

The question of symmetry ({"a": 1} + {"a": 2}) is an important one and I would consider not enforcing one resolution in PEP 584, and instead leave this undefined (i.e. in the resulting dict, the value could be either 1 or 2, or just completely undefined to also be compatible with Counter-like semantics in the same PEP). This is something to consider carefully if the plan is to make the new operators part of Mapping. It's not obvious that all mappings should implement this the same way, and a survey of what is being done by other implementations of Mappings would be useful. On the other hand, leaving it undefined might make it harder to standardize it later, once other implementations have defined their own behavior.

This question is probably on its own a valid argument against the proposal. When it comes to dicts (and not Mappings in general) {**d1, **d2} or d.update() already have clearly-defined semantics. The new proposal for a merge() operation might be more useful.
The added value would be the ability to add two mappings regardless of concrete type. But it's with Mappings in general that this proposal is the most problematic. On the other hand the subtraction operator is probably less controversial and immediately useful (the idiom to remove keys from a dictionary is not obvious).
This question is probably on its own a valid argument against the proposal. When it comes to dicts (and not Mappings in general) {**d1, **d2} or d.update() already have clearly-defined semantics.
Actually, in my mind, this is an argument for an operator (or method): besides being obtuse, the {**d1, **d2} syntax only creates actual dicts. If we had an operator defined for mappings in general, it would be easier to duck type dicts. I think this is pretty compelling, actually. And also an argument for having the operation return the type it was invoked on, rather than always a dict.
I can't find the latest draft of the PEP, so I'm not sure if this is discussed there. But it should be.
-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Just to add to the discussion: this was brought up previously as part of PEP 448 (unpacking generalizations), which also added {**x, **y} to merge two dicts in Python 3.5.
Previous python-ideas thread: https://mail.python.org/pipermail/python-ideas/2015-February/031748.html
LWN summary: https://lwn.net/Articles/635397/
The previous thread is worth reading, as some of the points still stand even with {**x, **y} added.
On Wed, Feb 27, 2019 at 9:59 PM João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
Best regards,
João Matos
-- Regards, Karthikeyan S
I dislike adding more operator overload to builtin types.
str is not commutative, but it satisfies a in (a+b), and b in (a+b). There are no loss.
In case of dict + dict, it not only sum. There may be loss value.
{"a":1} + {"a":2} = ?
In case of a.update(b), it's clear that b wins. In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
Regards,
On Thu, Feb 28, 2019 at 1:28 AM João Matos <jcrmatos@gmail.com> wrote:
Hello,
I would like to propose that instead of using this (applies to Py3.5 and upwards) dict_a = {**dict_a, **dict_b}
we could use dict_a = dict_a + dict_b
or even better dict_a += dict_b
Best regards,
João Matos
-- INADA Naoki <songofacandy@gmail.com>
On Fri, Mar 1, 2019 at 11:00 PM INADA Naoki <songofacandy@gmail.com> wrote:
I dislike adding more operator overload to builtin types.
str is not commutative, but it satisfies a in (a+b), and b in (a+b). There are no loss.
In case of dict + dict, it not only sum. There may be loss value.
{"a":1} + {"a":2} = ?
In case of a.update(b), it's clear that b wins. In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
Picking semantics can be done as part of the PEP discussion, and needn't be a reason for rejecting the proposal before it's even made. We have at least one other precedent to consider:
>>> {1} | {1.0}
{1}
>>> {1.0} | {1}
{1.0}
I have absolutely no doubt that these kinds of questions will be thoroughly hashed out (multiple times, even) before the PEP gets to pronouncement. ChrisA
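The set precedent mentioned above can be checked directly, alongside what the existing {**d1, **d2} spelling does with equal keys. This is only a sketch of current behaviour; the variable names are illustrative.

```python
# Set union keeps the element that was already present in the left operand.
left_first = {1} | {1.0}
right_first = {1.0} | {1}
print(left_first, right_first)

# Dict unpacking keeps the first key object but takes the last value.
merged = {**{1: "int key"}, **{1.0: "float key"}}
print(merged)

kept_key = next(iter(merged))
print(type(kept_key).__name__)
```

So for sets the left operand "wins", while for dict merging the key comes from the left and the value from the right.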
On Fri, Mar 1, 2019 at 9:47 PM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Mar 1, 2019 at 11:00 PM INADA Naoki <songofacandy@gmail.com> wrote:
I dislike adding more operator overload to builtin types.
str is not commutative, but it satisfies a in (a+b), and b in (a+b). There are no loss.
In case of dict + dict, it not only sum. There may be loss value.
{"a":1} + {"a":2} = ?
In case of a.update(b), it's clear that b wins. In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
Picking semantics can be done as part of the PEP discussion, and needn't be a reason for rejecting the proposal before it's even made.
Yes. I am only saying that no semantics seems clear to me. I am not discussing which one is best; I am only saying that I dislike it. One must be free to express like or dislike, no?
We have at least one other precedent to consider:
>>> {1} | {1.0}
{1}
>>> {1.0} | {1}
{1.0}
It is just because of the behavior of int and float. It is not caused by set behavior. Set keeps "no loss" semantics when viewed in terms of equality.
>>> {1} <= ({1} | {1.0})
True
>>> {1.0} <= ({1} | {1.0})
True
So dict + dict is totally different than set | set. dict + dict has loss at equality level.
-- INADA Naoki <songofacandy@gmail.com>
On Fri, Mar 01, 2019 at 09:58:08PM +0900, INADA Naoki wrote:
>>> {1} <= ({1} | {1.0})
True
>>> {1.0} <= ({1} | {1.0})
True
So dict + dict is totally different than set | set. dict + dict has loss at equality level.
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
-- Steven
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
Please calm down. I meant each type implements "sum" in semantics of the type, in lossless way. What "lossless" means is changed by the semantics of the type.
-1 + -1 = -2 is sum in numerical semantics. There are no loss.
[1, 2, 3] + [999] = [1, 2, 3, 999] is a (lossless) sum in sequence semantics.
So what about {"a": 1} + {"a": 2}? Is there a (lossless) sum in dict semantics?

* {"a": 1} -- It seems {"a": 2} is lost in dict semantics. Should it really be called "sum"?
* {"a": 2} -- It seems {"a": 1} is lost in dict semantics. Should it really be called "sum"?
* {"a": 3} -- It seems a bit curious compared with + of sequences, because [2]+[3] is not [5]. It looks more like a Counter than a container.
* ValueError -- Hmm, it looks ugly to me.

So I don't think "sum" fits dict semantics.
Regards,
-- INADA Naoki <songofacandy@gmail.com>
On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songofacandy@gmail.com> wrote:
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
Please calm down. I meant each type implements "sum" in semantics of the type, in lossless way. What "lossless" means is changed by the semantics of the type.
-1 + -1 = -2 is sum in numerical semantics. There are no loss.
TBH I don't understand what is lossless about numeric addition. What is the definition of lossless? Clearly some information is lost, since you can't uniquely restore two numbers you add from the result. Unless you define what lossless means, there will be just more misunderstandings. -- Ivan
Sorry, I'm not good at English enough to explain my mental model.
I meant no skip, no ignorance, no throw away.
In case of 1+2=3, both of 1 and 2 are not skipped, ignored or thrown away.
On the other hand, in case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1} is skipped, ignored, or thrown away. I used "lost" to explain it.
And I used "lossless" for "there is no lost". Not for reversible.
If it isn't understandable to you, please ignore me.
I think Rémi’s comment is very similar to my thought. Merging mapping is more complex than concatenate sequence and it seems hard to call it "sum".
Regards,
On Fri, Mar 1, 2019 at 23:19, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songofacandy@gmail.com> wrote:
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
Please calm down. I meant each type implements "sum" in semantics of the type, in lossless way. What "lossless" means is changed by the semantics of the type.
-1 + -1 = -2 is sum in numerical semantics. There are no loss.
TBH I don't understand what is lossless about numeric addition. What is the definition of lossless? Clearly some information is lost, since you can't uniquely restore two numbers you add from the result.
Unless you define what lossless means, there will be just more misunderstandings.
-- Ivan
On 3/1/2019 9:38 AM, INADA Naoki wrote:
Sorry, I'm not good at English enough to explain my mental model.
I meant no skip, no ignorance, no throw away.
In case of 1+2=3, both of 1 and 2 are not skipped, ignored or thrown away.
On the other hand, in case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1} is skipped, ignored, or thrown away. I used "lost" to explain it.
And I used "lossless" for "there is no lost". Not for reversible.
If it isn't understandable to you, please ignore me.
I think Rémi’s comment is very similar to my thought. Merging mapping is more complex than concatenate sequence and it seems hard to call it "sum".
I understand Inada to be saying that each value on the LHS (as shown above) affects the result on the RHS. That's the case with addition of ints and other types, but not so with the proposed dict addition. As he says, the {a:1} doesn't affect the result. The result would be the same if this key wasn't present in the first dict, or if the key had a different value. This doesn't bother me, personally. I'm just trying to clarify. Eric
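Eric's reading can be made concrete with the existing {**d1, **d2} spelling standing in for the proposed d1 + d2. The helper name merge is hypothetical and used only for this sketch.

```python
def merge(d1, d2):
    """Stand-in for the proposed d1 + d2: the right operand wins."""
    return {**d1, **d2}


right = {"a": 2}

with_one = merge({"a": 1, "b": 2}, right)
with_other_value = merge({"a": 99, "b": 2}, right)
without_key = merge({"b": 2}, right)

# The left operand's value for "a" never reaches the result.
print(with_one == with_other_value == without_key)

# Contrast with list concatenation, where every left element survives.
print([1] + [9] == [2] + [9])
```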
Regards,
On Fri, Mar 1, 2019 at 23:19, Ivan Levkivskyi <levkivskyi@gmail.com> wrote:
On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songofacandy@gmail.com> wrote:
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
Please calm down. I meant each type implements "sum" in semantics of the type, in lossless way. What "lossless" means is changed by the semantics of the type.
-1 + -1 = -2 is sum in numerical semantics. There are no loss.
TBH I don't understand what is lossless about numeric addition. What is the definition of lossless? Clearly some information is lost, since you can't uniquely restore two numbers you add from the result.
Unless you define what lossless means, there will be just more misunderstandings.
-- Ivan
On Fri, 1 Mar 2019 at 14:50, Eric V. Smith <eric@trueblade.com> wrote:
On 3/1/2019 9:38 AM, INADA Naoki wrote:
Sorry, I'm not good at English enough to explain my mental model.
I meant no skip, no ignorance, no throw away.
In case of 1+2=3, both of 1 and 2 are not skipped, ignored or thrown away.
On the other hand, in case of {a:1, b:2}+{a:2}={a:2, b:2}, I feel {a:1} is skipped, ignored, or thrown away. I used "lost" to explain it.
And I used "lossless" for "there is no lost". Not for reversible.
If it isn't understandable to you, please ignore me.
I think Rémi’s comment is very similar to my thought. Merging mapping is more complex than concatenate sequence and it seems hard to call it "sum".
I understand Inada to be saying that each value on the LHS (as shown above) affects the result on the RHS. That's the case with addition of ints and other types, but not so with the proposed dict addition. As he says, the {a:1} doesn't affect the result. The result would be the same if this key wasn't present in the first dict, or if the key had a different value.
This doesn't bother me, personally. I'm just trying to clarify.
OK, thanks for explaining! So more formally speaking, you want to say that for other examples of '+' in Python x1 + y == x2 + y if and only if x1 == x2, while for the proposed '+' for dicts there may be many different x_i such that x_i + y gives the same result. This doesn't bother me either, since this is not a critical requirement for addition. I would say this is rather a coincidence than a conscious decision. -- Ivan
OK, thanks for explaining! So more formally speaking, you want to say that for other examples of '+' in Python x1 + y == x2 + y if and only if x1 == x2, while for the proposed '+' for dicts there may be many different x_i such that x_i + y gives the same result.
It's a bit different from what I had in mind. I'm OK with violating "x1 + y == x2 + y if and only if x1 == x2", if it's not important for the semantics of the types of x1, x2, and y.
A mapping is defined by its key: value pairs. That is its core part. I don't want to call an operator that loses a key: value pair a "sum". That's why I thought this proposal is a more serious abuse of the + operator.
By the way, in the case of sequences, `len(a) + len(b) == len(a + b)`. In the case of sets, `len(a) + len(b) >= len(a | b)`. The proposed operation looks more similar to `set | set` than to `seq + seq` from this point of view.
I am not proposing | over +. I just mean that the difference between dict.update() and seq+seq is not smaller than the difference between dict.update() and set|set. If | does not seem to fit this operation, + does not seem to fit it either.
-- INADA Naoki <songofacandy@gmail.com>
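The length relations mentioned above can be checked with a few lines, again using {**a, **b} as a stand-in for the proposed dict operator. The variable names are illustrative only.

```python
seq_a, seq_b = [1, 2, 3], [3, 4]
set_a, set_b = {1, 2, 3}, {3, 4}
dict_a, dict_b = {"x": 1, "y": 2}, {"y": 20, "z": 30}

# Sequences: lengths always add up exactly.
seq_len = len(seq_a + seq_b)

# Sets: shared elements are counted once, so the union can be shorter.
set_len = len(set_a | set_b)

# Dict merge: shared keys are counted once, just like set union.
dict_len = len({**dict_a, **dict_b})

print(seq_len, len(seq_a) + len(seq_b))
print(set_len, len(set_a) + len(set_b))
print(dict_len, len(dict_a) + len(dict_b))
```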
Eric V. Smith wrote on 01.03.19 at 15:49:
I understand Inada to be saying that each value on the LHS (as shown above) affects the result on the RHS. That's the case with addition of ints and other types, but not so with the proposed dict addition. As he says, the {a:1} doesn't affect the result. The result would be the same if this key wasn't present in the first dict, or if the key had a different value.
This doesn't bother me, personally.
+1 Stefan
On 3/1/19 8:19 AM, Ivan Levkivskyi wrote:
On Fri, 1 Mar 2019 at 13:48, INADA Naoki <songofacandy@gmail.com> wrote:
Is that an invariant you expect to apply to other uses of the + operator?
py> x = -1
py> x <= (x + x)
False
py> [999] <= ([1, 2, 3] + [999])
False
Please calm down. I meant each type implements "sum" in semantics of the type, in lossless way. What "lossless" means is changed by the semantics of the type.
-1 + -1 = -2 is sum in numerical semantics. There are no loss.
TBH I don't understand what is lossless about numeric addition. What is the definition of lossless? Clearly some information is lost, since you can't uniquely restore two numbers you add from the result.
Unless you define what lossless means, there will be just more misunderstandings.
I don't mean to put words into anyone's mouth, but I think I see what INADA Naoki means: in other cases of summation, the result somehow includes or contains both operands. In the case of summing dicts, though, some of the operands are "lost" in the process.
I'm sure that I'm nowhere near as prolific as many of the members of this list, but I don't remember ever merging dicts (and a quick grep of my Python source tree confirms same), so I won't comment further on the actual issue at hand.
On Fri, Mar 01, 2019 at 08:59:45PM +0900, INADA Naoki wrote:
I dislike adding more operator overload to builtin types.
str is not commutative, but it satisfies a in (a+b), and b in (a+b). There are no loss.
Is this an invariant you expect to apply for other classes that support the addition operator?
5 in (5 + 6)
[1, 2, 3] in ([1, 2, 3] + [4, 5, 6])
Since it doesn't apply for int, float, complex, list or tuple, why do you think it must apply to dicts?
In case of dict + dict, it not only sum. There may be loss value.
Yes? Why is that a problem?
{"a":1} + {"a":2} = ?
Would you like to argue that Counter.__add__ is a mistake for the same reason?
Counter(('a', 1)) + Counter(('a', 2)) = ?
For the record, what I expected the above to do turned out to be *completely wrong* when I tried it. I expected Counter({'a': 3}) but the actual results are Counter({'a': 2, 1: 1, 2: 1}).
Every operation is going to be mysterious if you have never learned what it does:
from array import array
a = array('i', [1, 2, 3])
b = array('i', [10, 20, 30])
a + b = ?
Without trying it or reading the docs, should that be an error, or concatenation, or element-wise addition?
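Both puzzles above can be run as written. The sketch below shows why the Counter result surprised: a tuple passed to Counter is treated as an iterable of elements to count, not as a key/count pair. It also answers the array question.

```python
from array import array
from collections import Counter

# Each tuple is iterated, so 'a', 1 and 2 are all counted as elements.
from_tuples = Counter(("a", 1)) + Counter(("a", 2))
print(from_tuples)

# Keyword (or mapping) arguments give the key/count reading instead.
from_counts = Counter(a=1) + Counter(a=2)
print(from_counts)

# array + array concatenates, like list + list.
a = array("i", [1, 2, 3])
b = array("i", [10, 20, 30])
combined = a + b
print(combined)
```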
In case of a.update(b), it's clear that b wins.
It wasn't clear to me when I was a beginner and first came across dict.update. I had to learn what it did by experimenting with manual loops until it made sense to me.
In case of a + b, "which wins" or "exception raised on duplicated key?" is unclear to me.
Many things are unclear to me too. That doesn't make them any less useful. -- Steven
On Fri, Mar 1, 2019 at 10:10 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 01, 2019 at 08:59:45PM +0900, INADA Naoki wrote:
I dislike adding more operator overload to builtin types.
str is not commutative, but it satisfies a in (a+b), and b in (a+b). There are no loss.
Is this an invariant you expect to apply for other classes that support the addition operator?
5 in (5 + 6)
I meant more high level semantics: "no loss". Not only "in". So my example about set used "<=" operator. 5 + 6 is sum of 5 and 6.
[1, 2, 3] in ([1, 2, 3] + [4, 5, 6])
Both of [1,2,3] and [4,5,6] are not lost in result.
Since it doesn't apply for int, float, complex, list or tuple, why do you think it must apply to dicts?
You misunderstood my "no loss" expectation.
In case of dict + dict, it not only sum. There may be loss value.
Yes? Why is that a problem?
It's enough reason for me to dislike it.
{"a":1} + {"a":2} = ?
Would you like to argue that Counter.__add__ is a mistake for the same reason?
In Counter's case, it's clear. In case of dict, it's unclear.
Counter(('a', 1)) + Counter(('a', 2)) = ?
For the record, what I expected the above to do turned out to be *completely wrong* when I tried it. I expected Counter({'a': 3}) but the actual results are Counter({'a': 2, 1: 1, 2: 1}).
That is just because you misunderstood Counter's initializer argument. It is not related to how the + or | operator is overloaded.
Every operation is going to be mysterious if you have never learned what it does:
from array import array
a = array('i', [1, 2, 3])
b = array('i', [10, 20, 30])
a + b = ?
Without trying it or reading the docs, should that be an error, or concatenation, or element-wise addition?
I never say every operator must be expected by everyone. Don't straw man. -- INADA Naoki <songofacandy@gmail.com>
participants (35)
- Amber Yust
- Anders Hovmöller
- Antoine Pitrou
- Brandt Bucher
- Brett Cannon
- Chris Angelico
- Christopher Barker
- Dan Sommers
- David Mertz
- Davide Rizzo
- E. Madison Bray
- Eric V. Smith
- francismb
- George Castillo
- Greg Ewing
- Guido van Rossum
- Hasan Diwan
- Inada Naoki
- INADA Naoki
- Ivan Levkivskyi
- James Lu
- Jimmy Girardet
- João Matos
- Karthikeyan
- Michael Selik
- MRAB
- Neil Girdhar
- Oleg Broytman
- Paul Moore
- Pål Grønås Drange
- Raymond Hettinger
- Rhodri James
- Serhiy Storchaka
- Stefan Behnel
- Steven D'Aprano