Adding "+" and "+=" operators to dict
I mentioned this on the python-dev list [1] originally as a +1 to someone else suggesting the idea [2]. It also came up in a response to my post that I can't seem to find in the archives, so I've quoted it below [3]. As the subject says, the idea would be to add a "+" and "+=" operator to dict that would provide the following behavior:
>>> {'x': 1, 'y': 2} + {'z': 3}
{'x': 1, 'y': 2, 'z': 3}
The only potentially non-obvious case I can see is when there are duplicate keys, in which case the semantics could just be defined so that the last setter wins, e.g.:
>>> {'x': 1, 'y': 2} + {'x': 3}
{'x': 3, 'y': 2}
Which is analogous to the example:
new_dict = dict1.copy()
new_dict.update(dict2)
With "+=" then essentially ending up being an alias for ``dict.update(...)``. I'd be happy to champion this as a PEP if the feedback / public opinion heads in that direction. [1] https://mail.python.org/pipermail/python-dev/2015-February/138150.html [2] https://mail.python.org/pipermail/python-dev/2015-February/138116.html [3] John Wong --
Well, looking at just list:

a + b yields new list
a += b yields modified a

then there is also .extend in list, etc. So do we want to follow list's footstep? I like + because + is more natural to read. Maybe this needs to be a separate thread. I am actually amazed to remember dict + dict is not possible... there must be a reason (performance??) for this...
Cheers, ~ Ian Lee
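A minimal sketch of the proposed semantics, written as a hypothetical dict subclass (illustrative only; the proposal targets the builtin dict, and ``mdict`` is a made-up name):

class mdict(dict):
    """Hypothetical dict with the proposed '+' and '+=' behaviour."""

    def __add__(self, other):
        new = type(self)(self)   # copy, preserving the subclass type
        new.update(other)        # last setter wins on duplicate keys
        return new

    def __iadd__(self, other):
        self.update(other)       # '+=' is essentially dict.update()
        return self

>>> mdict({'x': 1, 'y': 2}) + {'x': 3}
{'x': 3, 'y': 2}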
On Feb 11, 2015, at 2:21 AM, Ian Lee <ianlee1521@gmail.com> wrote:
[snip -- original message quoted in full]
I’d really like this change and I think that it makes sense. The only thing I’d change is that I think the | operator makes more sense than +. dicts are more like sets than they are like lists so a union operator makes more sense I think.

-- Donald Stufft
On 02/11/2015 04:27 AM, Donald Stufft wrote:
I’d really like this change and I think that it makes sense. The only thing I’d change is that I think the | operator makes more sense than +. dicts are more like sets than they are like lists so a union operator makes more sense I think.
Maybe I'm just not steeped enough in CS, but when I want to combine two things together, my first reflex is always '+'. I suppose I could come around to '|', though -- it does ease the tension around the behavior of duplicate keys. -- ~Ethan~
Addition in the usual sense of the word wouldn't be commutative for dictionaries. In particular, it's hard to see how you could define addition so these two expressions are equal:

{'a': 1} + {'a': 2}
{'a': 2} + {'a': 1}

'+=' is no problem.

Skip
On Thu, Feb 12, 2015 at 2:24 PM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
Addition in the usual sense of the word wouldn't be commutative for dictionaries. In particular, it's hard to see how you could define addition so these two expressions are equal:
{'a': 1} + {'a': 2}
{'a': 2} + {'a': 1}
'+=' is no problem.
Does it have to be? It isn't commutative for strings or tuples either. Addition of complex objects does at times depend on order (though as we saw in another thread, it can be very confusing if the _type_ can change if you switch the operands), so I would have no problem with the right-hand operand "winning" when there's a key collision. ChrisA
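Concretely, with strings:

>>> 'ab' + 'cd'
'abcd'
>>> 'cd' + 'ab'
'cdab'

Same operands, different results depending on order.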
On 12/02/2015 4:59 p.m., Chris Angelico wrote:
Addition in the usual sense of the word wouldn't be commutative for dictionaries.
Does it have to be? It isn't commutative for strings or tuples either.
I think associativity is the property in question, and it does hold for string and tuple concatenation. Dict addition could be made associative by raising an exception on duplicate keys. Another way would be to define {'a':x} + {'a':y} as {'a': x + y}, but that would probably upset a lot of people. :-) -- Greg
On Feb 11, 2015, at 20:06, Greg <greg.ewing@canterbury.ac.nz> wrote:
On 12/02/2015 4:59 p.m., Chris Angelico wrote:
Addition in the usual sense of the word wouldn't be commutative for dictionaries.
Does it have to be? It isn't commutative for strings or tuples either.
I think associativity is the property in question, and it does hold for string and tuple concatenation.
No, I think commutativity is the property in question, because dict merging with the right side winning is already associative but not commutative--exactly like string and tuple addition. Commutative means ab = ba. Associative means a(bc) = (ab)c.

Also, using addition for noncommutative operations is not some anti-mathematical perversion by Python; mathematicians (and, I believe, physicists even more so) generally use the same symbols and names for addition and multiplication of noncommutative algebras like the quaternions (and even nonassociative, like the octonions) that they use for the commutative reals and complexes. In fact, the idea of noncommutative addition or multiplication is kind of the starting point of 19th century algebra and everything that follows from it.
Dict addition could be made associative by raising an exception on duplicate keys.
That would make it commutative.
Another way would be to define {'a':x} + {'a':y} as {'a': x + y}, but that would probably upset a lot of people. :-)
Of course that's exactly what Counter does. And it also defines | to give {'a': max(x, y)}. Which is why it's important to consider the __radd__/__ror__ issue. What should happen if you add a dict to a Counter, or vice-versa?
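For reference, here is how the standard library's Counter (which already exists, independent of this proposal) treats the two operators:

>>> from collections import Counter
>>> a = Counter({'x': 1, 'y': 2})
>>> b = Counter({'x': 2})
>>> a + b                 # values are added
Counter({'x': 3, 'y': 2})
>>> a | b                 # values are max'd
Counter({'x': 2, 'y': 2})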
random832@fastmail.us wrote:
On Wed, Feb 11, 2015, at 23:06, Greg wrote:
Dict addition could be made associative by raising an exception on duplicate keys.
Why isn't it associative otherwise?
I wasn't thinking straight yesterday. It is of course associative under left-operand-wins or right-operand-wins also. The OP was right that commutativity is the issue.

It's true that there are many non-commutative operations used in mathematics, but there is usually some kind of symmetry about them nonetheless. They're not biased towards one operand or the other. For example, sequence concatenation obeys the relation

a + b == reversed(reversed(b) + reversed(a))

and matrix multiplication obeys

A * B == transpose(transpose(B) * transpose(A))

There would be no such identity for dict addition under a rule that favours one operand over the other, which makes it seem like a bad idea to me to use an operator such as + or | that is normally expected to be symmetrical.

-- Greg
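Greg's concatenation identity can be checked concretely (wrapping reversed() in list(), since reversed() returns an iterator):

>>> a, b = [1, 2], [3, 4]
>>> a + b == list(reversed(list(reversed(b)) + list(reversed(a))))
True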
On 12.02.2015 04:59, Chris Angelico wrote:
On Thu, Feb 12, 2015 at 2:24 PM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
Addition in the usual sense of the word wouldn't be commutative for dictionaries. In particular, it's hard to see how you could define addition so these two expressions are equal:
{'a': 1} + {'a': 2}
{'a': 2} + {'a': 1}
'+=' is no problem.
Does it have to be? It isn't commutative for strings or tuples either. Addition of complex objects does at times depend on order (though as we saw in another thread, it can be very confusing if the _type_ can change if you switch the operands), so I would have no problem with the right-hand operand "winning" when there's a key collision.
Solving that is simple: you define key collisions as having an undefined result. Then '+' for dicts is commutative.

However, I don't really see the point in having an operation that takes two dictionaries, creates a new empty one and updates this with both operands. It may be theoretically useful, but it results in the same poor performance you have in string concatenation.

In applications, you normally just need the update functionality for dictionaries. If you do need a copy, you can create a copy explicitly - but those cases are usually rare.

So +1 on the '+=' syntax, -1 on '+' for dicts.

-- Marc-Andre Lemburg
* M.-A. Lemburg <mal@egenix.com> [2015-02-12 09:51:22 +0100]:
However, I don't really see the point in having an operation that takes two dictionaries, creates a new empty one and updates this with both sides of the operand. It may be theoretically useful, but it results in the same poor performance you have in string concatenation.
In applications, you normally just need the update functionality for dictionaries. If you do need a copy, you can create a copy explicitly - but those cases are usually rare.
So +1 on the '+=' syntax, -1 on '+' for dicts.
I think it'd be rather confusing if += works but + does not.

Regarding duplicate keys, IMHO only two options make sense:

1) Raise an exception
2) The right-hand side overrides keys in the left-hand side.

I think #1 would prevent many useful cases where + could be used instead of .update(), so +1 for #2.

Florian
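For illustration, the two policies side by side as a hypothetical merge() helper (the name and the strict flag are made up for this sketch):

def merge(a, b, strict=False):
    """Return a new dict combining a and b.

    strict=False: right-hand side wins on duplicates (option 2).
    strict=True:  raise on duplicate keys (option 1).
    """
    if strict:
        dupes = a.keys() & b.keys()   # Python 3 keys views support set ops
        if dupes:
            raise KeyError('duplicate keys: %r' % sorted(dupes))
    new = a.copy()
    new.update(b)
    return new

>>> merge({'x': 1}, {'x': 3, 'y': 2})
{'x': 3, 'y': 2}
>>> merge({'x': 1}, {'x': 3}, strict=True)   # raises KeyError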
Alright, I've tried to gather up all of the feedback and organize it in something approaching what the alpha draft of a PEP might look like::

Proposed New Methods on dict
============================

Adds two dicts together, returning a new object with the type of the left hand operand. This would be roughly equivalent to calling:

>>> new_dict = old_dict.copy()
>>> new_dict.update(other_dict)

Where ``new_dict`` is the same type as ``old_dict``, but possibly not the same as ``other_dict``.

__add__
-------

>>> foo = {'a': 1, 'b': 'xyz'}
>>> bar = {'a': 5.0, 'c': object()}
>>> baz = foo + bar
>>> foo
{'a': 1, 'b': 'xyz'}
>>> bar
{'a': 5.0, 'c': object()}
>>> baz
{'a': 5.0, 'b': 'xyz', 'c': object()}

__iadd__
--------

>>> foo = {'a': 1, 'b': 'xyz'}
>>> bar = {'a': 5.0, 'c': object()}
>>> foo += bar
>>> foo
{'a': 5.0, 'b': 'xyz', 'c': object()}
>>> bar
{'a': 5.0, 'c': object()}

__radd__
--------

The reverse add was mentioned in [2], [5]. I'm not sure I have the following example exactly right, particularly the type. My initial thought is to have it come out as the type of the left hand operand. So, assuming LegacyMap has no ``__add__`` method:

>>> foo = LegacyMap({'a': 1, 'b': 'xyz'})
>>> bar = dict({'a': 5.0, 'c': object()})
>>> baz = foo + bar  # e.g. bar.__radd__(foo)
>>> baz
{'a': 5.0, 'b': 'xyz', 'c': object()}
>>> type(baz) == LegacyMap

Considerations
==============

Key Collisions
--------------

When there is a key collision, handle it as currently done by ``dict.update()``, namely that the right hand operand "wins".

Adding mappings of different types
----------------------------------

Here is what currently happens in a few cases in Python 3.4, given:

>>> class A(dict): pass
>>> class B(dict): pass
>>> foo = A({'a': 1, 'b': 'xyz'})
>>> bar = B({'a': 5.0, 'c': object()})

Currently (this surprised me actually... I guess it gets the parent class?):

>>> baz = foo.copy()
>>> type(baz)
dict

Currently:

>>> foo.update(bar)
>>> type(foo)
A

Idea about '+' adding values
----------------------------

The idea of ``{'x': 1} + {'x': 2} == {'x': 3}`` was mentioned [3], [4] and seems to not have had a lot of support. In particular, this goes against the current usage of the ``dict.update()`` method, where:

>>> d = {'x': 1}
>>> d.update({'x': 2})
>>> d
{'x': 2}

'+' vs '|' Operator
-------------------

If I was completely uninitiated, I would probably reach for the '+' first, but I don't feel this is the crux of the issue...

Backport to PyPI
----------------
And whether it's worth writing a dict subclass that adds this method and putting it on PyPI as a backport (people writing 3.3+ code or 2.7/3.5 code can then just "from dict35 import dict35 as dict", but of course they still won't be able to add two dicts constructed from literals). -- Andrew Barnert [2]
Sure, I'd be willing to do this.

References
==========

[1] https://mail.python.org/pipermail/python-ideas/2014-July/028440.html
[2] https://mail.python.org/pipermail/python-ideas/2015-February/031755.html
[3] https://mail.python.org/pipermail/python-ideas/2015-February/031773.html
[4] https://mail.python.org/pipermail/python-ideas/2014-July/028425.html

Other Discussions
-----------------

As best I can tell from reading through some of these threads, there was never really a conclusion, and instead the discussions eventually fizzled out.

https://mail.python.org/pipermail/python-ideas/2014-July/028424.html
https://mail.python.org/pipermail/python-ideas/2011-December/013227.html
https://mail.python.org/pipermail/python-ideas/2013-June/021140.html

~ Ian Lee
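On the draft's __radd__ question, one possible reading of "the result takes the left operand's type", sketched with made-up class names (not the proposed C implementation):

class LegacyMap(dict):      # stand-in for a mapping type with no __add__
    pass

class adddict(dict):
    """Hypothetical dict implementing the draft's reflected add."""

    def __radd__(self, other):
        # Reached for `LegacyMap(...) + adddict(...)`: the left operand
        # defines no __add__, so Python falls back to our __radd__.
        # Per the draft, the result takes the *left* operand's type.
        new = type(other)(other)
        new.update(self)
        return new

foo = LegacyMap({'a': 1, 'b': 'xyz'})
bar = adddict({'a': 5.0, 'c': 3})
baz = foo + bar              # i.e. bar.__radd__(foo)
print(type(baz).__name__)    # LegacyMap
print(baz)                   # {'a': 5.0, 'b': 'xyz', 'c': 3}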
On Thu, Feb 12, 2015 at 01:07:52AM -0800, Ian Lee wrote:
Alright, I've tried to gather up all of the feedback and organize it in something approaching the alpha draft of a PEP might look like::
Proposed New Methods on dict ============================
Adds two dicts together, returning a new object with the type of the left hand operand. This would be roughly equivalent to calling:
>>> new_dict = old_dict.copy()
>>> new_dict.update(other_dict)
A very strong -1 on the proposal. We already have a perfectly good way to spell dict += , namely dict.update. As for dict + on its own, we have a way to spell that too: exactly as you write above.

I think this is a feature that is more useful in theory than in practice. While we already have a way to do a merge in place, a copy-and-merge seems like it should be useful, but I'm struggling to think of any use-cases for it. I've never needed this, and I've never seen anyone ask how to do this on the tutor or python-list mailing lists.

I certainly wouldn't want to write

new_dict = a + b + c + d

and have O(N**2) performance when I can do this instead

new_dict = {}
for old_dict in (a, b, c, d):
    new_dict.update(old_dict)

and have O(N) performance. It's easy to wrap this in a small utility function if you need to do it repeatedly.

There was an earlier proposal to add an updated() built-in, by analogy with list.sort/sorted: there would be dict.update/updated. Here's the version I have in my personal toolbox (feel free to use it, or not, as you see fit):

def updated(base, *mappings, **kw):
    """Return a new dict from one or more mappings and keyword arguments.

    The dict is initialised from the first argument, which must be a
    mapping that supports copy() and update() methods. The dict is then
    updated from any subsequent positional arguments, from left to
    right, followed by any keyword arguments.

    >>> d = updated({'a': 1}, {'b': 2}, [('a', 100), ('c', 200)], c=3)
    >>> d == {'a': 100, 'b': 2, 'c': 3}
    True
    """
    new = base.copy()
    for mapping in mappings + (kw,):
        new.update(mapping)
    return new

although as I said, I've never needed to use it in 15+ years. Someday, perhaps... Note that because this is built on the update method, it supports sequences of (key,value) tuples, and additional keyword arguments, which a binary operator cannot do.

I would give a +0.5 on a proposal to add an updated() builtin: I doubt that it will be used often, but perhaps it will come in handy from time to time. As the experience with list.sort versus sorted() shows, making one a method and one a function helps avoid confusion. The risk of confusion makes me less enthusiastic about adding a dict.updated() method: only +0 for that.

I dislike the use of + for concatenation, but can live with it. This operation, however, is not concatenation, nor is it addition. I am strongly -1 on the + operator, and -0.75 on | operator.

-- Steve
On Feb 12, 2015, at 5:43 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Feb 12, 2015 at 01:07:52AM -0800, Ian Lee wrote:
Alright, I've tried to gather up all of the feedback and organize it in something approaching the alpha draft of a PEP might look like::
Proposed New Methods on dict ============================
Adds two dicts together, returning a new object with the type of the left hand operand. This would be roughly equivalent to calling:
new_dict = old_dict.copy(); new_dict.update(other_dict)
A very strong -1 on the proposal. We already have a perfectly good way to spell dict += , namely dict.update. As for dict + on its own, we have a way to spell that too: exactly as you write above.
I think this is a feature that is more useful in theory than in practice. While we already have a way to do a merge in place, a copy-and-merge seems like it should be useful, but I'm struggling to think of any use-cases for it. I've never needed this, and I've never seen anyone ask how to do this on the tutor or python-list mailing lists.
I’ve wanted this several times, explicitly the copying variant of it. I always get slightly annoyed whenever I have to manually spell out the copy and the update. I still think it should use | rather than + though, to match sets.

-- Donald Stufft
On Thu, Feb 12, 2015 at 5:25 AM, Donald Stufft <donald@stufft.io> wrote:
I’ve wanted this several times, explicitly the copying variant of it. I always get slightly annoyed whenever I have to manually spell out the copy and the update.
copy-and-update:

dict(old_dict, **other_dict)

-eric
On Feb 12, 2015, at 6:28 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Feb 12, 2015 at 5:25 AM, Donald Stufft <donald@stufft.io> wrote:
I’ve wanted this several times, explicitly the copying variant of it. I always get slightly annoyed whenever I have to manually spell out the copy and the update.
copy-and-update:
dict(old_dict, **other_dict)
Only works if other_dict’s keys are all valid keyword arguments and AFAIK is considered an implementation detail of CPython.

-- Donald Stufft
On 02/12/2015 04:29 PM, Donald Stufft wrote:
On Feb 12, 2015, at 6:28 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Feb 12, 2015 at 5:25 AM, Donald Stufft <donald@stufft.io> wrote:
I’ve wanted this several times, explicitly the copying variant of it. I always get slightly annoyed whenever I have to manually spell out the copy and the update.
copy-and-update:
dict(old_dict, **other_dict)
Only works if other_dict’s keys are all valid keyword arguments and AFAIK is considered an implementation detail of CPython.
To be clear, kwargs to the dict() constructor are not an implementation detail of CPython. The implementation detail of CPython (2.x only) is that this technique works at all if other_dict has keys which are not strings. That has been changed for consistency in Python 3, and never worked in any of the alternative Python implementations AFAIK. So you're right that this technique is not usable as a general-purpose copy-and-update.

Also, Guido doesn't like it: http://mail.python.org/pipermail/python-dev/2010-April/099459.html

I think the fact that it is so often recommended is another bit of evidence that there is demand for copy-and-update-as-expression, though.

Carl
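To make the limitation concrete, here is the Python 3 behavior:

>>> old_dict = {'a': 1}
>>> dict(old_dict, **{'b': 2})       # string keys: fine
{'a': 1, 'b': 2}
>>> dict(old_dict, **{1: 'one'})     # raises TypeError: keywords must be strings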
On Thu, Feb 12, 2015 at 4:29 PM, Donald Stufft <donald@stufft.io> wrote:
On Feb 12, 2015, at 6:28 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote: copy-and-update:
dict(old_dict, **other_dict)
Only works if other_dict’s keys are all valid keyword arguments and AFAIK is considered an implementation detail of CPython.
Fair point. It seems to me that there is definitely enough interest in a builtin way of doing this. My vote is for a dict classmethod. -eric
On 02/12/2015 04:19 PM, Eric Snow wrote:
On Thu, Feb 12, 2015 at 4:29 PM, Donald Stufft <donald@stufft.io> wrote:
On Feb 12, 2015, at 6:28 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote: copy-and-update:
dict(old_dict, **other_dict)
Only works if other_dict’s keys are all valid keyword arguments and AFAIK is considered an implementation detail of CPython.
Fair point. It seems to me that there is definitely enough interest in a builtin way of doing this. My vote is for a dict classmethod.
Like __add__, for example? ;) -- ~Ethan~
I'm also +1, I agree with Donald that "|" makes more sense to me than "+" if only for consistency with sets. In mathematics a mapping is a set of pairs (preimage, postimage), and we are taking the union of these sets.
On 02/15/2015 05:14 PM, Neil Girdhar wrote:
I'm also +1, I agree with Donald that "|" makes more sense to me than "+" if only for consistency with sets. In mathematics a mapping is a set of pairs (preimage, postimage), and we are taking the union of these sets.
An option might be to allow a union "|" if the (key, value) pairs that match in each are the same. Otherwise, raise an exception.

Ron
On Sun, Feb 15, 2015 at 5:51 PM, Ron Adam <ron3200@gmail.com> wrote:
On 02/15/2015 05:14 PM, Neil Girdhar wrote:
I'm also +1, I agree with Donald that "|" makes more sense to me than "+" if only for consistency with sets. In mathematics a mapping is a set of pairs (preimage, postimage), and we are taking the union of these sets.
An option might be to allow a union "|" if both the (key, value) pairs that match in each are the same. Other wise, raise an exception.
I've only loosely followed this thread, so I apologize if examples were already linked or things already said. I want to reinforce that set-like operations/parallels are a million times better than + ... esp. considering dict.viewkeys is pretty much a set. please please don't use + !!!

Now, a link to an implementation we have used for several years with success, and the teams here seem to think it makes sense:

https://github.com/xtfxme/xacto/blob/master/xacto/__init__.py#L140

(don't worry about the class decorator/etc, you can just as easily take that class and subclass dict to get a type supporting all set operations)

The premise taken here is that dicts ARE sets that simply happen to have values associated with them... hence all operations are against keys ONLY. The fact that values tag along is irrelevant. I toyed with supporting values too but it makes things ambiguous (is that a key/val pair or a tuple key?)

So, given:

>>> d1 = fancy_dict(a=1, b=2, c=3)
>>> d2 = fancy_dict(c=4, d=5, e=6)

The implementation allows you to do such things as:

# union (first key winning here... IIRC this is how sets actually work)
>>> d1 | d2
{'a': 1, 'b': 2, 'c': 3, 'd': 5, 'e': 6}

# intersection (again, first key wins)
>>> d1 & d2
{'c': 3}

# difference (use ANY iterable as a filter!)
>>> d1 - ('a', 'b')
{'c': 3}

# symmetric difference
>>> d1 ^ d2
{'a': 1, 'b': 2, 'd': 5, 'e': 6}

All of the operations support arbitrary iterables as the RHS! This is NICE.

ASIDE: `d1 | d2` is NOT the same as .update() here (though maybe it should be)... I believe this stems from the fact that (IIRC :) a normal set() WILL NOT replace an entry it already has with a new one, if they hash() the same. IOW, a set will prefer the old object to the new one, and since the values are tied to the keys, the old value persists.

Something to think about.

-- C Anthony
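A condensed sketch of that key-oriented idea, as a hypothetical subclass (not the linked xacto code; first-key-wins, and the operators act on keys only):

class setdict(dict):
    """Hypothetical dict whose set operators work on keys only."""

    def _pick(self, keys, *others):
        # The *first* mapping holding a key supplies its value; plain
        # iterables on the RHS contribute no values (None).
        def value(k):
            for m in (self,) + others:
                if isinstance(m, dict) and k in m:
                    return m[k]
            return None
        return type(self)((k, value(k)) for k in keys)

    def __or__(self, other):            # union
        return self._pick(self.keys() | set(other), other)

    def __and__(self, other):           # intersection
        return self._pick(self.keys() & set(other), other)

    def __sub__(self, other):           # difference: filter keys away
        return self._pick(self.keys() - set(other))

    def __xor__(self, other):           # symmetric difference
        return self._pick(self.keys() ^ set(other), other)

>>> d1 = setdict(a=1, b=2, c=3)
>>> d2 = setdict(c=4, d=5, e=6)
>>> sorted((d1 | d2).items())           # first key wins: 'c' stays 3
[('a', 1), ('b', 2), ('c', 3), ('d', 5), ('e', 6)]
>>> d1 - ('a', 'b')
{'c': 3}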
On Tue, Feb 17, 2015 at 4:22 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
All of the operations support arbitrary iterables as the RHS! This is NICE.
ASIDE: `d1 | d2` is NOT the same as .update() here (though maybe it should be)... I believe this stems from the fact that (IIRC :) a normal set() WILL NOT replace an entry it already has with a new one, if they hash() the same. IOW, a set will prefer the old object to the new one, and since the values are tied to the keys, the old value persists.
Something to think about.
Forgot to mention the impl supports arbitrary iterables as the LHS as well, with values simply becoming None. Whether RHS or LHS, a type(fancy_dict) is always returned.

I really really want to reiterate the fact that values are IGNORED... if you just treat a dict like a set with values hanging off the key, the implementation stays perfectly consistent with sets, and does the same thing no matter what it's operating with. Just pretend like the values aren't there and what you get in the end makes a lot of sense.

-- C Anthony
On Tue, Feb 17, 2015 at 4:38 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
[snip]
Trying not to go OT for this thread (or list), run the following on python2 or python3 (http://pastie.org/9958218):

import itertools

class thing(str):
    n = itertools.count()

    def __new__(cls, *args, **kwds):
        self = super(thing, cls).__new__(cls, *args, **kwds)
        self.n = next(self.n)
        print(' LIFE! %s' % self.n)
        return self

    def __del__(self):
        print(' WHYY? %s' % self.n)

print('---------------( via constructor )')
# bug? literal does something different than constructor?
set_of_things = set([thing('hi'), thing('hi'), thing('hi')])

# reset so it's easy to see the difference
thing.n = itertools.count()

print('---------------( via literal )')
# use a different identifier else GC on overwrite
unused_set_of_things = {thing('hi'), thing('hi'), thing('hi')}

print('---------------( adding another )')
set_of_things.add(thing('hi'))

print('---------------( done )')

you will see something like this:

---------------( via constructor )
 LIFE! 0
 LIFE! 1
 LIFE! 2
 WHYY? 2
 WHYY? 1
---------------( via literal )
 LIFE! 0
 LIFE! 1
 LIFE! 2
 WHYY? 1
 WHYY? 0
---------------( adding another )
 LIFE! 3
 WHYY? 3
---------------( done )
 WHYY? 0
 WHYY? 2

as shown, adding to a set discards the *new* object, not the old (and as an aside, seems to be a little buggy during construction, else the final 2 would have the same id!)... should this happen? I'm not versed enough in the math behind it to know if it's expected or not, but as it stands, to remain compatible with sets, `d1 | d2` should behave like it does in my code (prefer the first, not the last). I kinda like this, because it makes dict.__or__ a *companion* to .update(), not a replacement (since update prefers the last).

-- C Anthony
C Anthony Risinger writes:
I'm not versed enough in the math behind it to know if it's expected or not, but as it stands, to remain compatible with sets, `d1 | d2` should behave like it does in my code (prefer the first, not the last). I kinda like this, because it makes dict.__or__ a *companion* to .update(), not a replacement (since update prefers the last).
But this is exactly the opposite of what the people who advocate use of an operator want. As far as I can see, all of them want update semantics, because that's the more common use case where the current idioms feel burdensome.
On Tue, Feb 17, 2015 at 10:08 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
C Anthony Risinger writes:
I'm not versed enough in the math behind it to know if it's expected or not, but as it stands, to remain compatible with sets, `d1 | d2` should behave like it does in my code (prefer the first, not the last). I kinda like this, because it makes dict.__or__ a *companion* to .update(), not a replacement (since update prefers the last).
But this is exactly the opposite of what the people who advocate use of an operator want. As far as I can see, all of them want update semantics, because that's the more common use case where the current idioms feel burdensome.
True... maybe that really is a good case for the + then, as something like .update().

Personally, I think making dict be more set-like is way more interesting/useful, because of the *filtering* capabilities:

# drop keys
d1 -= (keys_ignored, ...)

# apply [inverted] mask
d1 &= (keys_required, ...)
d1 ^= (keys_forbidden, ...)

__or__ would still work like dict.viewkeys.__or__, and behaves like a bulk .setdefault() which is another neat property:

# same as looping d2 calling d1.setdefault(...)
d1 |= d2

Using + for the .update(...) case seems nice too :)

-- C Anthony
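Spelled out with plain dicts, the bulk-setdefault reading of `d1 |= d2` would be:

d1 = {'a': 1, 'c': 3}
d2 = {'a': 100, 'b': 2}

# what `d1 |= d2` would mean under first-key-wins semantics:
for key, value in d2.items():
    d1.setdefault(key, value)

print(sorted(d1.items()))   # [('a', 1), ('b', 2), ('c', 3)] -- 'a' kept, 'b' added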
On 02/17/2015 11:50 PM, C Anthony Risinger wrote:
On Tue, Feb 17, 2015 at 10:08 PM, Stephen J. Turnbull <stephen@xemacs.org <mailto:stephen@xemacs.org>> wrote:
C Anthony Risinger writes:
> I'm not versed enough in the math behind it to know if it's expected or > not, but as it stands, to remain compatible with sets, `d1 | d2` should > behave like it does in my code (prefer the first, not the last). I kinda > like this, because it makes dict.__or__ a *companion* to .update(), not a > replacement (since update prefers the last).
But this is exactly the opposite of what the people who advocate use of an operator want. As far as I can see, all of them want update semantics, because that's the more common use case where the current idioms feel burdensome.
True... maybe that really is a good case for the + then, as something like .update().
Personally, I think making dict be more set-like is way more interesting/useful, because of the *filtering* capabilities:
Maybe it would work better as a multi-dict where you can have more than one value for a key. But I think it also is specialised enough that it may be better off on pypi. Cheers, Ron
On Tue, Feb 17, 2015 at 11:08 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
C Anthony Risinger writes:
I'm not versed enough in the math behind it to know if it's expected or not, but as it stands, to remain compatible with sets, `d1 | d2` should behave like it does in my code (prefer the first, not the last). I kinda like this, because it makes dict.__or__ a *companion* to .update(), not a replacement (since update prefers the last).
But this is exactly the opposite of what the people who advocate use of an operator want. As far as I can see, all of them want update semantics, because that's the more common use case where the current idioms feel burdensome.
Yes, +1. Also, if this goes through, it should be added to collections.abc.Mapping.
On Feb 17, 2015, at 19:30, C Anthony Risinger <anthony@xtfx.me> wrote:
On Tue, Feb 17, 2015 at 4:38 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
On Tue, Feb 17, 2015 at 4:22 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
All of the operations support arbitrary iterables as the RHS! This is NICE.
ASIDE: `d1 | d2` is NOT the same as .update() here (though maybe it should be)... I believe this stems from the fact that (IIRC :) a normal set() WILL NOT replace an entry it already has with a new one, if they hash() the same. IOW, a set will prefer the old object to the new one, and since the values are tied to the keys, the old value persists.
I'm pretty sure that's just an implementation artifact of CPython, not something the language requires. If an implementation wanted to build sets directly on dicts and handle s.add(x) as s._dict[x] = None, or build them on some other language's set type that has a replace method instead of an add-if-new method, that would be perfectly valid. And the whole point of sets is to deal with objects that are interchangeable if equal. If you write {1, 2} U {2, 3}, it's meaningless to ask which 2 ends up in the result; 2 is 2. The fact that Python lets you tack on other information to distinguish two 2's means that an implementation has to make a choice there, but it doesn't proscribe any choice.
[snip]
print('---------------( via constructor )')
# bug? literal does something different than constructor?
Why should they do the same thing? One is taking an iterable, and it has to process the elements in iteration order; the other is taking whatever the compiler found convenient to store, in whatever order it chose. If you want to see how CPython in particular compiles the literal, the dis module will show you: it pushes each element on the stack from left to right, then it does a BUILD_SET with the count. You can find the code for BUILD_SET in the main loop in ceval.c, but basically it just pop each element and adds it; since they're in reverse order on the stack, the rightmost one gets added first. (If you're wondering how BUILD_LIST avoids building lists backward, it counts down from the size, inserting each element at the count's index. So, it builds the list backward out of backward elements.)
On Tue, Feb 17, 2015 at 2:22 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
The premise taken here is that dicts ARE sets that simply happen to have values associated with them... hence all operations are against keys ONLY. The fact that values tag along is irrelevant.
but dicts DO have values, and that's their entire reason for existence -- if the values were irrelevant, you'd use a set.... And because values are important -- there is no such thing as "works the same as a set". For instance:

# union (first key winning here... IIRC this is how sets actually work)
>>> d1 | d2
{'a': 1, 'b': 2, 'c': 3, 'd': 5, 'e': 6}
The fact that sets may, under the hood, keep the first key is an implementation detail -- by definition the first and second duplicate keys are the same. So there is no guidance here whatsoever as to what to do when unioning dicts. Oh, except .update() provides a precedent that has proven to be useful. Keeping the second value sure feels more natural to me.

Oh, and I'd still prefer +. I don't think most users think of merging two dicts together as a boolean logical operation....

> All of the operations support arbitrary iterables as the RHS! This is NICE.
not so sure about that -- again, dicts have values, that's why we use them. Maybe defaultdict could work this way, though.

-Chris
On Thu, Feb 12, 2015 at 6:13 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I certainly wouldn't want to write
new_dict = a + b + c + d
and have O(N**2) performance when I can do this instead
It would likely not be O(N), but O(N**2) seems exaggerated. Besides, the most common cases would be:

new_dict = a + b
old_dict += a

Both of which can be had in O(N).
new_dict = {}
for old_dict in (a, b, c, d):
    new_dict.update(old_dict)
It would be easier if we could do:

new_dict = {}
new_dict.update(a, b, c, d)

It would also be useful if dict.update() returned self, so this would be valid:

new_dict = {}.update(a, b, c, d)

If that was so, then '+' could perhaps be implemented in terms of update() with the help of some smarts from the parser.

Cheers,

-- Juancarlo Añez
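As a sketch, a hypothetical subclass with a multi-mapping, self-returning update() (``fluentdict`` is a made-up name; the builtin dict.update() takes at most one positional mapping and returns None):

class fluentdict(dict):
    """Hypothetical dict whose update() takes several mappings and returns self."""

    def update(self, *mappings, **kw):
        for mapping in mappings + (kw,):
            super().update(mapping)   # later mappings win on duplicates
        return self                   # enables chaining

>>> fluentdict().update({'a': 1}, {'b': 2}, {'a': 3})
{'a': 3, 'b': 2}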
I don't see ChainMap mentioned in this thread, so I'll fix that: On Thu, Feb 12, 2015 at 2:32 PM, Juancarlo Añez <apalala@gmail.com> wrote:
most common cases would be:
new_dict = a + b
old_dict += a
In today's Python, that's:

from collections import ChainMap
new_dict = dict(ChainMap(b, a))
old_dict.update(a)
new_dict = {}
for old_dict in (a, b, c, d):
    new_dict.update(old_dict)
It would be easier if we could do:
new_dict = {}
new_dict.update(a, b, c, d)
It would also be useful if dict.update() returned self, so this would be valid:
new_dict = {}.update(a, b, c, d)
In today's Python:

new_dict = dict(ChainMap(d, b, c, a))

Many uses don't need the dict() call – e.g. when passing it **kwargs, or when it's more useful as a view.

Personally, the lack of a special operator for this has never bothered me.
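For readers unfamiliar with it: ChainMap (in collections since Python 3.3) looks each key up in the first mapping that has it, so listing the "winning" dict first reproduces update semantics:

>>> from collections import ChainMap
>>> a = {'x': 1, 'y': 2}
>>> b = {'x': 10, 'z': 3}
>>> dict(ChainMap(b, a))     # b listed first, so b's values win
{'x': 10, 'y': 2, 'z': 3}

(Key order in the result may vary by Python version.)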
On 12/02/2015 14:52, Petr Viktorin wrote:
I don't see ChainMap mentioned in this thread, so I'll fix that:
On Thu, Feb 12, 2015 at 2:32 PM, Juancarlo Añez <apalala@gmail.com> wrote:
[snip]
It would also be useful if dict.update() returned self, so this would be valid:
new_dict = {}.update(a, b, c, d)
In today's Python:

new_dict = dict(ChainMap(d, b, c, a))
Many uses don't need the dict() call – e.g. when passing it **kwargs, or when it's more useful as a view.
Personally, the lack of a special operator for this has never bothered me.
But perhaps the fact that you have, as far as I can see, transposed b and c indicates that this is not the most user-friendly API. :-) Rob Cliffe
Can't "|" return a view? On Thursday, February 12, 2015 at 9:53:12 AM UTC-5, Petr Viktorin wrote:
On Thu, Feb 12, 2015 at 09:02:04AM -0430, Juancarlo Añez wrote:
On Thu, Feb 12, 2015 at 6:13 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I certainly wouldn't want to write
new_dict = a + b + c + d
and have O(N**2) performance when I can do this instead
It would likely not be O(N), but O(N**2) seems exaggerated.
If you want to be pedantic, it's not exactly quadratic behaviour. But it would be about the same as list and tuple repeated addition, which is also not exactly quadratic behaviour, but we often describe it as such. Repeated list addition is *much* slower than O(N):

py> from timeit import Timer
py> t = Timer("sum(([1] for i in range(100)), [])")  # add 100 lists
py> min(t.repeat(number=100))
0.010571401566267014
py> t = Timer("sum(([1] for i in range(1000)), [])")  # 10 times more
py> min(t.repeat(number=100))
0.4032209049910307

So increasing the number of lists being added by a factor of 10 leads to a factor of 40 increase in time taken. Increase it by a factor of 10 again:

py> t = Timer("sum(([1] for i in range(10000)), [])")  # 10 times more
py> min(t.repeat(number=100))  # note the smaller number of trials
34.258460350334644

and we now have the time going up by a factor not of 10, not of 40, but about 85. So the slowdown is getting worse. It's technically not as bad as quadratic behaviour, but it's bad enough to justify the term.

Will repeated dict addition be like this? Since dicts are mutable, we can't use the string concatenation trick to optimize each operation, so I expect so: + will have to copy each of its arguments into a new dict, then the next + will copy its arguments into a new dict, and so on. Exactly what happens with list.

-- Steve
What if we create a view (ChainMap) to be the result of the | operator? Or would that not work?
On Thu, Feb 12, 2015, at 05:43, Steven D'Aprano wrote:
I think this is a feature that is more useful in theory than in practice. While we already have a way to do a merge in place, a copy-and-merge seems like it should be useful, but I'm struggling to think of any use-cases for it. I've never needed this, and I've never seen anyone ask how to do this on the tutor or python-list mailing lists.
It's also an attractive nuisance - most cases I can think of for it would be better served by a view.
On Thu, Feb 12, 2015 at 5:43 AM, Steven D'Aprano <steve@pearwood.info> wrote:
A very strong -1 on the proposal. We already have a perfectly good way to spell dict += , namely dict.update. As for dict + on its own, we have a way to spell that too: exactly as you write above.
Another strong -1 from me. In my view, the only operation on dictionaries that would deserve to be denoted + would be counter or sparse array addition, but we already have collections.Counter and writing a sparse array (as in sa(a=1,c=-2) + sa(a=1,b=1,c=1) == sa(a=2,b=1,c=-1)) is a simple exercise.
On 02/12/2015 03:43 AM, Steven D'Aprano wrote:
I think this is a feature that is more useful in theory than in practice. While we already have a way to do a merge in place, a copy-and-merge seems like it should be useful, but I'm struggling to think of any use-cases for it. I've never needed this, and I've never seen anyone ask how to do this on the tutor or python-list mailing lists.
I think the level of interest in http://stackoverflow.com/questions/38987/how-can-i-merge-two-python-dictiona... (almost 1000 upvotes on the question alone) does indicate that the desire for an expression form of merging dictionaries is not purely theoretical. Carl
Sure, it's something that people want to do. But then the question becomes "How do you want to merge them?" and then the answers to that are all over the board, to the point that there's no one obvious way to do it.
On Feb 12, 2015, at 1:00 PM, Mark Young <marky1991@gmail.com> wrote:
Sure, it's something that people want to do. But then the question becomes "How do you want to merge them?" and then the answers to that are all over the board, to the point that there's no one obvious way to do it.
Honestly I don’t really think it is all that over the board. Basically in every method of merging that plain dictionaries currently offer it’s a “last use of the key wins”. The only place this isn’t the case is specialized dictionary like classes like Counter. I think using anything other than “last use of the key wins” would be inconsistent with the rest of Python and I think it’s honestly the only thing that can work generically too. See:
>>> a = {1: True, 2: True}
>>> b = {2: False, 3: False}
>>> dict(list(a.items()) + list(b.items()))
{1: True, 2: False, 3: False}
And
>>> a = {1: True, 2: True}
>>> b = {2: False, 3: False}
>>> c = a.copy()
>>> c.update(b)
>>> c
{1: True, 2: False, 3: False}
And
>>> {1: True, 2: True, 2: False, 3: False}
{1: True, 2: False, 3: False}
And
a = {"a": True, "b": True} b = {"b": False, "c": False} dict(a, **b) {'b': False, 'c': False, 'a': True}
-- Donald Stufft
On Thu, Feb 12, 2015 at 09:57:46AM -0700, Carl Meyer wrote:
On 02/12/2015 03:43 AM, Steven D'Aprano wrote:
I think this is a feature that is more useful in theory than in practice. While we already have a way to do a merge in place, a copy-and-merge seems like it should be useful, but I'm struggling to think of any use-cases for it. I've never needed this, and I've never seen anyone ask how to do this on the tutor or python-list mailing lists.
I think the level of interest in http://stackoverflow.com/questions/38987/how-can-i-merge-two-python-dictiona... (almost 1000 upvotes on the question alone) does indicate that the desire for an expression form of merging dictionaries is not purely theoretical.
I'm going to quote Raymond Hettinger, from a recent discussion here:

[quote]
I wouldn't read too much in the point score on the StackOverflow question. First, the question is very old [...] Second, StackOverflow tends to award high scores to the simplest questions and answers [...] A high score indicates interest but not a need to change the language. A much better indicator would be the frequent appearance of ordered sets in real-world code or high download statistics from PyPI or ActiveState's ASPN cookbook.
[end quote]

In this case, we're not talking about ordered sets. We're talking about something which may be as little as two lines of code, and can even be squeezed into a single-line function:

def merge(a, b): d = a.copy(); d.update(b); return d

(but don't do that).

StackOverflow's model is designed to encourage people to vote on questions as much as possible, and naturally people tend to vote more for simple questions that they understand rather than hard questions that require a lot of in-depth knowledge or analysis. I agree with Raymond that high votes are not in and of themselves evidence of frequent need.

-- Steve
On 12 February 2015 at 18:43, Steven D'Aprano <steve@pearwood.info> wrote:
I'm going to quote Raymond Hettinger, from a recent discussion here:
[quote] I wouldn't read too much in the point score on the StackOverflow question. First, the question is very old [...] Second, StackOverflow tends to award high scores to the simplest questions and answers [...] A high score indicates interest but not a need to change the language. A much better indicator would be the frequent appearance of ordered sets in real-world code or high download statistics from PyPI or ActiveState's ASPN cookbook. [end quote]
I'm surprised that this hasn't been mentioned yet in this thread, but why not create a function, put it on PyPI and collect usage stats from your users? If they are high enough, a case is made. If no-one downloads it because writing your own is easier than adding a dependency, that probably applies to adding it to the core as well (the "new dependency" then would be not supporting Python <3.5 (or whatever)).

Converting a 3rd party function to a method on the dict class isn't particularly hard, so I'm not inclined to buy the idea that people won't use it *purely* because it's a function rather than a method, which is the only thing a 3rd party package can't do.

I don't think it's a good use for an operator (neither + nor | seem particularly obvious fits to me, the set union analogy notwithstanding).

Paul
On 12 February 2015 at 11:14, Paul Moore <p.f.moore@gmail.com> wrote:
(the "new dependency" then would be not supporting Python <3.5 (or whatever))
I've seen this argument a few times on python-ideas, and I don't understand it. By this rationale, there's little point ever adding any new feature to Python, because people who need to support older versions can't use it. By posting on python-ideas, you're implicitly prepared to take the long term view - these are ideas that we might be able to really use in a few years. One day, relying on Python 3.5 or above will be completely reasonable, and then we can take advantage of the things we're adding to it now. It's like an investment in code.
why not create a function, put it on PyPI and collect usage stats
Does anyone seriously believe that people will add a dependency which contains a single three line function? Thomas
On 12 February 2015 at 19:22, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On 12 February 2015 at 11:14, Paul Moore <p.f.moore@gmail.com> wrote:
(the "new dependency" then would be not supporting Python <3.5 (or whatever))
I've seen this argument a few times on python-ideas, and I don't understand it. By this rationale, there's little point ever adding any new feature to Python, because people who need to support older versions can't use it.
To me, it means that anything that gets added to Python should be significant enough to be worth waiting for.
By posting on python-ideas, you're implicitly prepared to take the long term view - these are ideas that we might be able to really use in a few years. One day, relying on Python 3.5 or above will be completely reasonable, and then we can take advantage of the things we're adding to it now. It's like an investment in code.
That's a fair point. You do make me think - when looking at new features, part of the long term view is to look at making sure a proposal is clean and well thought through, so that it's something people will be looking forward to, not merely something "good enough" to provide a feature. Evaluating the various proposals we've seen on that basis (and I concede this is purely my personal feeling):

1. + and += on dictionaries. Looking far enough into the future, I can see a time when people will see these as obvious and natural analogues of list and string concatenation. There's a shorter-term problem with the transition, particularly the fact that a lot of people currently don't find the "last addition wins" behavior obvious, and the performance characteristics of repeated addition (with copying). I could see dict1 + dict2 being seen as a badly-performing anti-pattern, much like string +, perfectly OK for simple cases but don't use it if performance matters. And performance is more likely to matter with dicts than strings.

2. | and |=. I can't imagine ever liking these. I don't like them for sets, either. Personal opinion certainly.

3. A dict method. Not sure. It depends strongly on the name chosen. And although it's not a strict rule (sets in particular break it) I tend to think of methods as being mutating, not creating new copies.

4. An updated() builtin. Feels clean and has a nice parallel with sorted(). But it's a common word that is often used as a variable, and I don't think (long term view again!) that a pattern of proliferating builtins as non-mutating versions of methods is a good one to encourage.
why not create a function, put it on PyPI and collect usage stats
Does anyone seriously believe that people will add a dependency which contains a single three line function?
I'm sorry, that was a snarky suggestion. (Actually my whole post was pretty snarky. Hopefully, this post is fairer.)

I do think this is a reasonable situation to invoke the principle that not every 3-line code snippet deserves to be a builtin. Steven posted a pretty simple updated() function that projects can use, and I don't really understand why people can sometimes be so reluctant to include such utilities in their code.

Anyway, overall I still remain unconvinced that this is worth adding, but hopefully the above explains my thinking better. Sorry again for the tone of my previous response.

Paul
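For concreteness, a merge helper along the lines being discussed might look like this (a sketch of the idea; not necessarily the exact function Steven posted):

def updated(*mappings, **kwargs):
    # Build a new dict; later mappings win on duplicate keys,
    # mirroring dict.update semantics.
    result = {}
    for mapping in mappings:
        result.update(mapping)
    result.update(kwargs)
    return result

updated({'x': 1, 'y': 2}, {'x': 3})   # -> {'x': 3, 'y': 2}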
On Thu, Feb 12, 2015 at 12:15 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 12 February 2015 at 19:22, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
On 12 February 2015 at 11:14, Paul Moore <p.f.moore@gmail.com> wrote:
(the "new dependency" then would be not supporting Python <3.5 (or whatever))
I've seen this argument a few times on python-ideas, and I don't understand it. By this rationale, there's little point ever adding any new feature to Python, because people who need to support older versions can't use it.
To me, it means that anything that gets added to Python should be significant enough to be worth waiting for.
By posting on python-ideas, you're implicitly prepared to take the long term view - these are ideas that we might be able to really use in a few years. One day, relying on Python 3.5 or above will be completely reasonable, and then we can take advantage of the things we're adding to it now. It's like an investment in code.
That's a fair point. You do make me think - when looking at new features, part of the long term view is to look at making sure a proposal is clean and well thought through, so that it's something people will be looking forward to, not merely something "good enough" to provide a feature.
Evaluating the various proposals we've seen on that basis (and I concede this is purely my personal feeling):
1. + and += on dictionaries. Looking far enough into the future, I can see a time when people will see these as obvious and natural analogues of list and string concatenation. There's a shorter-term problem with the transition, particularly the fact that a lot of people currently don't find the "last addition wins" behavior obvious, and the performance characteristics of repeated addition (with copying). I could see dict1 + dict2 being seen as a badly-performing anti-pattern, much like string +, perfectly OK for simple cases but don't use it if performance matters. And performance is more likely to matter with dicts than strings.
2. | and |=. I can't imagine ever liking these. I don't like them for sets, either. Personal opinion certainly.
3. A dict method. Not sure. It depends strongly on the name chosen. And although it's not a strict rule (sets in particular break it) I tend to think of methods as being mutating, not creating new copies.
4. An updated() builtin. Feels clean and has a nice parallel with sorted(). But it's a common word that is often used as a variable, and I don't think (long term view again!) that a pattern of proliferating builtins as non-mutating versions of methods is a good one to encourage.
A reminder about PEP 448: Additional Unpacking Generalizations (https://www.python.org/dev/peps/pep-0448/), which claims that "it vastly simplifies types of 'addition' such as combining dictionaries, and does so in an unambiguous and well-defined way". The spelling for combining two dicts would be:

{**d1, **d2}

Cheers, Nathan
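For example (this requires PEP 448 to be accepted, i.e. Python 3.5+; semantics as described in the PEP):

d1 = {'x': 1, 'y': 2}
d2 = {'x': 3}
merged = {**d1, **d2}   # rightmost dict wins on duplicate keys
print(merged)           # {'x': 3, 'y': 2}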
On Thursday, February 12, 2015 1:33 PM, Nathan Schneider <neatnate@gmail.com> wrote:
A reminder about PEP 448: Additional Unpacking Generalizations (https://www.python.org/dev/peps/pep-0448/), which claims that "it vastly simplifies types of 'addition' such as combining dictionaries, and does so in an unambiguous and well-defined way". The spelling for combining two dicts would be: {**d1, **d2}
I like that this makes all of the bikeshedding questions obvious: it's a dict display, so the result is clearly a dict rather than a type(d1), it's clearly going to follow the same ordering rules for duplicate keys as a dict display (d2 beats d1), and so on. And it's nice that it's just a special case of a more general improvement.

However, it is more verbose (and more full of symbols), and it doesn't give you an obvious way to, say, merge two OrderedDicts into an OrderedDict. Also, code that adds non-literal dicts together can be backported just by adding a trivial dict subclass, but code that uses PEP 448 can't be.
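The "trivial dict subclass" backport might look something like this (a sketch; the AddableDict name is invented here):

class AddableDict(dict):
    def __add__(self, other):
        # Copy self, then apply other, so the right-hand operand
        # wins on duplicate keys.
        result = AddableDict(self)
        result.update(other)
        return result

d = AddableDict({'x': 1, 'y': 2}) + {'x': 3}
print(d)    # {'x': 3, 'y': 2}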
On Fri, Feb 13, 2015 at 12:06:44AM +0000, Andrew Barnert wrote:
On Thursday, February 12, 2015 1:33 PM, Nathan Schneider <neatnate@gmail.com> wrote:
A reminder about PEP 448: Additional Unpacking Generalizations (https://www.python.org/dev/peps/pep-0448/), which claims that "it vastly simplifies types of 'addition' such as combining dictionaries, and does so in an unambiguous and well-defined way". The spelling for combining two dicts would be: {**d1, **d2}
Very nice! That should be extensible to multiple arguments without the pathologically slow performance of repeated addition:

{**d1, **d2, **d3, **d4}

and you can select which dict you want to win in the event of clashes by changing the order.
I like that this makes all of the bikeshedding questions obvious: it's a dict display, so the result is clearly a dict rather than a type(d1), it's clearly going to follow the same ordering rules for duplicate keys as a dict display (d2 beats d1), and so on.
And it's nice that it's just a special case of a more general improvement.
However, it is more verbose (and more full of symbols), and it doesn't give you an obvious way to, say, merge two OrderedDicts into an OrderedDict.
Here's a more general proposal. The dict constructor currently takes either a single mapping or a single iterable of (key,value) pairs. The update method takes a single mapping, or a single iterable of (k,v) pairs, AND any arbitrary keyword arguments. We should generalise both of these to take *one or more* mappings and/or iterables, and solve the most common forms of copy-and-update. That avoids the (likely) pathologically slow behaviour of repeated addition, avoids any confusion over operators and having += duplicating the update method.

Then, merging in place uses:

d1.update(d2, d3, d4, d5)

and copy-and-merge uses:

dict(d1, d2, d3, d4, d5)  # like d1+d2+d3+d4+d5

where the d's can be any mapping, and you can optionally include keyword arguments as well. You don't have to use dict, you can use any Mapping which uses the same constructor semantics. That means subclasses of dict ought to work (unless the subclass does something silly) and even OrderedDict ought to work (modulo the fact that regular dicts and keyword args are unordered).

Here's a proof of concept:

class Dict(dict):
    def __init__(self, *mappings, **kwargs):
        assert self == {}
        Dict.update(self, *mappings, **kwargs)

    def update(self, *mappings, **kwargs):
        for mapping in (mappings + (kwargs,)):
            if hasattr(mapping, 'keys'):
                for k in mapping:
                    self[k] = mapping[k]
            else:
                for (k, v) in mapping:
                    self[k] = v

-- Steve
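Used with the proof of concept above, construction and multi-argument update would look like this (a quick illustration, not part of the original post):

d = Dict({'x': 1}, [('y', 2)], z=3)   # a mapping, an iterable of pairs, kwargs
print(d)                              # {'x': 1, 'y': 2, 'z': 3}
d.update({'x': 10}, [('w', 0)])       # later arguments win on clashes
print(d)                              # {'x': 10, 'y': 2, 'z': 3, 'w': 0}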
On Feb 12, 2015, at 18:24, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Feb 13, 2015 at 12:06:44AM +0000, Andrew Barnert wrote:
On Thursday, February 12, 2015 1:33 PM, Nathan Schneider <neatnate@gmail.com> wrote:
A reminder about PEP 448: Additional Unpacking Generalizations (https://www.python.org/dev/peps/pep-0448/), which claims that "it vastly simplifies types of 'addition' such as combining dictionaries, and does so in an unambiguous and well-defined way". The spelling for combining two dicts would be: {**d1, **d2}
Very nice! That should be extensible to multiple arguments without the pathologically slow performance of repeated addition:
{**d1, **d2, **d3, **d4}
and you can select which dict you want to win in the event of clashes by changing the order.
I like that this makes all of the bikeshedding questions obvious: it's a dict display, so the result is clearly a dict rather than a type(d1), it's clearly going to follow the same ordering rules for duplicate keys as a dict display (d2 beats d1), and so on.
And it's nice that it's just a special case of a more general improvement.
However, it is more verbose (and more full of symbols), and it doesn't give you an obvious way to, say, merge two OrderedDicts into an OrderedDict.
Here's a more general proposal. The dict constructor currently takes either a single mapping or a single iterable of (key,value) pairs. The update method takes a single mapping, or a single iterable of (k,v) pairs, AND any arbitrary keyword arguments.
The constructor also takes any arbitrary keyword items. (In fact, if you look at the source, dict_init just calls the same dict_update_common function that update calls. So it's as if you defined __init__(self, *args, **kwargs) and update(self, *args, **kwargs) to both just call self.__update_common(*args, **kwargs).)
We should generalise both of these to take *one or more* mappings and/or iterables, and solve the most common forms of copy-and-update.
I like this idea, but it's not perfect--see below.
That avoids the (likely) pathologically slow behaviour of repeated addition, avoids any confusion over operators and having += duplicating the update method.
Then, merging in place uses:
d1.update(d2, d3, d4, d5)
and copy-and-merge uses:
dict(d1, d2, d3, d4, d5) # like d1+d2+d3+d4+d5
where the d's can be any mapping, and you can optionally include keyword arguments as well.
You don't have to use dict, you can use any Mapping which uses the same constructor semantics.
Except that every existing Mapping that anyone has developed so far doesn't use the same constructor semantics, and very few of them will automatically change to do so just because dict does. Basically, unless it subclasses or delegates to dict and either doesn't override __init__, or does so with *args, **kwargs and passes those along untouched, it won't work. And that's not very common.

For example, despite what you say below, OrderedDict (as currently written) will not work; it never passes the __init__ arguments to dict. Or UserDict; it passes them to MutableMapping instead, and only after checking that there's only 0 or 1 positional arguments. Or any of three third-party sorted dicts I looked at. I think defaultdict actually will work (its __init__ slot pulls the first argument, then passes the rest to dict blindly, and its update is inherited from dict), as will the Python-equivalent recipe (which I'd guess is available on PyPI as a backport for 2.4, but I haven't looked).

So, this definitely is not as general a solution as adding a new method to dict and Mapping would be; it will only help dict and a handful of other classes.

Still, I like this idea. It defines a nice idiom for dict-like constructors to follow--and classes written to that idiom will actually work out of the box in 3.4, and even 2.7. That's pretty cool. And for people who want to use this with plain old dicts in 2.7 and 3.4, all they need to do is use your trivial dict subclass. (Of course that still won't let them call update with multiple args on a dict created as a literal or passed in from third-party code, but it will let them do the copy-and-update the same way as in 3.5, and I think that's more important than the update-with-multiple-args.)

It might be worth changing MutableMapping.update to handle the new signature, but that raises the question of whether that's a backward-incompatible change. And, because not all MutableMapping classes delegate __init__ to update the way OrderedDict does, it still doesn't solve every type (or even most types).
That means subclasses of dict ought to work (unless the subclass does something silly) and even OrderedDict ought to work (modulo the fact that regular dicts and keyword args are unordered).
I suppose it depends what you mean by "something silly", but I think it's pretty common for subclasses to define their own __init__, and sometimes update, and not pass the args through blindly. More importantly, I don't think dict subclasses are nearly as common as mapping-protocol classes that delegate to or don't even touch a dict (whether subclasses of Mapping or not), so I don't think it would matter all that much if dict subclasses worked. And again, OrderedDict will not work.
Here's a proof of concept:
class Dict(dict):
    def __init__(self, *mappings, **kwargs):
        assert self == {}
        Dict.update(self, *mappings, **kwargs)
If you want to ensure that overriding update doesn't affect __init__ (I'm not _sure_ that you do--yes, UserDict does so, but UserDict is trying to make sure it has exactly the same behavior as dict when subclassed, not trying to improve the behavior of dict), why not have both __init__ and update delegate to a private __update method (as UserDict does)? I'm not sure it makes much difference, but it seems like a more explicit way of saying "this will not use overridden behavior".
def update(self, *mappings, **kwargs):
    for mapping in (mappings + (kwargs,)):
        if hasattr(mapping, 'keys'):
            for k in mapping:
                self[k] = mapping[k]
        else:
            for (k, v) in mapping:
                self[k] = v
The current dict constructor/update function iterates over mapping.keys(), not directly over mapping; that means it works with "hybrid" types that iterate like sequences (or something else) but also have keys/values/items. And of course it has special fast-path code for dealing with real dict objects. (Also see MutableMapping.update; it likewise iterates over mapping.keys() if hasattr(mapping, 'keys'), and it has a fast path that iterates over the mapping directly, but only for real Mapping classes.)

I think this would be simpler, more correct, and also probably much more efficient, for a pure-Python wrapper class:

def update(self, *mappings, **kwargs):
    for mapping in (mappings + (kwargs,)):
        dict.update(self, mapping)

And for the real C definition, you're basically just moving the guts of dict_update_common (which are trivial) into a loop over the args instead of an if over a single arg.
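A small illustration of the "hybrid" point (the Hybrid class is invented for this example): dict.update prefers keys() when it exists, so a type that iterates like a sequence but also offers keys() and __getitem__ is still treated as a mapping:

class Hybrid:
    # Iterates like a sequence of values, but also exposes the
    # mapping protocol via keys() and __getitem__.
    def __init__(self, data):
        self._data = dict(data)
    def __iter__(self):
        return iter(self._data.values())
    def keys(self):
        return self._data.keys()
    def __getitem__(self, key):
        return self._data[key]

d = {}
d.update(Hybrid({'a': 1, 'b': 2}))   # uses keys()/__getitem__, not plain iteration
print(d)                             # {'a': 1, 'b': 2}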
avoids any confusion over operators and having += duplicating the update method.
+= duplicates the extend method on lists. And it's really redundant for numbers, too:

x += y
x = x + y

So plenty of precedent.

And my experience with newbies (I've been teaching intro to Python for a few years) is that they grab onto + for concatenating strings really quickly. And I have a hard time getting them to use other methods for building up strings. Dicts in general come later, but are key to Python. But I don't know whether newbies expect + to work for dicts or not -- updating a dict is simply a lot less common.

-Chris
On Thu, Feb 12, 2015 at 07:43:36PM -0800, Chris Barker - NOAA Federal wrote:
avoids any confusion over operators and having += duplicating the update method.
+= duplicates the extend method on lists.
Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist. It also leads to a quite ugly and unfortunate language wart with tuples:

py> t = ([], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1], None)

Try explaining to novices why this is not a bug.
And it's really redundant for numbers, too:
x += y
x = x + y
Blame C for that. C defines so many ways to do more or less the same thing (x++, ++x, x+=1, x=x+1) that a generation or three of programmers have come to consider it normal.
So plenty of precedent.
Historical accident and backwards compatibility require that certain, hmmm, "dirty" is too strong a word, let's say slightly tarnished, design choices have to persist forever, or near enough to forever, but that's not a good reason for propagating the same choices into new functionality. It is *unfortunate* that += works with lists and tuples because + works, not a feature to emulate. Python made the best of a bad deal with augmented assignments: a syntax which works fine in C doesn't *quite* work cleanly in Python, but demand for it lead to it being supported. The consequence is that every generation of Python programmers now need to learn for themselves that += on non-numeric types has surprising corner cases. Usually the hard way.
And my experience with newbies (been teaching intro to python for a few years) is that they grab onto + for concatenating strings really quickly. And I have a hard time getting them to use other methods for building up strings.
Right. And that is a *bad thing*. They shouldn't be using + for concatenation except for the simplest cases. -- Steve
On Feb 12, 2015, at 10:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
+= duplicates the extend method on lists.
Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist.
Actually, that's the primary motivator for += and friends -- to support in-place operations on mutables. Notably numpy arrays.
It also leads to a quite ugly and unfortunate language wart with tuples:
py> t = ([], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1], None)
Try explaining to novices why this is not a bug.
I'm going to have to think about that a fair bit myself -- so yes, really confusing. But the "problem" here is that augmented assignment shouldn't work on immutables at all. But then we wouldn't have the too-appealing-to-resist syntactic sugar for integer incrementing.

But are you saying that augmented assignment was simply a mistake altogether, and therefore no new use of it should be added at all (and you'd deprecate it if you could)? In which case, there's nothing to discuss about this case.

-Chris
On Feb 13, 2015, at 8:02, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
On Feb 12, 2015, at 10:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
+= duplicates the extend method on lists.
Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist.
Actually, that's the primary motivator for += and friends -- to support in-place operations on mutables. Notably numpy arrays.
It also leads to a quite ugly and unfortunate language wart with tuples:
py> t = ([], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1], None)
Try explaining to novices why this is not a bug.
I'm going to have to think about that a fair bit myself -- so yes, really confusing.
But the "problem" here is that augmented assignment shouldn't work on immutables at all.
The problem is that the way augmented assignment works is to first treat the target as an expression, then call __iadd__ on the result, then assign the result of that back to the target. So your "t[0] += [1]" turns into, in effect, "setitem(t, 0, getitem(t, 0).__iadd__([1]))". Once you see it that way, it's obvious why it works the way it does. And the official FAQ explains this, but it still seems to surprise plenty of people who aren't even close to novices. (But that's probably more relevant to the other thread on namespace unpacking than to this thread.)
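Spelled out step by step, the same expansion looks like this (an illustrative sketch of what the interpreter effectively does, written in plain Python):

t = ([], None)
try:
    # t[0] += [1] expands to roughly:
    item = t[0]                # getitem: fetch the list -- succeeds
    item = item.__iadd__([1])  # list.__iadd__ mutates in place and returns self
    t[0] = item                # setitem: tuples refuse this -> TypeError
except TypeError as e:
    print(e)                   # 'tuple' object does not support item assignment
print(t)                       # ([1], None) -- the mutation already happened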
But the "problem" here is that augmented assignment shouldn't work on immutables at all.
The problem is that the way augmented assignment works is to first treat the target as an expression, then call __iadd__ on the result, then assign the result of that back to the target.
Right - I figured that out, and found the FAQ. But why does it work that way? Because it needs to work on immutable objects. If it didn't, then you wouldn't need the "assign back to the original name" step.

Let's look at __iadd__: if it's in-place for a mutable object, it needs to return self. But the Python standard practice is that methods that mutate objects shouldn't return self (like list.sort(), for instance). But if you want the augmented assignment operators to support immutables, there needs to be a way to get a new object back -- thus the re-binding, and the confusing, limiting behavior. But it's what we've got.

Is there a lesson here for the additional unpacking being proposed? I have no idea.

-Chris - CHB
Chris Barker - NOAA Federal wrote:
But why does it work that way? Because it needs to work on immutable objects. If it didn't, then you wouldn't need the "assign back to the original name" step.
This doesn't make it wrong for in-place operators to work on immutable objects. There are two distinct use cases:

1) You want to update a mutable object in-place.

2) The LHS is a complex expression that you only want to write out and evaluate once.

Case (2) applies equally well to mutable and immutable objects.

There are ways that the tuple problem could be fixed, such as skipping the assignment if __iadd__ returns the same object. But that would be a backwards-incompatible change, since there could be code that relies on the assignment always happening.
If it's in-place for a mutable object, it needs to return self. But the python standard practice is that methods that mutate objects shouldn't return self ( like list.sort() ) for instance.
The reason for that is to catch the mistake of using a mutating method when you meant to use a non-mutating one. That doesn't apply to __iadd__, because you don't usually call it yourself. -- Greg
On Feb 13, 2015, at 17:19, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Chris Barker - NOAA Federal wrote:
But why does it work that way? Because it needs to work on immutable objects. If it didn't, then you wouldn't need the "assign back to the original name" step.
This doesn't make it wrong for in-place operators to work on immutable objects. There are two distinct use cases:
1) You want to update a mutable object in-place.
2) The LHS is a complex expression that you only want to write out and evaluate once.
Case (2) applies equally well to mutable and immutable objects.
There are ways that the tuple problem could be fixed, such as skipping the assignment if __iadd__ returns the same object. But that would be a backwards-incompatible change, since there could be code that relies on the assignment always happening.
I think it's reasonable for a target to be able to assume that it will get a setattr or setitem when one of its subobjects is assigned to. You might need to throw out cached computed properties, or move the element in sorted order, or call a trace-on-change callback, etc. This change would mean that augmented assignment isn't really assignment at all. (And if you don't _want_ assignment, you can always call t[0].extend([1]) instead of t[0] += [1].)
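As a concrete (invented) example of the kind of class being described here, one that relies on __setitem__ to keep a cache honest:

class SummingCache:
    # Caches the combined length of its list elements; the cache is
    # recomputed on every __setitem__.
    def __init__(self, lists):
        self._lists = list(lists)
        self._refresh()
    def _refresh(self):
        self._total = sum(len(l) for l in self._lists)
    def __getitem__(self, i):
        return self._lists[i]
    def __setitem__(self, i, value):
        self._lists[i] = value
        self._refresh()
    @property
    def total(self):
        return self._total

sc = SummingCache([[1], [2, 3]])
sc[0] += [4, 5]   # list.__iadd__ mutates in place, but __setitem__ still fires
print(sc.total)   # 5 -- skipping the store would have left a stale 3

If augmented assignment skipped the store whenever __iadd__ returned self, the final total here would still read 3.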
If it's in-place for a mutable object, it needs to return self. But the python standard practice is that methods that mutate objects shouldn't return self ( like list.sort() ) for instance.
The reason for that is to catch the mistake of using a mutating method when you meant to use a non-mutating one. That doesn't apply to __iadd__, because you don't usually call it yourself.
I think there's another big distinction: a method call is an expression, not a statement, so if it returned self, you could chain multiple mutations into a single statement. Because it returns None, you get only one mutation per statement. And it's the leftmost thing in the statement that's being mutated, except in uncommon or pathological code. Unlike many other languages, assignment (including augmented assignment) is a statement in Python, so that isn't an issue. So you still only get one mutation per statement, and to the leftmost thing, when using augmented assignment. (And, as you say, you usually don't call __iadd__ itself; if you really wanted to call it explicitly and chain calls together, you could, but that falls under pathological code.)
Andrew Barnert wrote:
I think it's reasonable for a target to be able to assume that it will get a setattr or setitem when one of its subobjects is assigned to. You might need to throw out cached computed properties, ...
That's what I was thinking. But I'm not sure it would be a good design, since it would *only* catch mutations made through in-place operators. You would need some other mechanism for detecting mutations of sub-objects made in other ways, and whatever mechanism you used for that would probably catch in-place operations as well. -- Greg
On Feb 13, 2015, at 18:46, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Andrew Barnert wrote:
I think it's reasonable for a target to be able to assume that it will get a setattr or setitem when one of its subobjects is assigned to. You might need to throw out cached computed properties, ...
That's what I was thinking. But I'm not sure it would be a good design,
Now I'm confused. The current design of Python guarantees that an object always gets a setattr or setitem when one of its elements is assigned to. That's an important property, for the reasons I suggested above. So any change would have to preserve that property. And skipping assignment when __iadd__ returns self would not preserve that property. So it's not just backward-incompatible, it's bad.

So, I don't know what the "it" you're suggesting as an alternative is. Not calling setitem/setattr, but instead calling some other method? If that method is just a post-change item_was_modified call, I don't think it's sufficient, while if it's a pre-change item_will_be_modified(name, newval) that lets you change or reject the mutation, it's equivalent to the existing setitem.
since it would *only* catch mutations made through in-place operators. You would need some other mechanism for detecting mutations of sub-objects made in other ways, and whatever mechanism you used for that would probably catch in-place operations as well.
-- Greg
On 02/13/2015 06:57 PM, Andrew Barnert wrote:
On Feb 13, 2015, at 18:46, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Andrew Barnert wrote:
I think it's reasonable for a target to be able to assume that it will get a setattr or setitem when one of its subobjects is assigned to. You might need to throw out cached computed properties, ...
That's what I was thinking. But I'm not sure it would be a good design,
Now I'm confused.
The current design of Python guarantees that an object always gets a setattr or setitem when one of its elements is assigned to. That's an important property, for the reasons I suggested above. So any change would have to preserve that property. And skipping assignment when __iadd__ returns self would not preserve that property. So it's not just backward-incompatible, it's bad.
--> some_var = ([1], 'abc')
--> tmp = some_var[0]
--> tmp += [2, 3]
--> some_var
([1, 2, 3], 'abc')

In that example, 'some_var' is modified without its __setitem__ ever being called.

-- ~Ethan~
On Sat, Feb 14, 2015 at 11:12 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
The current design of Python guarantees that an object always gets a setattr or setitem when one of its elements is assigned to. That's an important property, for the reasons I suggested above. So any change would have to preserve that property. And skipping assignment when __iadd__ returns self would not preserve that property. So it's not just backward-incompatible, it's bad.
--> some_var = ([1], 'abc')
--> tmp = some_var[0]
--> tmp += [2, 3]
--> some_var
([1, 2, 3], 'abc')
In that example, 'some_var' is modified without its __setitem__ ever being called.
not really -- an object inside some_var is modified -- there could be any number of other references to that object -- so this is very much how Python works.

The fact that you can't directly use augmented assignment on an object contained in an immutable is not a bug, but it certainly is a wart -- particularly since it will raise an Exception AFTER it has, in fact, performed the operation requested.

I have argued that this never would have come up if augmented assignment were only used for in-place operations, but then we couldn't use it on integers, which was apparently desperately wanted ;-) (and "augmented assignment" wouldn't be a good name, either...).

I don't know enough about how this all works under the hood to know if it could be made to work, but it seems the intention is clear here:

object[index] += something

is a shorthand for:

tmp = object[index]
tmp += something

or in a specific case:

In [66]: t = ([], None)
In [67]: t[0].extend([3,4])
In [68]: t
Out[68]: ([3, 4], None)

Do others agree that this, in fact, has an unambiguous intent? And that it would be nice if it worked? Or am I missing something?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker@noaa.gov
On 02/14/2015 11:41 AM, Chris Barker wrote:
On Sat, Feb 14, 2015 at 11:12 AM, Ethan Furman wrote:
On 02/13/2015 06:57 PM, Andrew Barnert wrote:
The current design of Python guarantees that an object always gets a setattr or setitem when one of its elements is assigned to. That's an important property, for the reasons I suggested above. So any change would have to preserve that property. And skipping assignment when __iadd__ returns self would not preserve that property. So it's not just backward-incompatible, it's bad.
--> some_var = ([1], 'abc')
--> tmp = some_var[0]
--> tmp += [2, 3]
--> some_var
([1, 2, 3], 'abc')
In that example, 'some_var' is modified without its __setitem__ ever being called.
not really -- an object inside some_var is modified -- there could be any number of other references to that object -- so this is very much how Python works.
Oops, I misread what Andrew was saying, sorry! -- ~Ethan~
This sub-thread has long since drifted away from dicts, so I've changed the subject to make that clear. On Sat, Feb 14, 2015 at 11:41:52AM -0800, Chris Barker wrote:
The fact that you can't directly use augmented assignment on an object contained in an immutable is not a bug, but it certainly is a wart -- particuarly since it will raise an Exception AFTER it has, in fact, performed the operation requested.
Yes, but you can use augmented assignment on an object contained in an immutable under some circumstances. Here are three examples demonstrating outright failure, weird super-position of failed-but-succeeded, and success.

py> t = (1, )
py> t[0] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
(1,)

py> t = ([1], )
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1, 1],)

py> t[0][0] += 1
py> t
([2, 1],)
I have argued that this never would have come up if augmented assignment were only used for in-place operations,
And it would never happen if augmented assignment *never* was used for in-place operations. If it always required an assignment, then if the assignment failed, the object in question would be unchanged. Alas, there's no way to enforce the rule that __iadd__ doesn't modify objects in place, and it actually is a nice optimization when they can do so. [...]
I don't know enough about how this all works under the hood to know if it could be made to work, but it seems the intention is clear here:
object[index] += something.
is a shorthand for:
tmp = object[index]
tmp += something
No, that doesn't work. If tmp is *immutable*, then object[index] never gets updated.

The intention as I understand it is that:

reference += expr

should be syntactic sugar (possibly optimized) for:

reference = reference + expr

where "reference" means (for example) bare names, item references, key references, and chaining the same:

n
n[0]
n[0]['spam']
n[0]['spam'].eggs

etc.

I wonder if we can make this work more clearly if augmented assignments checked whether the same object is returned and skipped the assignment in that case?

-- Steve
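A rough sketch of the semantics Steven is wondering about (hypothetical -- this is not how CPython behaves today):

import operator

def aug_add_item(container, key, value):
    # Roughly what 'container[key] += value' does now, except that the
    # store is skipped when __iadd__ hands back the identical object.
    item = container[key]
    result = operator.iadd(item, value)   # uses __iadd__ if defined, else __add__
    if result is not item:                # proposed: only assign new objects
        container[key] = result

t = ([1], None)
aug_add_item(t, 0, [2])   # list mutated in place; tuple.__setitem__ never runs
print(t)                  # ([1, 2], None) -- no TypeError under these semantics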
On Feb 14, 2015, at 17:30, Steven D'Aprano <steve@pearwood.info> wrote:
This sub-thread has long since drifted away from dicts, so I've changed the subject to make that clear.
On Sat, Feb 14, 2015 at 11:41:52AM -0800, Chris Barker wrote:
The fact that you can't directly use augmented assignment on an object contained in an immutable is not a bug, but it certainly is a wart -- particuarly since it will raise an Exception AFTER it has, in fact, performed the operation requested.
Yes, but you can use augmented assignment on an object contained in an immutable under some circumstances. Here are three examples demonstrating outright failure, weird super-position of failed-but-succeeded, and success.
py> t = (1, )
py> t[0] += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
(1,)

py> t = ([1], )
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1, 1],)

py> t[0][0] += 1
py> t
([2, 1],)
I have argued that this never would have come up if augmented assignment were only used for in-place operations,
And it would never happen if augmented assignment *never* was used for in-place operations. If it always required an assignment, then if the assignment failed, the object in question would be unchanged.
Alas, there's no way to enforce the rule that __iadd__ doesn't modify objects in place, and it actually is a nice optimization when they can do so.
No, it's not just a nice optimization, it's an important part of the semantics. The whole point of being able to share mutable objects is being able to mutate them in a way that all the sharers see. If __iadd__ didn't modify objects in-place, you'd have this:

py> a, b = [], []
py> c = [a, b]
py> c[1] += [1]
py> b
[]

Of course this would make perfect sense if Python were a pure immutable language, or a lower-level language like C++ where c = [a, b] made copies of a and b rather than creating new names referring to the same lists, or if Python had made the decision long ago that mutation is always spelled with an explicit method call rather than operator-like syntax. And any of those three alternatives could be part of a good language. But that language would be substantially different from Python.
I don't know enough about how this all works under the hood to know if it could be made to work, but it seems the intention is clear here:
object[index] += something.
is a shorthand for:
tmp = object[index]
tmp += something
No, that doesn't work. If tmp is *immutable*, then object[index] never gets updated.
The intention as I understand it is that:
reference += expr
should be syntactic sugar (possibly optimized) for:
reference = reference + expr
where "reference" means (for example) bare names, item references, key references, and chaining the same:
n
n[0]
n[0]['spam']
n[0]['spam'].eggs
etc.
I wonder if we can make this work more clearly if augmented assignments checked whether the same object is returned and skipped the assignment in that case?
I already answered that earlier. There are plenty of objects that necessarily rely on the assumption that item/attr assignment always means __setitem__/__setattr__.

Consider any object with a cache that has to be invalidated when one of its members or attributes is set. Or a sorted list that may have to move an element if it's replaced. Or really almost any case where a custom __setattr__, an @property or custom descriptor, or a non-trivial __setitem__ is useful. All of these would break.

What you're essentially proposing is that augmented assignment is no longer really assignment, so classes that want to manage assignment in some way can't manage augmented assignment.
On Sat, Feb 14, 2015 at 07:10:19PM -0800, Andrew Barnert wrote:
On Feb 14, 2015, at 17:30, Steven D'Aprano <steve@pearwood.info> wrote: [snip example of tuple augmented assignment which both succeeds and fails at the same time]
I have argued that this never would have come up if augmented assignment were only used for in-place operations,
And it would never happen if augmented assignment *never* was used for in-place operations. If it always required an assignment, then if the assignment failed, the object in question would be unchanged.
Alas, there's no way to enforce the rule that __iadd__ doesn't modify objects in place, and it actually is a nice optimization when they can do so.
No, it's not just a nice optimization, it's an important part of the semantics. The whole point of being able to share mutable objects is being able to mutate them in a way that all the sharers see.
Sure. I didn't say that it was "just" (i.e. only) a nice optimization. Augmented assignment methods like __iadd__ are not only permitted but encouraged to perform changes in place. As you go on to explain, those semantics are language design choice. Had the Python developers made different choices, then Python would naturally be different. But given the way Python works, you cannot enforce any such "no side-effects" rule for mutable objects.
If __iadd__ didn't modify objects in-place, you'd have this:
py> a, b = [], []
py> c = [a, b]
py> c[1] += [1]
py> b
[]
Correct. And that's what happens if you write c[1] = c[1] + [1]. If you wanted to modify the object in place, you could write c[1].extend([1]) or c[1].append(1).

The original PEP for this feature:

http://legacy.python.org/dev/peps/pep-0203/

lists two rationales: "simplicity of expression, and support for in-place operations". It doesn't go into much detail for the reason *why* in-place operations are desirable, but the one example given (numpy arrays) talks about avoiding needing to create a new object, possibly running out of memory, and then "possibly delete the original object, depending on reference count". To me, saving memory sounds like an optimization :-)

But of course you are correct Python would be very different indeed if augmented assignment didn't allow mutation in-place.
I wonder if we can make this work more clearly if augmented assignments checked whether the same object is returned and skipped the assignment in that case?
I already answered that earlier. There are plenty of objects that necessarily rely on the assumption that item/attr assignment always means __setitem__/__setattr__.
Consider any object with a cache that has to be invalidated when one of its members or attributes is set. Or a sorted list that may have to move an element if it's replaced. Or really almost any case where a custom __setattr__, an @property or custom descriptor, or a non-trivial __setitem__ is useful. All of these would break.
Are there many such objects where replacing a member with itself is a meaningful change? E.g. would the average developer reasonably expect that as a deliberate design feature, spam = spam should be *guaranteed* to not be a no-op?

I know that for Python today, it may not be a no-op, if spam is an expression such as foo.bar or foo[bar], and I'm not suggesting that it would be reasonable to change the compiler semantics so that "normal" = binding should skip the assignment when both sides refer to the same object. But I wonder whether *augmented assignment* should do so. I don't do this lightly, but only to fix a wart in the language. See below.
What you're essentially proposing is that augmented assignment is no longer really assignment, so classes that want to manage assignment in some way can't manage augmented assignment.
I am suggesting that perhaps we should rethink the idea that augmented assignment is *unconditionally* a form of assignment. We're doing this because the current behaviour breaks under certain circumstances. If it simply raised an exception, that would be okay, but the fact that the operation succeeds and yet still raises an exception, that's pretty bad.

In the case of mutable objects inside immutable ones, we know the augmented operation actually doesn't require there to be an assignment, because the mutation succeeds even though the assignment fails. Here's an example again, for anyone skimming the thread or who has gotten lost:

t = ([1, 2], None)
t[0] += [1]

The PEP says:

    The __iadd__ hook should behave similar to __add__, returning the result of the operation (which could be `self') which is to be assigned to the variable `x'.

I'm suggesting that if __iadd__ returns self, the assignment be skipped. That would solve the tuple case above. You're saying it might break code that relies on some setter such as __setitem__ or __setattr__ being called. E.g.

myobj.spam = []    # spam is a property
myobj.spam += [1]  # myobj expects this to call spam's setter

But that's already broken, because the caller can trivially bypass the setter for any other in-place mutation:

myobj.spam.append(1)
myobj.spam[3:7] = [1, 2, 3, 4, 5]
del myobj.spam[2]
myobj.spam.sort()

etc. In other words, in the "cache invalidation" case (etc.), no real class that directly exposes a mutable object to the outside world can rely on a setter being called. It would have to wrap it in a proxy to intercept mutator methods, or live with the fact that it won't be notified of mutations.

I used to think that Python had no choice but to perform an unconditional assignment, because it couldn't tell whether the operation was a mutation or not. But I think I was wrong. If the result of __iadd__ is self, then either the operation was a mutation, or the assignment is "effectively" a no-op. (That is, the result of the op hasn't changed anything.)

I say "effectively" a no-op in scare quotes because setters will currently be called in this situation:

myobj.spam = "a"
myobj.spam += ""   # due to interning, "a" + "" may be the same object

Currently that will call spam's setter, and the argument will be the identical object as spam's current value. It may be that the setter is rather naive, and it doesn't bother to check whether the new value is actually different from the old value before performing its cache invalidation (or whatever). So you are right that this change will affect some code.

That doesn't mean we can't fix this. It just means we have to go through a transition period, like for any other change to Python's semantics. During the transition, you may need to import from __future__, or there may be a warning, or both. After the transition, writing:

myobj.spam = myobj.spam

will still call the spam setter, always. But augmented assignment may not, if __iadd__ returns the same object. (Not just an object with the same value, it has to be the actual same object.)

I think that's a reasonable change to make, to remove this nasty gotcha from the language. For anyone relying on their cache being invalidated when it is touched, even if the touch otherwise makes no difference, they just have to deal with a slight change in the definition of "touched". Augmented assignment won't work. Instead of using cache.time_to_live += 0 to cause an invalidation, use cache.time_to_live = cache.time_to_live.
Or better still, provide an explicit cache.invalidate() method.

-- Steve
On Feb 14, 2015, at 21:40, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Feb 14, 2015 at 07:10:19PM -0800, Andrew Barnert wrote:
On Feb 14, 2015, at 17:30, Steven D'Aprano <steve@pearwood.info> wrote: [snip example of tuple augmented assignment which both succeeds and fails at the same time]
I have argued that this never would have come up if augmented assignment were only used for in-place operations,
And it would never happen if augmented assignment *never* was used for in-place operations. If it always required an assignment, then if the assignment failed, the object in question would be unchanged.
Alas, there's no way to enforce the rule that __iadd__ doesn't modify objects in place, and it actually is a nice optimization when they can do so.
No, it's not just a nice optimization, it's an important part of the semantics. The whole point of being able to share mutable objects is being able to mutate them in a way that all the sharers see.
Sure. I didn't say that it was "just" (i.e. only) a nice optimization. Augmented assignment methods like __iadd__ are not only permitted but encouraged to perform changes in place.
As you go on to explain, those semantics are language design choice. Had the Python developers made different choices, then Python would naturally be different. But given the way Python works, you cannot enforce any such "no side-effects" rule for mutable objects.
If __iadd__ didn't modify objects in-place, you'd have this:
py> a, b = [], [] py> c = [a, b] py> c[1] += [1] py> b []
Correct. And that's what happens if you write c[1] = c[1] + [1]. If you wanted to modify the object in place, you could write c[1].extend([1]) or c[1].append(1).
The original PEP for this feature:
http://legacy.python.org/dev/peps/pep-0203/
lists two rationales:
"simplicity of expression, and support for in-place operations". It doesn't go into much detail for the reason *why* in-place operations are desirable, but the one example given (numpy arrays) talks about avoiding needing to create a new object, possibly running out of memory, and then "possibly delete the original object, depending on reference count". To me, saving memory sounds like an optimization :-)
But of course you are correct Python would be very different indeed if augmented assignment didn't allow mutation in-place.
[...]
I wonder if we can make this work more clearly if augmented assignments checked whether the same object is returned and skipped the assignment in that case?
I already answered that earlier. There are plenty of objects that necessarily rely on the assumption that item/attr assignment always means __setitem__/__setattr__.
Consider any object with a cache that has to be invalidated when one of its members or attributes is set. Or a sorted list that may have to move an element if it's replaced. Or really almost any case where a custom __setattr__, an @property or custom descriptor, or a non-trivial __setitem__ is useful. All of there would break.
Are there many such objects where replacing a member with itself is a meaningful change?
Where replacing a member with a mutated version of the member is meaningful, sure.

Consider a sorted list again. Currently, it's perfectly valid to do sl[0] += x, because sl gets a chance to move the element, while it's not valid (according to the contract of sortedlist) to call sl[0].extend(x). This is relatively easy to understand and remember, even if it may seem odd to someone who doesn't understand Python assignment under the covers. (I could dig up a few StackOverflow questions where something like "use sl[0] += x" is the accepted answer--although sadly few if any of them explain _why_ there's a difference...)

Obviously if we changed += to be the same as update, people could find _different_ answers. For example, you can always write "tmp = sl[0]; tmp.extend(x); sl[0] = tmp" to explicitly reintroduce assignment where we've taken it away. But would you suggest that's an improvement?

You could (and do) argue that any class that relies on assignment to maintain sorting, flush caches, update proxies, whatever is a badly-designed class, and this feature has turned out to be an attractive nuisance. (To be honest, the fact that a dict can effectively guarantee that its keys won't change by requiring hashability, while a sorted container has to document "please don't change the keys", is itself a wart on the language--but not one I'd suggest fixing.) But this really is a more significant change than you're making out.
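To make the sortedlist point concrete, here is a toy stand-in (invented for this example, not the real sortedlist API); assignment re-inserts the element in order, which is exactly the hook that sl[0] += x exercises and sl[0].extend(x) bypasses:

import bisect

class ToySorted:
    def __init__(self, items):
        self._items = sorted(items)
    def __getitem__(self, i):
        return self._items[i]
    def __setitem__(self, i, value):
        del self._items[i]                 # remove the old element...
        bisect.insort(self._items, value)  # ...and re-insert in sorted order
    def __repr__(self):
        return repr(self._items)

sl = ToySorted([1, 5, 7])
sl[0] += 9    # reads 1, computes 10, and __setitem__ re-inserts it in order
print(sl)     # [5, 7, 10] -- the sort invariant survives the assignment

With mutable elements, list.__iadd__ returns self, so the "skip the store" proposal would bypass __setitem__ here and the container could silently fall out of order.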
E.g. would the average developer reasonably expect that as a deliberate design feature, spam = spam should be *guaranteed* to not be a no-op? I know that for Python today, it may not be a no-op, if spam is an expression such as foo.bar or foo[bar], and I'm not suggesting that it would be reasonable to change the compiler semantics so that "normal" = binding should skip the assignment when both sides refer to the same object.
But I wonder whether *augmented assignment* should do so. I don't do this lightly, but only to fix a wart in the language. See below.
This is a wart that's been there since the feature was added in the early 2.x days, and most people don't even run into it until they've been using Python for years. (Look at how many experienced Python devs in the previous thread were surprised, because they'd never run into it.) And the workaround is generally trivial: don't use assignment (in the FAQ case, and yours, just call extend instead). And it's easy to understand once you think about the semantics. Creating a much larger, and harder-to-work-around, problem in an unknown number of libraries (and apps that use those libraries) to fix a small wart like this doesn't seem like a good trade.

And of course the _real_ answer in almost all cases is even simpler than the workaround: don't try to assign to objects inside immutable containers. Most people already know not to do this in the first place, which is why they rarely if ever run into the case where it half-works--and when they do, either the += was a mistake, or using a tuple instead of a list was a mistake, and the exception is sufficient to show them their mistake. I suppose it's _possible_ someone has written some code that dealt with the exception and then proceeded to work on bad data, but I don't think it's very likely. And that's really the only code that's affected by the wart.
What you're essentially proposing is that augmented assignment is no longer really assignment, so classes that want to manage assignment in some way can't manage augmented assignment.
I am suggesting that perhaps we should rethink the idea that augmented assignment is *unconditionally* a form of assignment. We're doing this because the current behaviour breaks under certain circumstances. If it simply raised an exception, that would be okay, but the fact that the operation succeeds and yet still raises an exception, that's pretty bad.
In the case of mutable objects inside immutable ones, we know the augmented operation actually doesn't require there to be an assignment, because the mutation succeeds even though the assignment fails. Here's an example again, for anyone skimming the thread or has gotten lost:
t = ([1, 2], None)
t[0] += [1]
The PEP says:
The __iadd__ hook should behave similar to __add__, returning the result of the operation (which could be `self') which is to be assigned to the variable `x'.
I'm suggesting that if __iadd__ returns self, the assignment be skipped. That would solve the tuple case above. You're saying it might break code that relies on some setter such as __setitem__ or __setattr__ being called. E.g.
myobj.spam = []    # spam is a property
myobj.spam += [1]  # myobj expects this to call spam's setter
But that's already broken, because the caller can trivially bypass the setter for any other in-place mutation:
myobj.spam.append(1)
myobj.spam[3:7] = [1, 2, 3, 4, 5]
del myobj.spam[2]
myobj.spam.sort()
The fact that somebody can work around your contract doesn't mean that your design is broken. I can usually trivially bypass your read-only x property by setting _x instead; so what? And again, I can break just about every mutable sorted container ever written just by mutating the keys; that doesn't mean sorted containers are useless.
etc. In other words, in the "cache invalidation" case (etc.), no real class that directly exposes a mutable object to the outside world can rely on a setter being called. It would have to wrap it in a proxy to intercept mutator methods, or live with the fact that it won't be notified of mutations.
I used to think that Python had no choice but to perform an unconditional assignment, because it couldn't tell whether the operation was a mutation or not. But I think I was wrong. If the result of __iadd__ is self, then either the operation was a mutation, or the assignment is "effectively" a no-op. (That is, the result of the op hasn't changed anything.)
I say "effectively" a no-op in scare quotes because setters will currently be called in this situation:
myobj.spam = "a" myobj.spam += "" # due to interning, "a" + "" may be the same object
Currently that will call spam's setter, and the argument will be the identical object as spam's current value. It may be that the setter is rather naive, and it doesn't bother to check whether the new value is actually different from the old value before performing its cache invalidation (or whatever). So you are right that this change will affect some code.
That doesn't mean we can't fix this. It just means we have to go through a transition period, like for any other change to Python's semantics. During the transition, you may need to import from __future__, or there may be a warning, or both. After the transition, writing:
myobj.spam = myobj.spam
will still call the spam setter, always. But augmented assignment may not, if __iadd__ returns the same object. (Not just an object with the same value, it has to be the actual same object.)
I think that's a reasonable change to make, to remove this nasty gotcha from the language.
For anyone relying on their cache being invalidated when it is touched, even if the touch otherwise makes no difference, they just have to deal with a slight change in the definition of "touched". Augmented assignment won't work. Instead of using cache.time_to_live += 0 to cause an invalidation, use cache.time_to_live = cache.time_to_live. Or better still, provide an explicit cache.invalidate() method.
I didn't think of some of these cases. But that just shows that there are _more_ cases that would be broken by the change, which means it's _more_ costly. The fact that these additional cases are easier to work around than the ones I gave doesn't change that.
Andrew Barnert wrote:
Consider any object with a cache that has to be invalidated when one of its members or attributes is set.
If obj were such an object, then obj.member.extend(foo) would fail to notify it of a change. Also, if obj1 and obj2 are sharing the same list, then obj1.member += foo would only notify obj1 of the change and not obj2.

For these reasons, I wonder whether relying on a side effect of += to notify an object of changes is a good design.

-- Greg
On Feb 14, 2015, at 22:11, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Andrew Barnert wrote:
Consider any object with a cache that has to be invalidated when one of its members or attributes is set.
If obj were such an object, then obj.member.extend(foo) would fail to notify it of a change.
Sure. Again, this is why every sorted container, caching object, etc. out there has to document "please don't mutate the keys/elements/members" and hope people read the docs. But you're always allowed to _assign_ to the elements. And, because of the semantics of Python, that includes augmented assignment. And people have used that fact in real code.
Also, if obj1 and obj2 are sharing the same list, then obj1.member += foo would only notify obj1 of the change and not obj2.
Sure, and this subtlety does--rarely, but occasionally--come up in real code, and then you have to think about what to do about it. Does that mean it would be better to face the problem every time, right from the start? In a new language, I might say yes.
For these reasons, I wonder whether relying on a side effect of += to notify an object of changes is a good design.
But that's not the only question when you're proposing a breaking change. Even if it's a bad design, is it worth going out of our way to break code that relies on that design?
Do you know any code that relies on that design?
On Sat, Feb 14, 2015 at 5:30 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Feb 14, 2015 at 11:41:52AM -0800, Chris Barker wrote:
I have argued that this never would have come up if augmented assignment were only used for in-place operations,
And it would never happen if augmented assignment *never* was used for in-place operations.
Sure -- but then it would only be syntactic sugar, and mostly just for incrementing integers. I'm pretty sure when this was all added, there were two key use-cases: numpy wanted an operator for in-place operations, and lots of folks wanted an easy way to increment integers, etc. This was a way to kill two birds with one stone -- but it resulted in a good way for folks to get confused. Operator overloading fundamentally means that the definition of operators depends on the operands, but it's better if there isn't a totally different concept there as well. In-place operations and operate-then-reassign are conceptually very different. But that cat is out of the bag -- the question is: is there a way to limit the surprises -- can we either let "mutating an object inside an immutable" work, or, at least, have it raise an exception before doing anything? I suspect the answer to that is no, however :-( -Chris

Alas, there's no way to enforce the rule that __iadd__ doesn't modify objects in place,
Sure -- though it could be made not to do that for any built-in, and be discouraged in the docs and PEP 8.
and it actually is a nice optimization when they can do so.
More than nice -- critical, and one of the original key use-cases. Oh well. -Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Feb 14, 2015, at 11:12, Ethan Furman <ethan@stoneleaf.us> wrote:
On 02/13/2015 06:57 PM, Andrew Barnert wrote:
On Feb 13, 2015, at 18:46, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Andrew Barnert wrote:
I think it's reasonable for a target to be able to assume that it will get a setattr or setitem when one of its subobjects is assigned to. You might need to throw out cached computed properties, ...
That's what I was thinking. But I'm not sure it would be a good design,
Now I'm confused.
The current design of Python guarantees that an object always gets a setattr or setitem when one of its elements is assigned to. That's an important property, for the reasons I suggested above. So any change would have to preserve that property. And skipping assignment when __iadd__ returns self would not preserve that property. So it's not just backward-incompatible, it's bad.
--> some_var = ([1], 'abc')
--> tmp = some_var[0]
--> tmp += [2, 3]
--> some_var
([1, 2, 3], 'abc')
In that example, 'some_var' is modified without its __setitem__ ever being called.
Of course. Because one of some_var's elements is not being assigned to. There is no operation on some_var here at all; there's no difference between this code and code that uses .extend() on the element. Assigning to an item or attribute of some_var is an operation on some_var.

It's true that languages with _different_ assignment semantics could probably give us a way to translate everything that relies on our semantics directly, and even let us write new code that we couldn't in Python (like some_var being notified in some way). C++-style references that can overload assignment, Tcl variable tracing, Cocoa KV notifications, whatever. But the idea that assignment to an element is an operation on the container/namespace is the semantics Python has used for decades; anything you can reason through from that simple idea is always true; changing that would be bad.
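A small sketch of the distinction, using a hypothetical LoggingList container: only assignment *to an item of* the container goes through its __setitem__; mutating a fetched element does not:

class LoggingList(list):
    def __setitem__(self, index, value):
        print("setitem called")
        super().__setitem__(index, value)

container = LoggingList([[1]])
tmp = container[0]
tmp += [2]             # an operation on the element only; prints nothing
container[0] += [3]    # assignment to an item of container; prints "setitem called"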
Sorry, I didn't get the following two emails until I sent this one, so this reply is now largely irrelevant.

Sent from a random iPhone
On Fri, Feb 13, 2015 at 5:19 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Chris Barker - NOAA Federal wrote:
But why does it work that way? Because it needs to work on immutable objects. If it didn't, then you wouldn't need the "assign back to the original name" step.
This doesn't make it wrong for in-place operators to work on immutable objects. There are two distinct use cases:
1) You want to update a mutable object in-place.
what I think is the "real" use case. ;-)

2) The LHS is a complex expression that you only want to write out and evaluate once.
Can you give an example of this? How can you put an expression on the left-hand side? I seem to only get errors:

In [23]: l1
Out[23]: [1]

In [24]: l2
Out[24]: [2]

In [25]: (l1 + l2) += [3]
  File "<ipython-input-25-b8781c271c74>", line 1
    (l1 + l2) += [3]
SyntaxError: can't assign to operator

which makes sense -- the LHS is an expression that results in a list, but += is trying to assign to that object. How can there be anything other than a single object on the LHS? In fact, if this worked like I expect and want it to -- that would work. But the fact that we've mixed in-place operation with "shorthand for operation plus assignment" makes this a mess.

There are ways that the tuple problem could be fixed, such as skipping the assignment if __iadd__ returns the same object. But that would be a backwards-incompatible change, since there could be code that relies on the assignment always happening.
I wonder if there are many real use cases of that -- do people really write:

In [28]: try:
   ....:     t[0] += [4]
   ....: except TypeError:
   ....:     pass

It seems making that change would just mean that some things which currently raise an error would no longer do so. Or, I suppose, the equivalent of that try/except could be built into augmented assignment.
If it's in-place for a mutable object, it needs to return self. But the Python standard practice is that methods that mutate objects shouldn't return self (like list.sort(), for instance).
The reason for that is to catch the mistake of using a mutating method when you meant to use a non-mutating one. That doesn't apply to __iadd__, because you don't usually call it yourself.
Sure -- but my point is that at least by convention, we keep "mutating an object" and "creating a new object" clearly distinct -- but not in this case. I'm pretty convinced, and my memory of the conversation when this was added is, that there was a case of some people wanting shorthand like:

i += 1

and others wanting a way to express in-place operations conveniently:

array += 1

And this was seen as a way to kill two birds with one stone -- and that has led to this confusing behavior. And BOTH are simply syntactic sugar:

i += 1      ==>  i = i + 1

and

array += 1  ==>  np.add(array, 1, out=array)

I would argue that the second is a major win, and the first only a minor win. (And yes, I did write a bunch of ugly code that looks like the second line before augmented assignment existed.) But this is all moot -- the cat is well out of the bag on this. Though if we could clean it up a bit, that would be nice. -Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
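For the record, a small sketch of the two spellings Chris contrasts (assumes numpy is installed; the id() checks confirm whether a new array was created):

import numpy as np

a = np.zeros(3)
before = id(a)
a += 1                       # in-place: same buffer, same object
assert id(a) == before

b = np.zeros(3)
before = id(b)
b = b + 1                    # allocates a new array and rebinds the name
assert id(b) != before

c = np.zeros(3)
np.add(c, 1, out=c)          # the explicit pre-augmented-assignment spelling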
On Sat, Feb 14, 2015 at 3:38 PM, Chris Barker <chris.barker@noaa.gov> wrote:
In [25]: (l1 + l2) += [3] File "<ipython-input-25-b8781c271c74>", line 1 (l1 + l2) += [3] SyntaxError: can't assign to operator
which makes sense -- the LHS is an expression that results in a list, but += is trying to assign to that object. HOw can there be anything other than a single object on the LHS?
You'd have to subscript it or equivalent.
from collections import defaultdict   # import needed by this example

options = {}

def get_option_mapping():
    mode = input("Pick an operational mode: ")
    if not mode:
        mode = "default"
    if mode not in options:
        options[mode] = defaultdict(int)
    return options[mode]

get_option_mapping()["width"] = 100
Pick an operational mode: foo
options
{'foo': defaultdict(<class 'int'>, {'width': 100})}
Sure, you could assign the dict to a local name, but there's no need - you can subscript the return value of a function, Python is not PHP. (Though, to be fair, PHP did fix that a few years ago.) And if it works with regular assignment, it ought to work with augmented... and it does:
get_option_mapping()["width"] += 10 Pick an operational mode: foo get_option_mapping()["width"] += 10 Pick an operational mode: bar options {'foo': defaultdict(<class 'int'>, {'width': 110}), 'bar': defaultdict(<class 'int'>, {'width': 10})}
The original function gets called exactly once, and then the augmented assignment is done using the resulting dict. ChrisA
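A sketch confirming that the target expression is evaluated only once per augmented assignment (hypothetical names):

calls = 0
store = {"width": 100}

def get_mapping():
    global calls
    calls += 1
    return store

get_mapping()["width"] += 10
print(calls)    # 1 -- one call, then __getitem__/__setitem__ on the result
print(store)    # {'width': 110}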
On Fri, Feb 13, 2015 at 08:02:56AM -0800, Chris Barker - NOAA Federal wrote:
On Feb 12, 2015, at 10:24 PM, Steven D'Aprano <steve@pearwood.info> wrote:
+= duplicates the extend method on lists.
Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist.
Actually, that's the primary motivator for += and friends -- to support in-place operations on mutables. Notably numpy arrays.
I'm not sure that implementing __iadd__ for numpy arrays is much harder than implementing an iadd method, and there isn't that much difference in readability between these two possible alternatives:

data += some_numpy_array
data.iadd(some_numpy_array)

If the motivation for augmented assignment was "numpy users don't want to type a method name", then I think that pandering to them was a poor decision. I don't remember the discussion leading up to augmented assignment being added to the language, but I remember that *before* it was added there was a steady stream of people asking why they couldn't write:

n += 1

like in C. (And a smaller number wanting to write ++n and n++ too, but fortunately nowhere near as many.)

[...]
But the "problem" here is that augmented assignment shouldn't work on immutables at all.
That isn't the problem. Having augmented assignment work with immutables works fine. Even tuples can work perfectly:

py> t = ([1, 2, 3], "spam")
py> t += (None,)
py> t
([1, 2, 3], 'spam', None)

I might have been tempted to say that the *real* problem is mutables, not immutables, but even that is not the case:

py> t[0][2] += 1000
py> t
([1, 2, 1003], 'spam', None)

Perhaps the actual problem is that objects have no way of signalling to Python that they have, or have not, performed an in-place mutation, so that Python knows whether or not to try re-binding to the left hand reference. Or perhaps the real problem is that Python is so dynamic that it cannot tell whether a binding operation will succeed. Or that after a binding fails, that it cannot tell whether or not to catch and suppress the exception. There's no shortage of candidates for "the real problem".
But then we wouldn't have the too appealing to resist syntactic sugar for integer incrementing.
But are you saying that augmented assignment was simply a mistake altogether, and therefore no new use of it should be added at all ( and you'd deprecate it if you could)?
What's done is done. I certainly wouldn't deprecate it now. If += was a mistake, then fixing that mistake is more painful than living with it. Is it a mistake? I don't know. I guess that depends on whether you get more value from being able to write += than you get from having to deal with corner cases that fail.

The broader lesson from augmented assignment is that syntax and semantics are not entirely independent. Syntax that is unproblematic in C, with C's semantics, has painful corner cases with Python's semantics. The narrower lesson is that I am cautious about using operators when a method or function will do. Saving a few keystrokes is not sufficient. Bringing it back to dicts:

settings.printer_config.update(global_settings, user_settings)

is obvious and easy and will not fail if settings is a named tuple or printer_config is a read-only property (for example), while:

settings.printer_config += global_settings + user_settings

may succeed and yet still raise an exception.

-- Steve
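One nit: dict.update takes a single positional mapping, so the two-argument call above is aspirational; today the same merge, with hypothetical example data, is spelled as successive calls:

settings = {"printer_config": {"dpi": 300}}
global_settings = {"paper": "A4"}
user_settings = {"dpi": 600}

settings["printer_config"].update(global_settings)
settings["printer_config"].update(user_settings)
print(settings["printer_config"])   # {'dpi': 600, 'paper': 'A4'}
# No assignment to the target happens, which is why this also works when the
# mapping is reachable only through a read-only property.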
On Feb 13, 2015, at 18:19, Steven D'Aprano <steve@pearwood.info> wrote:
The narrower lesson is that I am cautious about using operators when a method or function can do. Saving a few keystrokes is not sufficient. Bringing it back to dicts:
settings.printer_config.update(global_settings, user_settings)
is obvious and easy and will not fail if settings is a named tuple or printer_config is a read-only property (for example), while:
settings.printer_config += global_settings + user_settings
may succeed and yet still raise an exception.
I think another way to look at this is that += is an assignment, while update isn't. In this case, you don't really want to assign anything to settings.printer_config, you want to modify what's there. So a method call like update makes more sense, even if it's a bit more verbose. If you always think things through like that, you'll never run into the "tuple wart".

Of course your point about C is well-taken: assignment in C (and even more so in descendants like C++) is a very different thing from assignment in Python, and given how many Python developers come from a C-derived language, it's not surprising that people have inappropriate intuitions about when += is and isn't appropriate, and occasionally run into this problem.

On the third hand, the problem doesn't come up very often in practice--that's why so many people don't run into it until they're already pretty experienced (and are therefore even more surprised), which implies that maybe it's not that important to worry about in the first place...
On 13/02/2015 06:19, Steven D'Aprano wrote:

On Thu, Feb 12, 2015 at 07:43:36PM -0800, Chris Barker - NOAA Federal wrote:
avoids any confusion over operators and having += duplicating the update method.

+= duplicates the extend method on lists.

Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist. It also leads to a quite ugly and unfortunate language wart with tuples:
py> t = ([], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1], None)
Try explaining to novices why this is not a bug.

Er, I'm not a novice, but why isn't that a bug? I would have expected t to be altered as shown without raising an exception. (And given that the code does raise an exception, I find it surprising that it otherwise has the presumably intended effect.) t[0] is a list, not a tuple, so I would expect it to behave as a list:
L = t[0]
L += [2]
t
([1,2], None)
I was also curious why you say that alist += blist is not quite the same as alist = alist + blist. (So far I've worked out that the first uses __iadd__ and the second uses __add__. And there is probably a performance difference. And the semantics could be different for objects of a custom class, but as far as I can see they should be the same for list objects. What have I missed?) I would appreciate it if you or someone else can find the time to answer. Rob Cliffe
On Feb 13, 2015, at 16:51, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
On 13/02/2015 06:19, Steven D'Aprano wrote:
On Thu, Feb 12, 2015 at 07:43:36PM -0800, Chris Barker - NOAA Federal wrote:
avoids any confusion over operators and having += duplicating the update method.

+= duplicates the extend method on lists.

Yes it does, and that sometimes causes confusion when people wonder why alist += blist is not *quite* the same as alist = alist + blist. It also leads to a quite ugly and unfortunate language wart with tuples:
py> t = ([], None)
py> t[0] += [1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
py> t
([1], None)
Try explaining to novices why this is not a bug.

Er, I'm not a novice, but why isn't that a bug? I would have expected t to be altered as shown without raising an exception. (And given that the code does raise an exception, I find it surprising that it otherwise has the presumably intended effect.) t[0] is a list, not a tuple, so I would expect it to behave as a list:
Right, t[0] behaves like a list. But assigning (augmented or otherwise) to t[0] isn't an operation on the value t[0], it's an operation on the value t (in particular, `t.__setitem__(0, newval)`).

Augmented assignment does three things: it fetches the existing value, calls the __iadd__ operator on it (or __add__ if there's no __iadd__, which is how it works for immutable values), then stores the resulting new value. For this case, the first two steps succeed, and the third fails.

There's really no good alternative here. Either you don't allow augmented assignment, you don't allow it to work with immutable values (so `t=[0]; t[0] += 1` fails), or you add reference types and overloadable assignment operators and the whole mess that comes with them (as seen in C++) instead of Python's nifty complex assignment targets.
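Spelled out as a runnable sketch, the three steps for the tuple case look roughly like this:

t = ([], None)
tmp = t.__getitem__(0)        # step 1: fetch -- succeeds
tmp = tmp.__iadd__([1])       # step 2: in-place add -- succeeds, list mutated
try:
    t.__setitem__(0, tmp)     # step 3: store back -- tuples refuse item assignment
except TypeError as e:
    print(e, "-- yet t is already", t)    # ([1], None): the mutation happened in step 2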
L = t[0]
L += [2]
t
([1,2], None)
I was also curious why you say that alist += blist is not quite the same as alist = alist + blist. (So far I've worked out that the first uses __iadd__ and the second uses __add__. And there is probably a performance difference. And the semantics could be different for objects of a custom class, but as far as I can see they should be the same for list objects. What have I missed?)
Consider this case:

>>> clist = [0]
>>> alist = clist
>>> blist = [1]

Now, compare:

>>> alist = alist + blist
>>> clist
[0]

... versus:

>>> alist += blist
>>> clist
[0, 1]

And that's because the semantics _are_ different for __add__ vs. __iadd__ for a list: the former creates and returns a new list, the latter modifies and returns self, and that's clearly visible (and often important to your code!) if there are any other references to self out there. That's the real reason we need __iadd__, not efficiency.
I would appreciate it if you or someone else can find the time to answer. Rob Cliffe
On Feb 13, 2015, at 18:02, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
I would appreciate it if you or someone else can find the time to answer.
I found an old partial explanation I'd started a year or two ago that you might find useful (or, more likely, you'll find it confusing, but then maybe you can help me improve the explanation for future readers...) and posted it online (http://stupidpythonideas.blogspot.com/2015/02/augmented-assignments-b.html). In case you're wondering, it was originally intended as a StackOverflow answer to someone who wanted a more detailed explanation than the FAQ gives (and he brought up C++ for comparison), but then the question was closed before I could post it.
Chris Barker - NOAA Federal writes:
avoids any confusion over operators and having += duplicating the update method.
+= duplicates the extend method on lists.
And it's really redundant for numbers, too:
x += y
x = x + y
So plenty of precedent.
Except for the historical detail that Guido dislikes them! (Or did.) For a long time he resisted the extended assignment operators, insisting that

x += 1

is best spelled

x = x + 1

I forget whether he ever said "if you're worried about inefficiency of temp creation, figure out how to optimize the latter".
On 13 February 2015 at 02:24, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Feb 13, 2015 at 12:06:44AM +0000, Andrew Barnert wrote:
On Thursday, February 12, 2015 1:33 PM, Nathan Schneider <neatnate@gmail.com> wrote:
A reminder about PEP 448: Additional Unpacking Generalizations (https://www.python.org/dev/peps/pep-0448/), which claims that "it vastly simplifies types of 'addition' such as combining dictionaries, and does so in an unambiguous and well-defined way". The spelling for combining two dicts would be: {**d1, **d2}
Very nice! That should be extensible to multiple arguments without the pathologically slow performance of repeated addition:
{**d1, **d2, **d3, **d4}
and you can select which dict you want to win in the event of clashes by changing the order.
Better than that, you can even do:

{a: b, **d1, c: d, **d2, e: f, <...etc>}

Best of all, it's implemented (albeit not yet accepted or reviewed): http://bugs.python.org/issue2292
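For illustration, here is the PEP 448 spelling in action, assuming an interpreter new enough to support it (the feature was slated for Python 3.5); the rightmost duplicate wins:

d1 = {'x': 1, 'y': 2}
d2 = {'y': 20, 'z': 3}
merged = {**d1, 'w': 0, **d2}
print(merged)   # {'x': 1, 'y': 20, 'w': 0, 'z': 3} on an insertion-ordered dict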
On 13 February 2015 at 05:46, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Eric Snow wrote:
However, the upside to the PEP 448 syntax is that merging could be done without requiring an additional intermediate dict.
It can also detect duplicate keywords and give you a clear error message. The operator would at best give a rather more generic error message and at worst silently discard one of the duplicates.
Since {1: 'a', 1: 'a'} is currently valid, it didn't make much sense to detect duplicates. Unpacking into keywords as f(**d1, **d2) does detect duplicates, however.
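A quick sketch of that difference (the f(**d1, **d2) form again assumes PEP 448 support):

print({1: 'a', 1: 'b'})    # {1: 'b'} -- duplicate keys in a display are legal; last wins

def f(**kwargs):
    return kwargs

d1, d2 = {'x': 1}, {'x': 2}
try:
    f(**d1, **d2)          # duplicate keyword arguments are rejected
except TypeError as e:
    print(e)               # e.g. "f() got multiple values for keyword argument 'x'"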
On Feb 12, 2015, at 2:14 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 12 February 2015 at 18:43, Steven D'Aprano <steve@pearwood.info> wrote:
I'm going to quote Raymond Hettinger, from a recent discussion here:
[quote] I wouldn't read too much in the point score on the StackOverflow question. First, the question is very old [...] Second, StackOverflow tends to award high scores to the simplest questions and answers [...] A high score indicates interest but not a need to change the language. A much better indicator would be the frequent appearance of ordered sets in real-world code or high download statistics from PyPI or ActiveState's ASPN cookbook. [end quote]
I'm surprised that this hasn't been mentioned yet in this thread, but why not create a function, put it on PyPI and collect usage stats from your users? If they are high enough, a case is made. If no-one downloads it because writing your own is easier than adding a dependency, that probably applies to adding it to the core as well (the "new dependency" then would be not supporting Python <3.5 (or whatever)). Converting a 3rd party function to a method on the dict class isn't particularly hard, so I'm not inclined to buy the idea that people won't use it *purely* because it's a function rather than a method, which is the only thing a 3rd party package can't do.
I don't think it's a good use for an operator (neither + nor | seem particularly obvious fits to me, the set union analogy notwithstanding).
Honestly I don't think "put it on PyPI" is a very useful thing in cases like these. Consuming a dependency from PyPI has a cost, much the same as dropping support for some version of Python has a cost. For something with an easy enough workaround, the cost is unlikely to be low enough for people to be willing to pay it.

Python often gets little improvements that on their own are not major enhancements but when looked at cumulatively they add up to a nicer to use language. An example would be set literals: set(["a", "b"]) wasn't confusing nor was it particularly hard to use, however being able to type {"a", "b"} is nice, slightly easier, and just makes the language just a little bit better.

Similarly doing:

new_dict = dict1.copy()
new_dict.update(dict2)

isn't confusing or particularly hard to use, however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer. It adds another small reason why, taken with the other small reasons, someone might want to drop an older version of Python for a newer version.

--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
I'd expect the major constituency that would use an operator for this (especially over a function) is not well-represented on this list: students and newer users of Python who just want to add two dictionaries together (they're probably also the ones upvoting that StackOverflow question). These people aren't going to download anything from PyPI; they probably don't know that PyPI exists, as I once didn't.

I don't think anyone seriously supports this proposal as a time-saver or code-elegancer for developers like ourselves: people who are Python experts and who develop large projects in Python. As demonstrated several times in this thread, most of us probably already have our own libraries of utility functions, and adding another one- or three-liner to these is easy for us. Adding this feature makes a much bigger difference for someone who's learning to program using Python, or who already knows some programming but is picking up Python as an additional language.

I see both positives and negatives to the proposal from this point of view. The positive would be ease of use. Python prides itself on being intuitive, and it really is much more intuitive in its basic everyday operation than other widespread languages like Java, in large part (IMO) because of its easy-to-use built-in default data structures. When I was learning Python, having already learned some C and Java, the idea of built-in list and dictionary data structures was an enormous relief. No more trying to remember the various methods of HashMap or ArrayList or worrying about which one I should be using where: these basic tools of programming just worked, and worked the way I'd expect, using very simple syntax. That was a big draw of the language, and I think the argument can be made here that adding a '+' operator for dictionaries adds to this quality of the language. { "a": 1, "b": 2 } + { "c": 3 } is the kind of thing that I'd try to see if it worked *before* checking the docs, and I'd be pleased if it did.

That said, there's a negative here as well. The most obvious update procedure to me (others can argue about this if they wish) is overwriting keys from the lhs when a duplicate exists in the rhs. I think a big reason it seems obvious to me is that this is what I'd do if I had to implement such a function myself. But whatever resolution you have, it's going to catch people off-guard and introduce some low-visibility bugs into programs (especially the programs of the novices who I think are a big target audience of the idea). Of course, you could go with raising an error, but I don't think that's very productive, as it forces people to put a try/catch everywhere they'd use '+', which defeats the purpose of a simpler syntax in the first place.

Overall I guess I'm +0 on the proposal. Most useful for novices who just want to add two dictionaries together already, but also potentially dangerous/frustrating for that same user-base. A version that raised an exception when there was a key collision would be pretty useful for novices, but completely useless for experienced programmers, as the overhead of a try/catch greatly outweighs the syntactic benefit of an operator.

-Peter Mawhorter
On 12 February 2015 at 19:27, Donald Stufft <donald@stufft.io> wrote:
Well put. My previous post was uncalled for. I'm still not sure I agree that the proposal is worth it, but this is a good argument in favour (and confirms my feeling that *if* it were to happen, spelling it as addition is the right way to do so). Paul
I had started a previous thread on this, asking why there was no addition defined for dictionaries, along the line of what Peter argued a novice would naively expect. ... some deja-vu on the discussion ... yes, the arguments were then as now:

1) Whether to use + or |. Other operators that suggest an asymmetric (non-commutative) operation were suggested as well. In my opinion dictionaries are not sets, so the use of | to indicate set-like behaviour brings no real advantage ... + seems the most natural for a novice and is clear enough.

2) Obviously, there were the same questions whether in case of key collisions the elements should be added - i.e., apply the + operator to the elements; it had been argued this was another reason to use | instead of +, however, why then not apply the | operator on elements in case of key collisions ...? My opinion on this is now that you just *define* - and document - what the plus operator does for dictionaries, which should most naturally be defined on the basis of update (for +=), with rhs elements superseding lhs elements in a similar fashion for +, as I would naively expect. Having this kind of consistent behaviour for dictionaries would be reasonably easy to understand, and += / + just becomes a shorthand form for update (or the suggested updated function). This also avoids confusion about different behaviour for different operations - the update behaviour is how dictionaries behave - and also limits the amount of new code that needs to be written. The operation would be associative.

3) This would be different from the behaviour of counters, but counters are not just plain dictionaries; they have well-defined value types for their keys, so operation on items is well defined, whereas this is not the case for a dictionary in general.

I am +1 on both + and +=

-Alexander
On Fri, Feb 13, 2015 at 08:30:09AM +1100, Alexander Heger wrote:
I had started a previous thread on this, asking why there was no addition defined for dictionaries, along the line of what Peter argued a novice would naively expect.
A programmer is a novice for about 1% of their lifespan as a programmer. Python should be novice-friendly. It shouldn't be designed around what novices expect. I wish that the people blithely declaring that novices expect this and novices expect that would actually spend some time on the tutor mailing list. I do spend time on the tutor mailing list, and what I have found is this:

- novices don't care about dicts much at all - and updating them even less
- novices rarely try things in the interactive interpreter to see what they do (one of the things which separates novice from junior is the confidence to just do it and see what happens)
- on the rare occasion they search for information, they find it very hard to search for operators (that applies to everyone, I think)
- apart from the mathematical use of operators, which they learn in school, they don't tend to think of using operators much.

I don't think that "novices would naively expect this" is correct, and even if it were, I don't think the language should be designed around what novices naively expect.
... some deja-vu on the discussion ... yes, the arguments were then as now
1) whether to use + or |, other operators that suggest asymmetric (non-commutative) operation were suggested as well - in my opinion dictionaries are not sets, so the use of | to indicate set-like behaviour brings no real advantage ... + seems the most natural for a novice and is clear enough
I was a novice once, and I was surprised that Python used + for concatenation. Why would people expect + for something that isn't addition? I don't know where this idea comes from that it is natural and obvious, I've had to teach beginners that they can "add" strings or lists to concatenate them. As far as I was, and still am, concerned, & is the obvious and most natural operator for concatenation. [1, 2]+[3, 4] should return [4, 6], and sum(bunch of lists) should be a meaningless operation, like sum(bunch of HTTP servers). Or sum(bunch of dicts).
2) obviously, there were the same questions whether in case of key collisions the elements should be added - i.e., apply the + operator to the elements;
No, that's not the argument. The argument is not that "we used the + operator to merge two dicts, so merging should + the values". The argument is that in the event of a duplicate key, adding the values is sometimes a common and useful thing to do. Take Chris A's shopping list analogy. You need a loaf of bread, so you put it on your shopping list (actually a dict): {'bread': 1}. I need a loaf of bread too, so I do the same. Merging your dict and my dict better have two loaves of bread, otherwise one of us is going to miss out. -- Steve
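Note that the shopping-list behaviour Steven describes is exactly what collections.Counter's + already does, for anyone who wants it today:

from collections import Counter

mine = Counter({'bread': 1})
yours = Counter({'bread': 1, 'eggs': 6})
print(mine + yours)   # Counter({'eggs': 6, 'bread': 2}) -- values for shared keys are added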
I had started a previous thread on this, asking why there was no addition defined for dictionaries, along the line of what Peter argued a novice would naively expect. [...] I don't think that "novices would naively expect this" is correct, and even if it were, I don't think the language should be designed around what novices naively expect.
well, I had at least hoped this to work while not being a novice. Because it would be nice.
As far as I was, and still am, concerned, & is the obvious and most natural operator for concatenation. [1, 2]+[3, 4] should return [4, 6], and sum(bunch of lists) should be a meaningless operation, like sum(bunch of HTTP servers). Or sum(bunch of dicts).
This only works if the items *can* be added. Dicts should not make such an assumption. & is not a more natural operator, because why would you then not just expect that [1, 2] & [3, 4] returns [1 & 3, 2 & 4] == [1, 0]? The same would be true for any operator you pick. You just define that combining dicts, with + or whatever operator is used, uses the update operation. This is how dicts behave.
No, that's not the argument. The argument is not that "we used the + operator to merge two dicts, so merging should + the values". The argument is that in the event of a duplicate key, adding the values is sometimes a common and useful thing to do.
... and sometimes it is not; in general it will fail. Replacing always works. -Alexander
Alexander Heger <python-ePO413wvQzY@public.gmane.org> writes:
As far as I was, and still am, concerned, & is the obvious and most natural operator for concatenation. [1, 2]+[3, 4] should return [4, 6], and sum(bunch of lists) should be a meaningless operation, like sum(bunch of HTTP servers). Or sum(bunch of dicts).
This only works if the items *can* be added. Dicts should not make such an assumption. & is not a more natural operator, because why would you then not just expect that [1, 2] & [3, 4] returns [1 & 3, 2 & 4] == [1, 0]? The same would be true for any operator you pick.
Which brings us back to the idea to introduce elementwise variants of any operator:

[1,2] .+ [3,4] == [1+3, 2+4]
[1,2] + [3,4] == [1,2,3,4]
[1,2] .& [3,4] == [1 & 3, 2 & 4]
[1,2] & [3,4] == not (yet?) defined

As a regular numpy user, I'd be very happy about that too :-).

Best, -Nikolaus

-- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
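For comparison, this is the split numpy already provides, with the elementwise semantics living on the array type rather than on a separate operator (assumes numpy is installed):

import numpy as np

print([1, 2] + [3, 4])                       # [1, 2, 3, 4] -- list + concatenates
print(np.array([1, 2]) + np.array([3, 4]))   # [4 6] -- ndarray + is elementwise
print(np.array([1, 2]) & np.array([3, 4]))   # [1 0] -- elementwise bitwise and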
On Fri, Feb 13, 2015 at 6:22 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
Alexander Heger <python-ePO413wvQzY@public.gmane.org> writes:
As far as I was, and still am, concerned, & is the obvious and most natural operator for concatenation. [1, 2]+[3, 4] should return [4, 6], and sum(bunch of lists) should be a meaningless operation, like sum(bunch of HTTP servers). Or sum(bunch of dicts).
This only works if the items *can* be added. Dicts should not make such an assumption. & is not a more natural operator, because why would you then not just expect that [1, 2] & [3, 4] returns [1 & 3, 2 & 4] == [1, 0]? The same would be true for any operator you pick.
Which brings us back to the idea to introduce elementwise variants of any operator:
[1,2] .+ [3,4] == [1+3, 2+4]
[1,2] + [3,4] == [1,2,3,4]
[1,2] .& [3,4] == [1 & 3, 2 & 4]
[1,2] & [3,4] == not (yet?) defined
As a regular numpy user, I'd be very happy about that too :-).
... except for the part where in Numpy ".+" and "+" and so one would have to be identical, which would be no end of confusing especially when adding, say, a Numpy array and a list. But I agree in principle it would be nice to have element-wise operators in the language. I just fear it may be too late. Erik
As a regular numpy user, I'd be very happy about that too :-).
I wouldn't -- I hated that in Matlab. It turns out the only case where it's really useful is the two kinds of multiplication -- and we've now got the @ operator for that.
I just fear it may be too late.
In some sense, it's too late _not_ to have them. It's been argued in this thread that overloading the math operators for things that aren't clearly math was a mistake. In which case they could be used for element-wise operations. But Python wasn't designed to be an array-oriented language -- so this is where we are. -Chris
On Feb 13 2015, Chris Barker - NOAA Federal <chris.barker-32lpuo7BZBA@public.gmane.org> wrote:
As a regular numpy user, I'd be very happy about that too :-).
I wouldn't -- I hated that in Matlab.
Well, I hate using Matlab, but I like the different operators :-).
It turns out the only case where it's really useful is the two kinds of multiplication -- and we've now got the @ operator for that.
I don't think so. It'd e.g. free up + for array concatenation, / for "vector" division (i.e., v / M == inverse(M) .* v, but without computing the inverse), and ** for matrix exponentials. Yeah, backwards compatibility, I know. Just fantasizing about what could have been....

Best, Nikolaus

-- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
On Feb 13 2015, Erik Bray <erik.m.bray-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Fri, Feb 13, 2015 at 6:22 PM, Nikolaus Rath <Nikolaus-BTH8mxji4b0@public.gmane.org> wrote:
Alexander Heger <python-ePO413wvQzY-XMD5yJDbdMReXY1tMh2IBg@public.gmane.org> writes:
As far as I was, and still am, concerned, & is the obvious and most natural operator for concatenation. [1, 2]+[3, 4] should return [4, 6], and sum(bunch of lists) should be a meaningless operation, like sum(bunch of HTTP servers). Or sum(bunch of dicts).
This only works if the items *can* be added. Dicts should not make such an assumption. & is not a more natural operator, because why would you then not just expect that [1, 2] & [3, 4] returns [1 & 3, 2 & 4] == [1, 0]? The same would be true for any operator you pick.
Which brings us back to the idea to introduce elementwise variants of any operator:
[1,2] .+ [3,4] == [1+3, 2+4]
[1,2] + [3,4] == [1,2,3,4]
[1,2] .& [3,4] == [1 & 3, 2 & 4]
[1,2] & [3,4] == not (yet?) defined
As a regular numpy user, I'd be very happy about that too :-).
... except for the part where in Numpy ".+" and "+" and so one would
Parse error.
have to be identical, which would be no end of confusing especially when adding, say, a Numpy array and a list.
Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
There's an obvious way out of all this. We add *two* new operators:

d1 >> d2   # left operand wins
d1 << d2   # right operand wins

And if we really want to do it properly:

d1 ^ d2    # raise exception on duplicate keys

-- Greg
On Thu, Feb 12, 2015 at 4:56 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
There's an obvious way out of all this. We add *two* new operators:
d1 >> d2   # left operand wins
d1 << d2   # right operand wins
And if we really want to do it properly:
d1 ^ d2 # raise exception on duplicate keys
+1 for having a standard solution to this problem. I have encountered it enough times to believe it is worth solving. If the solution is to be with operators, then +1 for << and >> as the least ambiguous. The only downside in my mind is that the need for combining dicts will be fairly rare, so there might not be enough justification to create new idioms for << and >>. Because it is a rare enough use case, though, a non-operator method might be the way to go, even though it is less algebraically pure.

I like the idea of an operator or method that raises an exception on duplicate keys, but am not sure (a) whether such an exception should be raised if the two keys are mapped to equal values, or only if the two dicts are "in conflict" for some common key, and (b) whether, by analogy to set, ^ suggests that duplicate keys should be *omitted* from the result rather than triggering an exception.

Nathan
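A minimal sketch of the semantics Greg proposes, using a hypothetical MergeDict subclass rather than changes to dict itself:

class MergeDict(dict):
    def __lshift__(self, other):    # d1 << d2: right operand wins
        merged = MergeDict(self)
        merged.update(other)
        return merged

    def __rshift__(self, other):    # d1 >> d2: left operand wins
        merged = MergeDict(other)
        merged.update(self)
        return merged

    def __xor__(self, other):       # d1 ^ d2: duplicate keys raise
        clashes = self.keys() & other.keys()
        if clashes:
            raise KeyError("duplicate keys: %r" % sorted(clashes))
        merged = MergeDict(self)
        merged.update(other)
        return merged

d1, d2 = MergeDict(a=1, b=2), MergeDict(b=20, c=3)
print(d1 << d2)   # {'a': 1, 'b': 20, 'c': 3} -- b comes from d2
print(d1 >> d2)   # {'b': 2, 'c': 3, 'a': 1} -- b comes from d1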
On 2015-02-12 21:56, Greg Ewing wrote:
There's an obvious way out of all this. We add *two* new operators:
d1 >> d2   # left operand wins
d1 << d2   # right operand wins
Next people will be suggesting an operator that subtracts the LHS from the RHS!
And if we really want to do it properly:
d1 ^ d2 # raise exception on duplicate keys
Donald Stufft writes:
Python often gets little improvements that on their own are not major enhancements but when looked at cumulatively they add up to a nicer to use language. An example would be set literals: set(["a", "b"]) wasn't confusing nor was it particularly hard to use, however being able to type {"a", "b"} is nice, slightly easier, and just makes the language just a little bit better.
Yes, yes, and yes.
Similarly doing:
new_dict = dict1.copy()
new_dict.update(dict2)
Isn't confusing or particularly hard to use, however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer.
Yes, no, and no.

Succinct? Yes. Just count characters or lines (but both are deprecated practices in judging Python style).

Cleaner? No. The syntax is cleaner (way fewer explicit operations), but the semantics are muddier for me. At one time or another, at least four different interpretations of "dict addition" have been proposed already:

1. Item of left operand wins. Ie, "add new keys and their values". This is the one I think of as "add", and it's most analogous to the obvious algorithm for "add-as-in-union" for sets.

2. Item of right operand wins. Ie, "add new values, inserting new keys as needed". I think of this as "update", its per-item semantics is "replace", but it seems to be the favorite for getting "+" syntax. Eh?

3. Keywise addition. This is collections.Counter, although there are several plausible treatments of missing values (error, additive identity).

4. Key duplication is an error. Strictly speaking nobody has proposed this, but it is implicit in many of the posts that give an example of adding dictionaries with no duplicate keys and say, "it's obvious what d1 + d2 should be". Then there's the variant where duplicate keys with the same value is OK.

And in the "adding shopping dicts" example you could plausibly argue for two more:

5. Keywise max of values. So the first person to the refrigerator always has enough eggs to bake her cake, even if nobody else does.

6. Keywise min of values. If you are walking to the store.

Well, maybe those two aren't very plausible.<wink />

Sure, I could internalize any given definition of "dict.__add__", but I probably won't soon. To make my own code readable (including "clearly correct semantics in the application domain") to myself, I'll use .add() and .update() methods, a merged() function, or where there's a domain concept that is more precise, that name.

And finally, no. I don't think that's nicer. For me personally, it clearly violates TOOWTDI. For every new user it's definitely "one more thing to learn". And I don't see that the syntax itself is so much nicer than

def some_name_that_expresses_domain_semantics(*dicts):
    new_dict = {}
    for d in dicts:
        new_dict.update(d)
    return new_dict

new_dict = some_name_that_expresses_domain_semantics(d1, d2, d3)

as to justify adding syntax where there is no obvious cross-domain abstraction.
On Fri, Feb 13, 2015 at 01:40:48PM +0900, Stephen J. Turnbull wrote:
Donald Stufft writes:
[...]
however being able to type that as new_dict = dict1 + dict2 is more succinct, cleaner, and just a little bit nicer.
Yes, no, and no.
Succinct? Yes. Just count characters or lines (but both are deprecated practices in judging Python style).
Cleaner? No. The syntax is cleaner (way fewer explicit operations), but the semantics are muddier for me. At one time or another, at least four different interpretations of "dict addition" have been proposed already:
1. item of left operand wins Ie, "add new keys and their values". This is the one I think of as "add", and it's most analogous to the obvious algorithm for "add-as-in-union" for sets.
2. item of right operand wins Ie, "add new values, inserting new keys as needed". I think of this as "update", its per-item semantics is "replace", but it seems to be the favorite for getting "+" syntax. Eh?
3. keywise addition This is collections.Counter, although there are several plausible treatments of missing values (error, additive identity).
4. keys duplication is an error Strictly speaking nobody has proposed this, but it is implicit in many of the posts that give an example of adding dictionaries with no duplicate keys and say, "it's obvious what d1 + d2 should be". Then there's the variant where duplicate keys with the same value is OK.
Actually, there have been a few posts where people have suggested that duplicated keys should raise an exception. Including one that suggested it would be good for novices but bad for experienced coders.
And in the "adding shopping dicts" example you could plausibly argue for two more:
5. keywise max of values So the first person to the refrigerator always has enough eggs to bake her cake, even if nobody else does.
That's the multiset model.
6. keywise min of values If you are walking to the store.
One more, which was suggested on Stackoverflow:

{'spam': 2, 'eggs': 1} + {'ham': 1, 'eggs': 3}
=> {'ham': 1, 'spam': 2, 'eggs': (1, 3)}

-- Steve
On 13/02/2015 04:40, Stephen J. Turnbull wrote:

Cleaner? No. The syntax is cleaner (way fewer explicit operations), but the semantics are muddier for me. At one time or another, at least four different interpretations of "dict addition" have been proposed already:

There are many features of the language where decisions have had to be made one way or another. Once the decision is made, it's unfair to say the semantics are muddy because they could have been different.

Rob Cliffe
Rob Cliffe writes:
There are many features of the language where decisions have had to be made one way or another. Once the decision is made, it's unfair to say the semantics are muddy because they could have been different.
The decision hasn't been made yet. At this point, it *is* fair to argue that users will be confused about the semantics of the syntax when they encounter it, and that looking it up is a burden. Many times we see professional Python programmers on these lists say "I always have to look that one up". Whether that argument should win or not is the question. It has been decided both ways in the past. Here the burden is not so great, but IMO the benefit is even less (to me net negative, as I would use, and often define, appropriate methods, not the syntax). YMMV, I just wanted to lay out the case for multiple interpretations being confusing, as I understand it. And I missed two (Greg Ewing suggested non-numerical conflicts, and Steven reported a suggestion of multiple values inserted in a container).
On Thu, Feb 12, 2015 at 3:43 AM, Steven D'Aprano <steve@pearwood.info> wrote:
A very strong -1 on the proposal. We already have a perfectly good way to spell dict += , namely dict.update. As for dict + on its own, we have a way to spell that too: exactly as you write above.
From what I understand, the whole point of "d + d" (or "d | d") is as an alternative to the PEP 448 proposal of allowing multiple keyword arg unpacking clauses ("**") in function calls. So instead of "f(**a, **b)" it would be "f(**a+b)" (or "f(**a|b)"). However, the upside to the PEP 448 syntax is that merging could be done without requiring an additional intermediate dict. Personally, I'd rather have the syntax than the operator (particularly since it would apply to the dict constructor as well: "dict(**a, **b, **c)").
-eric
On Feb 12, 2015, at 6:43 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Thu, Feb 12, 2015 at 3:43 AM, Steven D'Aprano <steve@pearwood.info> wrote:
A very strong -1 on the proposal. We already have a perfectly good way to spell dict += , namely dict.update. As for dict + on its own, we have a way to spell that too: exactly as you write above.
From what I understand, the whole point of "d + d" (or "d | d") is as an alternative to the PEP 448 proposal of allowing multiple keyword arg unpacking clauses ("**") in function calls. So instead of "f(**a, **b)" it would be "f(**a+b)" (or "f(**a|b)"). However, the upside to the PEP 448 syntax is that merging could be done without requiring an additional intermediate dict. Personally, I'd rather have the syntax than the operator (particularly since it would apply to the dict constructor as well: "dict(**a, **b, **c)")
That’s one potential use case but it can be used in a lot more situations than just that one. Hence why I said in that thread that being able to merge dictionaries is a much more general construct than only being able to merge dictionaries inside of a function call. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Eric Snow wrote:
However, the upside to the PEP 448 syntax is that merging could be done without requiring an additional intermediate dict.
It can also detect duplicate keywords and give you a clear error message. The operator would at best give a rather more generic error message and at worst silently discard one of the duplicates. -- Greg
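Assuming PEP 448 lands as proposed (it did, in Python 3.5), the error in question looks like this:

    def f(**kwargs):
        return kwargs

    a = {'x': 1}
    b = {'x': 2}

    try:
        f(**a, **b)   # PEP 448 syntax, Python 3.5+
    except TypeError as e:
        print(e)      # f() got multiple values for keyword argument 'x'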
On Thursday, February 12, 2015 1:08 AM, Ian Lee <ianlee1521@gmail.com> wrote:
Alright, I've tried to gather up all of the feedback and organize it into something approaching what an alpha draft of a PEP might look like:
Proposed New Methods on dict
============================
Hold on. If you really only want these to work on dict, as opposed to dict subclasses and collections.abc.Mapping subclasses, you don't need the whole section on which type wins, and you can make everything a whole lot simpler. On the other hand, if you _do_ want them to work on all mappings, your definition doesn't work at all.
Adds two dicts together, returning a new object with the type of the left hand operand. This would be roughly equivalent to calling:
    new_dict = old_dict.copy()
    new_dict.update(other_dict)
The first problem is that (as you noticed) dict.copy returns a dict, not a type(self), and it copies directly from the underlying dict storage (if you subclass dict and override __fooitem__ but not copy). I think you can ignore this problem. Note that set.__or__ always returns a set, not a type(self), so set subclasses don't preserve their type with operators. If this is good enough for set.__or__, and for dict.copy, it's probably good enough for dict.__add__, right? More generally, whenever you subclass a builtin type and call its (non-overridden) builtin methods, it generally ignores your subclassiness (e.g., call __add__ on a subclass of int, and you get back an int, not an instance of your subclass).

The bigger problem is that Mapping.copy doesn't exist, and of course Mapping.update doesn't exist, because that's a mutating method (and surely you wouldn't want to define a non-mutating method like + to only exist on MutableMapping?). This one is harder to dismiss. If you want to define this on Mapping, you can't just hand-wave it and say "it's like calling copy and update if those methods existed and worked this way", because you're going to have to write the actual implementation. Which means you might as well define it as (possibly a simplified version of) whatever you end up writing, and skip the handwaving.

Fortunately, Set.__or__ already gives us a solution, but it's not quite as trivial as what you're suggesting: just construct a new dict from the key-value pairs of both dicts. Unfortunately, that construction isn't guaranteed behavior for a Mapping, just as construction from an iterable of values isn't guaranteed for a Set; Set handles this by providing a _from_iterable* classmethod (that defaults to just calling the constructor, but can be overridden if that isn't appropriate) to handle the rare cases where it needs to be violated. So, all you need is this:

    @classmethod
    def _from_iterable(cls, it):
        return cls(it)

    def __add__(self, rhs):
        return self._from_iterable(
            kv for m in (self, rhs) for kv in m.items())

This gives you an automatic and obvious answer to all the potentially bikesheddable questions:

1. The type is up to type(lhs) to decide, but will default to type(lhs) itself.
2. How duplicate keys are handled is up to type(lhs), but any dict-like type will keep the rightmost.

And it will just work with any reasonable Mapping, or even most unreasonable ones.** OrderedDict will keep the order, Counter could remove its __add__ override if you gave it a _from_iterable,*** etc. This also means you probably don't need __radd__.

However, Set.__or__ had the advantage that before it was added, Set and the collection ABCs didn't exist at all, set itself was pretty new, set.__or__ already existed, etc., so there were few if any "legacy" set classes that Set.__or__ would have to work with. Mapping.__add__ doesn't have that advantage. So, let's go over the cases. But first, note that existing code using legacy mapping classes will be fine, because existing code will never use + on mappings. It's only if you want to write new code that uses +, with legacy mapping classes, that there's a potential problem.

1. Existing dict subclass: Will just work, although it will return a dict, not an instance of the subclass. (As mentioned at the top, I think this is fine.)
2. Existing collections.UserDict subclass: Will just work.
3. Existing collections.abc.Mapping subclass: If its constructor does the right thing with an iterable of key-value pairs (as most, but not all, such types do), it will work out of the box. If it doesn't accept such an iterable, you'll get a TypeError, which seems reasonable--you need to update your Mapping class, because it's not actually a valid Mapping for use in new code. The problem is that if it accepts such an iterable but does the wrong thing (see Counter), you may have a hard-to-debug problem on your hands. I think this is rare enough that you can just mention it in the PEP (and maybe a footnote or comment in the Added section in the docs) and dismiss it, but there's room for argument here.
4. None of the above, but follows the mapping protocol (including any code that still uses the old UserDict recipe instead of the stdlib version): Will raise a TypeError. Which is fine. The only potential problem is that someone might call collections.abc.Mapping.register(MyLegacyMapping), which would now be a lie, but I think that can be mentioned in the PEP and dismissed.

Finally, what about performance? Unfortunately, my implementation is much slower than a copy/update solution. From a quick test of a class that delegates to a plain dict in the obvious way, it converges to about 3x slower at large N. You can test the same thing without the class:

    def a1(d1, d2):
        d = d1.copy()
        d.update(d2)
        return d

    def a2(d1, d2):
        return type(d1)(kv for m in (d1, d2) for kv in m.items())

    d = {i: i for i in range(1000000)}
    %timeit a1(d, d)
    %timeit a2(d, d)

You could provide an optimized MutableMapping.__add__ to handle this (using copy.copy to deal with the fact that copy isn't guaranteed to be there****), but I don't think it's really necessary. After all, you could get about the same benefit for MutableSet and MutableSequence operators, but the ABCs don't bother.

But there is another advantage of a copy-and-update implementation: most legacy mappings are MutableMappings, and fewer of them are going to fail to be copyable than to be constructible from an iterable of key-value pairs, and those that fail are less likely to do so silently. So, even though in theory it's no better, in practice it may mean fewer problems.

Maybe it would be simpler to add copy to the Mapping API, in which case MutableMapping.__add__ could just rely on that. But that (re-)raises the problem that dict.copy() returns a dict, not a type(self), so now it's violating the Mapping ABC. And I'm not sure you want to change that. The dict.copy method (and its C API equivalent PyDict_Copy) is used all over the place by the internals of Python (e.g., by type, to deal with attribute dictionaries for classes with the default behavior). And the fact that copy isn't documented to return type(self), and isn't defined on Mapping, implies that it's there really more for use cases that need it to be fast, rather than because it's considered inherent to the type. So, I think you're better off just leaving out the copy/update idea.

Meanwhile, one quick answer:
Adding mappings of different types
----------------------------------
Here is what currently happens in a few cases in Python 3.4, given:
    class A(dict): pass
    class B(dict): pass

    foo = A({'a': 1, 'b': 'xyz'})
    bar = B({'a': 5.0, 'c': object()})
Currently (this surprised me actually... I guess it gets the parent class?):
    >>> baz = foo.copy()
    >>> type(baz)
    <class 'dict'>
As mentioned above, copy isn't part of the Mapping or MutableMapping API; it seems to be provided by dict because it's a useful optimization in many cases that Python itself uses. If you look at the CPython implementation in Objects/dictobject.c, dict.copy just calls the C API function PyDict_Copy, which calls PyDict_New then PyDict_Merge. This is basically the same as calling d = dict() then d.update(self), but PyDict_Merge can be significantly faster because it can go right to the raw dict storage for self, and can fast-path the case where rhs is an actual dict as well.

---

* Note that Set._from_iterable is only documented in a footnote in the collections.abc docs, and the same is probably fine for Mapping._from_iterable.

** The craziest type I can think of is an AbsoluteOrderedDict used for caches that may have to be merged together and still preserve insertion order. I actually built one of these, by keeping a tree of TimestampedTuple(key, value) items, sorted by the tuple's timestamp. I'd have to dig up the code, but I'm pretty sure it would just work with this __add__ implementation.

*** Of course Counter.__add__ might be faster than the generic implementation would be, so you might want to keep it anyway. But the fact that they have the same semantics seems like a point in favor of this definition of Mapping.__add__.

**** Or maybe try self.copy() and then call copy.copy(self) on AttributeError. After all, self.copy() could be significantly faster (e.g., see UserDict.copy).
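To see the _from_iterable/__add__ recipe above in action, here is a tiny illustrative Mapping subclass; FrozenMap is an invented name, not something from the thread, and the recipe is pasted in verbatim:

    from collections.abc import Mapping

    class FrozenMap(Mapping):
        """Minimal read-only mapping whose constructor takes key-value pairs."""
        def __init__(self, items=()):
            self._d = dict(items)
        def __getitem__(self, key):
            return self._d[key]
        def __iter__(self):
            return iter(self._d)
        def __len__(self):
            return len(self._d)

        @classmethod
        def _from_iterable(cls, it):
            return cls(it)

        def __add__(self, rhs):
            return self._from_iterable(
                kv for m in (self, rhs) for kv in m.items())

    a = FrozenMap({'x': 1}.items())
    b = FrozenMap({'x': 2, 'y': 3}.items())
    print(dict(a + b))  # {'x': 2, 'y': 3} -- rightmost wins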
On Wed, Feb 11, 2015 at 9:59 PM, Chris Angelico <rosuav@gmail.com> wrote:
Does it have to be? It isn't commutative for strings or tuples either.
Good point. I never think of string "addition" as anything other than concatenation. I guess the same would be true of dictionaries. Still, if I add two strings, lists or tuples, the len() of the result is the same as the sum of the len()s. That wouldn't necessarily be the case for dictionaries, which, though well-defined, still can leave you open for a bit of a surprise when the keys overlap. Skip
On Fri, Feb 13, 2015 at 2:21 AM, Skip Montanaro <skip.montanaro@gmail.com> wrote:
On Wed, Feb 11, 2015 at 9:59 PM, Chris Angelico <rosuav@gmail.com> wrote:
Does it have to be? It isn't commutative for strings or tuples either.
Good point. I never think of string "addition" as anything other than concatenation. I guess the same would be true of dictionaries. Still, if I add two strings, lists or tuples, the len() of the result is the same as the sum of the len()s. That wouldn't necessarily be the case for dictionaries, which, though well-defined, still can leave you open for a bit of a surprise when the keys overlap.
Yes, but we already know that programming isn't the same as mathematics. That's even true when we're working with numbers - usually because of representational limitations - but definitely so when analogies like "addition" are extended to other data types.

But there are real-world parallels here. Imagine if two people independently build shopping lists - "this is the stuff we need" - and then combine them. You'd describe that as "your list and my list", or "your list plus my list" (but not "your list or my list"; English and maths tend to get "and" and "or" backward to each other in a lot of ways), and the resulting list would basically be set union of the originals. If you have room on one of them, you could go through the other and add all the entries that aren't already present; otherwise, you grab a fresh sheet of paper, and start merging the lists. (If you're smart, you'll group similar items together as you merge. That's where the analogy breaks down, though.) It makes fine sense to take two shopping lists and combine them into a new one, discarding any duplicates.

    my_list = {"eggs": 1, "sugar": 1, "milk": 3, "bacon": 5}
    your_list = {"milk": 1, "eggs": 1, "sugar": 2, "coffee": 2}

(let's assume we know what units everything's in)

The "sum" of these two lists could possibly be defined as the greater of the two values, treating them as multisets; but if the assumption is that you keep track of most of what we need, and you're asking me if you've missed anything, then having your numbers trump mine makes fine sense. I have no problem with the addition of dictionaries resulting in the set-union of their keys, and the values following them in whatever way makes the most reasonable sense.

Fortunately Python actually has separate list and dict types, so we don't get the mess of PHP array merging...

ChrisA
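A quick sketch of the two merge behaviors described above, using today's copy/update since the proposed operator doesn't exist:

    my_list = {"eggs": 1, "sugar": 1, "milk": 3, "bacon": 5}
    your_list = {"milk": 1, "eggs": 1, "sugar": 2, "coffee": 2}

    # "Last setter wins" -- your numbers trump mine:
    combined = my_list.copy()
    combined.update(your_list)
    # {'eggs': 1, 'sugar': 2, 'milk': 1, 'bacon': 5, 'coffee': 2}

    # Multiset-style "greater of the two values", for contrast:
    greater = {k: max(my_list.get(k, 0), your_list.get(k, 0))
               for k in my_list.keys() | your_list.keys()}
    # {'eggs': 1, 'sugar': 2, 'milk': 3, 'bacon': 5, 'coffee': 2}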
On Fri, Feb 13, 2015 at 02:46:45AM +1100, Chris Angelico wrote:
Imagine if two people independently build shopping lists
Sorry, was that shopping *lists* or shopping *dicts*?
- "this is the stuff we need" - and then combine them. You'd describe that as "your list and my list", or "your list plus my list" (but not "your list or my list"; English and maths tend to get "and" and "or" backward to each other in a lot of ways), and the resulting list would basically be set union of the originals. If you have room on one of them, you could go through the other and add all the entries that aren't already present; otherwise, you grab a fresh sheet of paper, and start merging the lists. (If you're smart, you'll group similar items together as you merge. That's where the analogy breaks down, though.) It makes fine sense to take two shopping lists and combine them into a new one, discarding any duplicates.
Why would you discard duplicates? If you need 2 loaves of bread, and I need 1 loaf of bread, and the merged shopping list has anything less than 3 loaves of bread, one of us is going to miss out. -- Steve
On Fri, Feb 13, 2015 at 3:21 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Feb 13, 2015 at 02:46:45AM +1100, Chris Angelico wrote:
Imagine if two people independently build shopping lists
Sorry, was that shopping *lists* or shopping *dicts*?
I've never heard anyone in the real world talk about "shopping dicts" or "shopping arrays" or "shopping collections.Sequences". It's always "shopping lists". The fact that I might choose to represent one with a dict is beside the point. :)
- "this is the stuff we need" - and then combine them.
Why would you discard duplicates? If you need 2 loaves of bread, and I need 1 loaf of bread, and the merged shopping list has anything less than 3 loaves of bread, one of us is going to miss out.
I live in a house with a lot of people. When I was younger, Mum used to do most of the shopping, and she'd keep an eye on the fridge and usually know what needed to be replenished; but some things she didn't monitor, and relied on someone else to check them. If there's some overlap in who checks what, we're all going to notice the same need - we don't have separate requirements here. One person notes that we're down to our last few eggs and should buy another dozen; another person also notes that we're down to our last few eggs, but thinks we should probably get two dozen. Getting *three* dozen is definitely wrong here. And definitely her views on how much we should buy would trump my own, as my experience was fairly minimal. (These days, most of us are adult, and matters are a bit more complicated. So I'm recalling "the simpler days of my youth" for the example.) ChrisA
Chris Angelico writes:
I live in a house with a lot of people.
I live in a rabbit hutch that can barely hold 3 people, let alone a week's groceries. So do most of the natives here. So when you go to Costco and buy eggs by the gross and pork by the kilo, you take a bunch of friends, and add your shopping dicts keywise. Then you take all the loot to somebody's place, and split up the orders in Tupperware and collect the money, have some green tea, and then everybody goes home. The proposal to have dicts add keywise has actually been made on this list, and IIRC the last time was when collections.Counter was born.

I'm a definite -1 on "+" or "|" for dicts. "+=" or "|=" I can live with as alternative spellings for "update", but they're both pretty bad: "+" because addition is way too overloaded (even in the shopping list context) and I'd probably think it means different things in different contexts, and "|" because the wrong operand wins in "short-circuit" evaluation.
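For reference, collections.Counter is where keywise addition already lives in the stdlib; its + operator adds the values rather than replacing them:

    from collections import Counter

    mine = Counter({'eggs': 12, 'pork': 2})
    yours = Counter({'eggs': 24})
    print(mine + yours)  # Counter({'eggs': 36, 'pork': 2})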
On 12 February 2015 at 09:52, Stephen J. Turnbull <stephen@xemacs.org> wrote:
I'm a definite -1 on "+" or "|" for dicts. "+=" or "|=" I can live with as alternative spellings for "update", but they're both pretty bad, "+" because addition is way too overloaded (even in the shopping list context) and I'd probably think it means different things in different contexts, and "|" because the wrong operand wins in "short-circuit" evaluation.
Would this be less controversial if it were a new method rather than an odd use of an operator? I get the impression that most of the objections are especially against using an operator.

While I think the operator is elegant, I would be almost as happy with a method, e.g. updated(). It also makes the semantics very clear when e.g. Counter objects are involved:

    new = a.updated(b)
    # equivalent to:
    new = a.copy()
    new.update(b)

It would also be nice to be able to pass multiple dictionaries like a.updated(b, c), to avoid repeated copying in a.updated(b).updated(c). Or perhaps even a classmethod:

    dict.merged(a, b, c)

Thomas
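A minimal sketch of the proposed updated(), written as a standalone function since no such method exists on dict:

    def updated(mapping, *others):
        """Copy-and-update: a non-mutating version of dict.update()."""
        new = mapping.copy()
        for other in others:
            new.update(other)   # rightmost wins on duplicate keys
        return new

    new = updated({'x': 1}, {'x': 2}, {'y': 3})
    print(new)  # {'x': 2, 'y': 3}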
On 02/12/2015 10:21 AM, Thomas Kluyver wrote:
Would this be less controversial if it were a new method rather than an odd use of an operator? I get the impression that most of the objections are especially against using an operator.
While I think the operator is elegant, I would be almost as happy with a method, e.g. updated(). It also makes the semantics very clear when e.g. Counter objects are involved:
    new = a.updated(b)
    # equivalent to:
    new = a.copy()
    new.update(b)
If we go with "updated" then it should be a separate function, like "sorted" and "reversed" are. -- ~Ethan~
On 12 February 2015 at 10:27, Ethan Furman <ethan@stoneleaf.us> wrote:
If we go with "updated" then it should be a separate function, like "sorted" and "reversed" are.
Arguments that it should be a method on mappings: 1. It's less generally applicable than sorted() or reversed(), which work for any finite iterable. 2. As a method on the object, it's very clear how it works with different types - e.g. Counter.updated() uses Counter.update(). This is less clear if it's based on the first argument to a function. 2a. You can do dict.update(counter_object, foo) to use the dict method on a Counter instance. 3. updated() as a standalone name is not very obvious - people think of how to 'combine' or 'merge' dicts, perhaps 'union' if they're of a mathematical bent. The symmetry between dict.update() and dict.updated() is clearer. 4. Set operations are defined with methods on set objects, not standalone functions. This seems roughly analogous. Thomas
On Thu, Feb 12, 2015 at 11:21 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Or perhaps even a classmethod:
dict.merged(a, b, c)
A dict factory classmethod like this is the best proposal I've seen thus far.* It would be nice if the spelling were more succinct (that's where syntax is helpful). Imagine:

    some_func(**dict.merged(a, b, c))

-eric

* I'd go for the PEP 448 multiple kwargs-unpacking clauses, but as already noted, the keys are limited to valid identifiers. Hmm, perhaps that could be changed just for dict()...
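A rough sketch of the hypothetical classmethod, written on a dict subclass since dict itself can't be patched from Python (MergedDict is an invented name):

    class MergedDict(dict):
        @classmethod
        def merged(cls, *mappings):
            new = cls()
            for m in mappings:
                new.update(m)   # rightmost wins
            return new

    d = MergedDict.merged({'a': 1}, {'a': 2}, {'b': 3})
    print(d)  # {'a': 2, 'b': 3}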
On 02/12/2015 04:26 PM, Eric Snow wrote:
On Thu, Feb 12, 2015 at 11:21 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Or perhaps even a classmethod:
dict.merged(a, b, c)
A dict factory classmethod like this is the best proposal I've seen thus far. * It would be nice if the spelling were more succinct (that's where syntax is helpful). Imagine:
some_func(**dict.merged(a, b, c))
That looks an awful lot like

    some_func(**chainmap(a, b, c))

-- ~Ethan~
On 2015-02-13 00:30, Ethan Furman wrote:
On 02/12/2015 04:26 PM, Eric Snow wrote:
On Thu, Feb 12, 2015 at 11:21 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Or perhaps even a classmethod:
dict.merged(a, b, c)
A dict factory classmethod like this is the best proposal I've seen thus far. * It would be nice if the spelling were more succinct (that's where syntax is helpful). Imagine:
some_func(**dict.merged(a, b, c))
That looks an awful lot like
some_func(**chainmap(a, b, c))
How about making d1 | d2 return an iterator? You could then merge dicts with no intermediate dict:

    merged = dict(a | b | c)
    some_func(**(a | b | c))
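A sketch of that idea as a helper function, since making "|" return an iterator would require changes to dict itself:

    from itertools import chain

    def merged_items(*mappings):
        """Lazily yield (key, value) pairs from each mapping in turn."""
        return chain.from_iterable(m.items() for m in mappings)

    a, b, c = {'x': 1}, {'x': 2}, {'y': 3}
    merged = dict(merged_items(a, b, c))
    print(merged)  # {'x': 2, 'y': 3} -- later pairs overwrite earlier ones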
On 02/12/2015 04:30 PM, Ethan Furman wrote:
On 02/12/2015 04:26 PM, Eric Snow wrote:
On Thu, Feb 12, 2015 at 11:21 AM, Thomas Kluyver <thomas@kluyver.me.uk> wrote:
Or perhaps even a classmethod:
dict.merged(a, b, c)
A dict factory classmethod like this is the best proposal I've seen thus far. * It would be nice if the spelling were more succinct (that's where syntax is helpful). Imagine:
some_func(**dict.merged(a, b, c))
That looks an awful lot like
some_func(**chainmap(a, b, c))
or maybe that should be

    some_func(**chainmap(c, b, a))

? Whatever we choose, if we choose anything, should keep the rightmost wins behavior. -- ~Ethan~
On Feb 12, 2015 6:41 PM, "Ethan Furman" <ethan@stoneleaf.us> wrote:
That looks an awful lot like
some_func(**chainmap(a, b, c))
or maybe that should be

    some_func(**chainmap(c, b, a))

? Whatever we choose, if we choose anything, should keep the rightmost wins behavior. -- ~Ethan~

Right. With chainmap leftmost wins. -eric
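For reference, collections.ChainMap (added in Python 3.3) already provides the leftmost-wins lookup being referred to; a quick demonstration (some_func is an illustrative name):

    from collections import ChainMap

    def some_func(**kwargs):
        return kwargs

    a, b = {'x': 1}, {'x': 2, 'y': 2}
    cm = ChainMap(a, b)
    print(cm['x'])          # 1 -- lookups search the maps left to right
    print(some_func(**cm))  # {'x': 1, 'y': 2}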
On Feb 12, 2015 5:26 PM, "Eric Snow" <ericsnowcurrently@gmail.com> wrote:
On Thu, Feb 12, 2015 at 11:21 AM, Thomas Kluyver <thomas@kluyver.me.uk>
wrote:
Or perhaps even a classmethod:
dict.merged(a, b, c)
A dict factory classmethod like this is the best proposal I've seen thus far. * It would be nice if the spelling were more succinct (that's where syntax is helpful). Imagine:
some_func(**dict.merged(a, b, c))
Or just make "dict(a, b, c)" work. -eric
On Thu, Feb 12, 2015 at 07:25:24PM -0700, Eric Snow wrote:
Or just make "dict(a, b, c)" work.
I've come to the same conclusion.

Pros:

- No more arguments about which operator is more intuitive, whether commutativity or associativity is more important, etc. It's just a short, simple function call.
- Left-to-right semantics are obvious to anyone who reads left-to-right.
- No new methods or built-in functions needed.
- Want a subclass? Just call it instead of dict: MyDict(a, b, c). This should even work with OrderedDict, modulo the usual dicts-are-unordered issues.
- Like dict.update, this can support iterables of (key, value) pairs, and optional keyword args.
- dict.update should also be extended to support multiple mappings.
- No pathologically slow performance from repeated addition.
- Although the names are slightly different, we have symmetry between update-in-place using the update method, and copy-and-update using the dict constructor.
- Surely this change is minor enough that it doesn't need a PEP? It just needs a patch and approval from a senior developer with commit privileges.

Cons:

- dict(a,b,c,d) takes 6 characters more to write than a+b+c+d.
- Doesn't support the less common merge semantics to deal with duplicate keys, such as adding the values. But then neither would a + operator.
- The implementation for immutable mappings is not obvious. We can't literally define the semantics in terms of update, since Mapping.update doesn't exist:

    py> from collections import Mapping, MutableMapping
    py> Mapping.update
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'Mapping' has no attribute 'update'
    py> MutableMapping.update
    <function MutableMapping.update at 0xb7c1353c>

-- Steven
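The proposed constructor semantics, sketched as a plain function since today's dict() accepts at most one positional argument (dict_merged is an invented name):

    def dict_merged(*mappings, **kwargs):
        """What dict(a, b, c, key=value) is proposed to mean."""
        new = {}
        for m in mappings:
            new.update(m)       # left to right: rightmost wins
        new.update(kwargs)
        return new

    settings = dict_merged({'theme': 'light', 'font': 12},
                           {'theme': 'dark'},
                           font=14)
    print(settings)  # {'theme': 'dark', 'font': 14}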
On Feb 12, 2015, at 18:45, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Feb 12, 2015 at 07:25:24PM -0700, Eric Snow wrote:
Or just make "dict(a, b, c)" work.
I've come to the same conclusion.
Pros:
- No more arguments about which operator is more intuitive, whether commutativity or associativity is more important, etc. It's just a short, simple function call.
- Left-to-right semantics are obvious to anyone who reads left-to-right.
- No new methods or built-in functions needed.
- Want a subclass? Just call it instead of dict: MyDict(a, b, c). This should even work with OrderedDict, modulo the usual dicts-are-unordered issues.
Except that it doesn't work for OrderedDict. Or UserDict, or blist.sorteddict, or anything else, until those are changed.
- Like dict.update, this can support iterables of (key,value) pairs, and optional keyword args.
Which is already true of dict.__init__. It has the exact same signature--in fact, the exact same implementation--as dict.update.
- dict.update should also be extended to support multiple mappings.
- No pathologically slow performance from repeated addition.
- Although the names are slightly different, we have symmetry between update-in-place using the update method, and copy-and-update using the dict constructor.
- Surely this change is minor enough that it doesn't need a PEP? It just needs a patch and approval from a senior developer with commit privileges.
I'm not sure about this. If you want to change MutableMapping.update too, you'll be potentially breaking all kinds of existing classes that claim to be a MutableMapping and override update with a method with a now-incorrect signature.
Cons:
- dict(a,b,c,d) takes 6 characters more to write than a+b+c+d.
- Doesn't support the less common merge semantics to deal with duplicate keys, such as adding the values. But then neither would a + operator.
- The implementation for immutable mappings is not obvious. We can't literally define the semantics in terms of update, since Mapping.update doesn't exist:
Of course, since update is a mutating method. More importantly, neither Mapping nor MutableMapping defines anything at all about its constructor behavior (and neither provides anything as mixin behavior--if you don't define __init__, you get object.__init__, which takes no params and does nothing). So this doesn't actually change anything at all except dict. All it can do for other mappings is to inspire them to adopt the new 3.5 dict construction behavior.

And the issue with immutable Mapping types isn't that serious--or, rather, it's no more so with this change than it is today. The obvious way to implement MutableMapping.__init__ to be dict-like is to delegate to update (but again, remember that you already have to do so explicitly); there is no obvious way to implement Mapping.__init__, so you have to work something out that makes sense for your specific immutable type. The exact same thing is true after the proposed change.
    py> from collections import Mapping, MutableMapping
    py> Mapping.update
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: type object 'Mapping' has no attribute 'update'
    py> MutableMapping.update
    <function MutableMapping.update at 0xb7c1353c>
On Thu, Feb 12, 2015 at 07:52:09PM -0800, Andrew Barnert wrote:
- Surely this change is minor enough that it doesn't need a PEP? It just needs a patch and approval from a senior developer with commit privileges.
I'm not sure about this. If you want to change MutableMapping.update too, you'll be potentially breaking all kinds of existing classes that claim to be a MutableMapping and override update with a method with a now-incorrect signature.
Fair enough. It's a bigger change than I thought, but I still think it is worth doing. -- Steve
On Thu, Feb 12, 2015 at 11:21 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Feb 12, 2015 at 07:52:09PM -0800, Andrew Barnert wrote:
- Surely this change is minor enough that it doesn't need a PEP? It just needs a patch and approval from a senior developer with commit privileges.
I'm not sure about this. If you want to change MutableMapping.update too, you'll be potentially breaking all kinds of existing classes that claim to be a MutableMapping and override update with a method with a now-incorrect signature.
Fair enough. It's a bigger change than I thought, but I still think it is worth doing.
Is there any need to change MutableMapping.update, at least for now? Why not just focus on the signature of dict() and dict.update()? dict.update() would continue to be compatible with a single-positional-arg-signature MutableMapping.update so that shouldn't be too big a deal. -eric
On Fri, Feb 13, 2015 at 4:03 PM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Is there any need to change MutableMapping.update, at least for now? Why not just focus on the signature of dict() and dict.update()? dict.update() would continue to be compatible with a single-positional-arg-signature MutableMapping.update so that shouldn't be too big a deal.
At this point this would just be a doc change (actually, a doc addition): specifying semantics for multiple positional arguments, and saying that the new signature is preferred. I'm not proposing to add the warning now.
On Fri, Feb 13, 2015 at 4:52 AM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
On Feb 12, 2015, at 18:45, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Feb 12, 2015 at 07:25:24PM -0700, Eric Snow wrote:
Or just make "dict(a, b, c)" work.
I've come to the same conclusion.
- Surely this change is minor enough that it doesn't need a PEP? It just needs a patch and approval from a senior developer with commit privileges.
I'm not sure about this. If you want to change MutableMapping.update too, you'll be potentially breaking all kinds of existing classes that claim to be a MutableMapping and override update with a method with a now-incorrect signature.
Is there a mechanism to evolve ABCs? Maybe there should be, something like:

- For a few Python versions, say that the new behavior is preferred but can't be relied upon. Update the MutableMapping default implementation (this is backwards compatible).
- Possibly, later, have MutableMapping warn when a class with the old update signature is registered. (With the signature improvements in py3, this should be possible for 99% of cases.) Or just rely on linters to implement a check.
- Finally switch over completely, so callers can rely on the new MutableMapping.update.
On Feb 13, 2015, at 1:26, Petr Viktorin <encukou@gmail.com> wrote:
Is there a mechanism to evolve ABCs? Maybe there should be, something like:
- For a few Python versions, say that the new behavior is preferred but can't be relied upon. Update the MutableMapping default implementation (this is backwards compatible).
- Possibly, later, have MutableMapping warn when a class with the old update signature is registered. (With the signature improvements in py3, this should be possible for 99% of cases.) Or just rely on linters to implement a check.
- Finally switch over completely, so callers can rely on the new MutableMapping.update.
Well, currently, ABCs don't test signatures at all, just the presence of a non-abstract method. So you'd have to change that first, which seems like a pretty big change.

And the hard part is the design, not the coding (as, I suspect, with the abc module in the first place). What do you do to mark an @abstractmethod as "check signature against the new version, and, if not, warn if it's compatible with the previous version, raise otherwise"? For that matter, what is the exact rule for "compatible with the new version" given that the new version is essentially just (self, *args, **kwargs)? How do you specify that rule declaratively (or do we add a __check__(subclass_sig) method to abstract methods and make you do it imperatively)? And so on.

It might be worth seeing what other languages/frameworks do to evolve signatures on their abc/interface/protocol types. I'm not sure there's a good answer to find (unless you want monstrosities like IXMLThingy4::ParseEx3 as in COM), but I wouldn't want to assume that without looking around first.
On Fri, Feb 13, 2015 at 11:02 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
Well, currently, ABCs don't test signatures at all, just the presence of a non-abstract method. So you'd have to change that first, which seems like a pretty big change.
And the hard part is the design, not the coding (as, I suspect, with the abc module in the first place). What do you do to mark an @abstractmethod as "check signature against the new version, and, if not, warn if it's compatible with the previous version, raise otherwise"? For that matter, what is the exact rule for "compatible with the new version" given that the new version is essentially just (self, *args, **kwargs)? How do you specify that rule declaratively (or do we add a __check__(subclass_sig) method to abstract methods and make you do it imperatively)? And so on.
It might be worth seeing what other languages/frameworks do to evolve signatures on their abc/interface/protocol types. I'm not sure there's a good answer to find (unless you want monstrosities like IXMLThingy4::ParseEx3 as in COM), but I wouldn't want to assume that without looking around first.
Well, by "mechanism" I meant guidelines on how it should work, not a general implementation. I'd just special-case MutableMapping now, and if it later turns out to be useful elsewhere, figure a way to do it declaratively. I'm always wary of designing a general mechanism from one example. I propose the exact rule for "compatible with the new version" to be "the signature contains a VAR_POSITIONAL argument", with the check being skipped if the signature is unavailable. It's simple, only has false negatives if the signature is wrong, and false positives if varargs do something else (for this case there should be a warning in docs/release notes).
On Feb 13, 2015, at 3:08, Petr Viktorin <encukou@gmail.com> wrote:
Well, by "mechanism" I meant guidelines on how it should work, not a general implementation. I'd just special-case MutableMapping now, and if it later turns out to be useful elsewhere, figure a way to do it declaratively. I'm always wary of designing a general mechanism from one example.
I guess this means you need a MutableMappingMeta that inherits ABCMeta and does the extra check on update after calling the super checks on abstract methods? Should be pretty simple (and it's not a problem that update isn't abstract), if a bit ugly.
I propose the exact rule for "compatible with the new version" to be "the signature contains a VAR_POSITIONAL argument", with the check being skipped if the signature is unavailable. It's simple, only has false negatives if the signature is wrong, and false positives if varargs do something else (for this case there should be a warning in docs/release notes).
Yeah, if you're special-casing it and coding it imperatively, it's pretty simple. But shouldn't it also require a var kw param? After all, the signature is supposed to be (self, *iterables_or_mappings, **kwpairs).
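A rough sketch of that idea; MutableMappingMeta is hypothetical (real ABCs do no signature checking today), and this is only one way the warning could be wired in:

    import inspect
    import warnings
    from abc import ABCMeta

    class MutableMappingMeta(ABCMeta):
        """Hypothetical ABCMeta subclass: warn on register() when the
        registered class's update() lacks a *args parameter."""

        def register(cls, subclass):
            update = getattr(subclass, 'update', None)
            if update is not None:
                try:
                    sig = inspect.signature(update)
                except (ValueError, TypeError):
                    sig = None   # signature unavailable: skip the check
                if sig is not None and not any(
                        p.kind is inspect.Parameter.VAR_POSITIONAL
                        for p in sig.parameters.values()):
                    warnings.warn("%s.update() does not accept multiple "
                                  "mappings" % subclass.__name__)
            return super().register(subclass)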
On Fri, Feb 13, 2015 at 1:09 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
I guess this means you need a MutableMappingMeta that inherits ABCMeta and does the extra check on update after calling the super checks on abstract methods? Should be pretty simple (and it's not a problem that update isn't abstract), if a bit ugly.
Yes. The ugliness makes me wonder if the warning is worth it. I expect that popular libraries will change pretty fast after the new update becomes preferred, and other code will just wait until this hits them, not until they get a warning or until the old way is officially dropped from the docs. (FWIW, I don't see update() signature/mechanics documented on https://docs.python.org/3/library/collections.abc.html. Is this elsewhere?)
I propose the exact rule for "compatible with the new version" to be "the signature contains a VAR_POSITIONAL argument", with the check being skipped if the signature is unavailable. It's simple, only has false negatives if the signature is wrong, and false positives if varargs do something else (for this case there should be a warning in docs/release notes).
Yeah, if you're special-casing it and coding it imperatively, it's pretty simple.
But shouldn't it also require a var kw param? After all, the signature is supposed to be (self, *iterables_or_mappings, **kwpairs).
That part doesn't change, so I don't see a point in checking it. If we were doing full signature checking, then yes -- but I think that's best left to linter tools.
On Thu, Feb 12, 2015 at 6:45 PM, Steven D'Aprano <steve@pearwood.info> wrote:
- dict(a,b,c,d) takes 6 characters more to write than a+b+c+d.
and

    dict(a,b)

is three times as many characters as:

    a + b

- just sayin'

I'm curious what the aversion is to having an operator -- sure, it's not a big deal, but then again there's very little cost, as well. I can't really see a "trap" here. Sure there are multiple ways it could be defined, but having it act like update() seems pretty explainable.

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
On Thu, Feb 12, 2015 at 08:22:01PM -0800, Chris Barker wrote:
On Thu, Feb 12, 2015 at 6:45 PM, Steven D'Aprano <steve@pearwood.info> wrote:
- dict(a,b,c,d) takes 6 characters more to write than a+b+c+d.
and dict(a,b)
is three times as many characters as:
a + b
- just sayin'
And the difference between:

    default_settings + global_settings + user_settings

versus:

    dict(default_settings, global_settings, user_settings)

is only four characters. If you're using a, b, c as variable names in production, you probably have bigger problems than an extra four characters :-)
I'm curious what the aversion is to having an operator -- sure, it's not a big deal, but then again there's very little cost, as well. I can't really see a "trap" here. Sure there are multiple ways it could be defined, but having it act like update() seems pretty explainable.
Each of these in isolation is a little thing, but enough little things make a big thing:

Duplicating functionality ("more than one way to do it") is not always wrong, but it's a little smelly (as in a code smell, or in this case, an API smell).

Some people here find + to be a plausible operator. I don't. I react to using + for things which aren't addition in much the same way that I expect you would react if I suggested we used ** as the operator. "Why on earth would you want to use ** it's nothing like exponentiation!"

As the experience of lists and tuples shows, extending a syntax originally designed by C programmers to make numeric addition easier into non-numeric contexts is fraught with odd corner cases, gotchas and language warts. Operator overloading may be a code smell as often as it is a benefit. Google for "operator overloading considered harmful".

The + operator is a binary operator, which is a more limited interface than function/method calls. You can't write a.update(b, c, d, spam=23) using a single + call.

Operators are harder to search for than named functions. Especially + which is an operator that has specific meaning to most search engines.

Some languages may be able to optimize a + b + c + d to avoid making and throwing away the intermediate dicts, but in general Python cannot. So it is going to fall into the same trap as list addition does. While it is not common for this to lead to severe performance degradation, when it does happen the cost is *so severe* that we should think long and hard before adding any more O(N**2) booby traps into the language.

-- Steve
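The list-addition trap referred to, in miniature (chunks is an invented name for illustration):

    chunks = [[i] * 100 for i in range(1000)]

    # Quadratic: each + copies everything accumulated so far.
    total = []
    for chunk in chunks:
        total = total + chunk

    # Linear: += extends in place.
    total = []
    for chunk in chunks:
        total += chunk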
On Feb 12, 2015, at 22:56, Steven D'Aprano <steve@pearwood.info> wrote:
Some people here find + to be a plausible operator. I don't. I react to using + for things which aren't addition in much the same way that I expect you would react if I suggested we used ** as the operator. "Why on earth would you want to use ** it's nothing like exponentiation!"
I'm not sure ** was the best example, because it also means both mapping unpacking and kwargs, both of which are _more_ relevant here than addition, not less. And ** for exponentiation is itself a little weird; you're just used to it at this point, and if you don't find it weird anymore, that kind of argues against your point. Also, people have already suggested ridiculous things like << and ^ here, so you don't need to invent other ridiculous possibilities in the first place...

But anyway, I take your point, and I agree that either constructor and update with multiple args, or expanded unpacking, are better solutions than adding + if the problems can be worked out.
On Thu, Feb 12, 2015 at 10:56 PM, Steven D'Aprano <steve@pearwood.info> wrote:
And the difference between:
default_settings + global_settings + user_settings
versus:
dict(default_settings, global_settings, user_settings)
is only four characters. If you're using a, b, c as variable names in production, you probably have bigger problems than an extra four characters :-)
and the difference between:

    a + b

and

    a.sum(b)

is only six characters. I _think_ your point is that the math symbols are for math, so you're not suggesting that we'd all be better off without any infix operators at all. But that ship has sailed -- Python overloads infix operators, and it does it for common usage for built-ins. So adding + and += for dicts fits right in with all this.

Some people here find + to be a plausible operator. I don't. I react to using + for things which aren't addition in much the same way that I expect you would react if I suggested we used ** as the operator. "Why on earth would you want to use ** it's nothing like exponentiation!"
merging two dicts certainly is _something_ like addition. I'd agree about re-using other arbitrary operators just because they exist.

Some languages may be able to optimize a + b + c + d to avoid making and throwing away the intermediate dicts, but in general Python cannot. So it is going to fall into the same trap as list addition does. While it is not common for this to lead to severe performance degradation, when it does happen the cost is *so severe* that we should think long and hard before adding any more O(N**2) booby traps into the language.
well, it seems there are various proposals in this thread to create a shorthand for "merging two dicts into a new third dict" -- wouldn't those create the same booby traps? And the booby traps aren't in code like:

    a + b + c + d

-- folks don't generally put 100s of items in a line like that. The traps are when you do it in a loop:

    start = {}
    for d in a_bunch_of_dicts:
        start = start + d    # (or sum())

Of course, in the above loop, if you had +=, you'd be fine. So it may be that adding + for mutables is not the same trap, as long as you add the +=, too. (Can't put += in a comprehension, though.)

But this: "Some languages may be able to optimize a + b + c + d" got me thinking -- this is actually a really core performance problem for numpy. In that case, the operators are really being used for math, so you can't argue that they shouldn't be supported for that reason -- and we really want readable code:

    y = a*x**2 + b*x + c

really reads well, but it does create a lot of temporaries that kill performance for large arrays. You can optimize that by hand by doing something like:

    y = x**2
    y *= a
    y += b*x
    y += c

which really reads poorly!

So I've thought for years that we should have a "numpython" interpreter that would parse out each expression, check its types, and if they were all numpy arrays, generate an optimized version that avoided temporaries, maybe even did nifty things like do the operations in cache-friendly blocks, multi-thread them, whatever. (There is a package called numexpr that does all these things already.)

But maybe CPython itself could do an optimization step like that -- examine an entire expression for types, and if they all support the right operations, re-structure it in a more optimized way. Granted, doing this in the general case would be pretty impossible, but if the hooks are there, then individual type-based optimizations could be done -- like the current interpreter has for adding strings.

OK -- gotten pretty OT here....

-Chris

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
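For reference, a minimal sketch of what the numexpr package does, assuming numpy and numexpr are installed (variable names are illustrative):

    import numpy as np
    import numexpr as ne

    a, b, c = 2.0, 3.0, 4.0
    x = np.arange(1000000, dtype=np.float64)

    # Evaluates the whole expression in one pass, without the large
    # intermediate arrays the plain numpy version would allocate.
    y = ne.evaluate("a * x**2 + b * x + c")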
On Fri, Feb 13, 2015 at 9:55 AM, Chris Barker <chris.barker@noaa.gov> wrote:
But this: "Some languages may be able to optimize a + b + c + d "
Got me thinking -- this is actually a really core performance problem for numpy. In that case, the operators are really being used for math, so you can't argue that they shouldn't be supported for that reason -- and we really want readable code:
y = a*x**2 + b*x + c
really reads well, but it does create a lot of temporaries that kill performance for large arrays. You can optimize that by hand by doing something like:

    y = x**2
    y *= a
    y += b*x
    y += c
which really reads poorly!
So I've thought for years that we should have a "numpython" interpreter that would parse out each expression, check its types, and if they were all numpy arrays, generate an optimized version that avoided temporaries, maybe even did nifty things like do the operations in cache-friendly blocks, multi thread them, whatever. (there is a package called numexp that does all these things already).
This should be pretty do-able with an import hook (or similarly a REPL hook) without the need to roll your own interpreter. There is already prior art. [1][2][3][4] I've been meaning for a while to write a library to make this easier and mitigate the gotchas. One of the trickiest is that it can skew tracebacks (a la macros or compiler optimizations in GDB).
But maybe cPython itself could do an optimization step like that -- examine an entire expression for types, and if they all support the right operations, re-structure it in a more optimized way.
It's tricky in a dynamic language like Python to do a lot of optimization at compile time, particularly in the face of operator overloading. The problem is that there is no guarantee of some name's type (other than `object`) nor of the behavior of the type's methods (including operators). So optimizations like the one you are suggesting make assumptions that cannot be checked until run-time (to ensure the optimization is applied safely). The stinky part is that the vast majority of the time, perhaps even always, your optimization could be applied at compile-time. But because Python is so flexible, the possibility is always there that someone did something that breaks your assumptions. So run-time optimization is your only recourse.

To some extent PyPy is state-of-the-art when it comes to optimizations, so it may be worth taking a look if the problem is tractable for Python in general. PEP 484 ("Type Hints") may help in a number of ways too.

As for CPython, it has a peephole optimizer for compile-time, and there has been some effort to optimize at the AST level. However, as already noted you would need the optimization to happen at run-time. The only solution that comes to my mind is that the compiler (and/or optimizers) could leave a hint (emit an extra byte code, etc.) that subsequent code is likely to be able to have some optimization applied. Then the interpreter could do whatever check it needs to make sure and then apply the optimized code. Alternately the compiler could generate an explicit branch for the potential optimization (so that the interpreter wouldn't need to deal with it). I suppose there are a number of possibilities along that line, but I'm not an expert on the compiler and interpreter.
Granted, doing this is the general case would be pretty impossible but if the hooks are there, then individual type-bases optimizations could be done -- like the current interpreter has for adding strings.
Hmm. That is another approach too. You pay a cost for not baking it into the compiler/interpreter, but you also get more flexibility and portability. -eric [1] Peter Wang did a lightning talk at PyCon (in Santa Clara). [2] I believe Numba uses an import hook to do its thing. [3] Paul Tagliamonte's hy: https://github.com/hylang/hy [4] Li Haoyi's macropy: https://github.com/lihaoyi/macropy
On Fri, Feb 13, 2015 at 10:18 AM, Eric Snow <ericsnowcurrently@gmail.com> wrote:
But maybe CPython itself could do an optimization step like that -- examine an entire expression for types, and if they all support the right operations, re-structure it in a more optimized way.
It's tricky in a dynamic language like Python to do a lot of optimization at compile time, particularly in the face of operator overloading.
Exactly -- I was thinking this would be a run-time thing. Which might make for a major change, as I expect the compiler compiles expressions ahead of time.

To some extent PyPy is state-of-the-art when it comes to optimizations, so it may be worth taking a look if the problem is tractable for Python in general.
That might be a way to go -- but it would probably mean essentially reimplementing numpy in PyPy -- which PyPy may, in fact, already be doing!
PEP 484 ("Type Hints") may help in a number of ways too.
yes, though at least for now -- it seems very focused on the use-case of pre-run-time type checking (or compile time, too), so it'll be a while...
I suppose there are a number of possibilities along that line, but I'm not an expert on the compiler and interpreter.
way out of my depth here, too. Anyway, way OT for this thread -- _maybe_ I'll be able to compress these thoughts down to start a conversation here about it some day...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
Chris Barker writes:
merging two dicts certainly is _something_ like addition.
But you can't say what that something is with precision, even if you abstract, and nobody contests that different applications of dicts have naively *different*, and in implementation incompatible, "somethings". So AFAICS you have to fall back on "my editor forces me to type all the characters, so let's choose the interpretation I use most often so I can save some typing without suffering too much cognitive dissonance."

Following up the off-topic comment:
[In numpy,] we really want readable code:
y = a*x**2 + b*x + c
really reads well, but it does create a lot of temporaries that kill performance for large arrays. You can optimize that by hand by doing something like:
y = x**2
y *= a
y += b*x
y += c
Compilers can optimize such things very well, too. I would think that a generic optimization to the compiled equivalent of

try:
    y = x**2
    y *= a
    y += b*x
    y += c
except NotImplementedError:
    y = a*x**2 + b*x + c

would be easy to do, possibly controlled by a pragma (I know Guido doesn't like those, but perhaps an extension of PEP 484 "Type Hints" could help here).
On Feb 14, 2015, at 1:09 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Chris Barker writes:
merging two dicts certainly is _something_ like addition.
But you can't say what that something is with precision, even if you abstract, and nobody contests that different applications of dicts have naively *different*, and in implementation incompatible, "somethings".
So AFAICS you have to fall back on "my editor forces me to type all the characters, so let's choose the interpretation I use most often so I can save some typing without suffering too much cognitive dissonance."
Or we can choose the interpretation that has already been chosen by multiple locations within Python: keys are replaced, and the rhs wins. This is consistent with basically every location in the stdlib and Python core where two dicts get combined in some fashion, other than specialized subclasses.

>>> a = {1: True, 2: True}
>>> b = {2: False, 3: False}
>>> dict(list(a.items()) + list(b.items()))
{1: True, 2: False, 3: False}

And

>>> a = {1: True, 2: True}
>>> b = {2: False, 3: False}
>>> c = a.copy()
>>> c.update(b)
>>> c
{1: True, 2: False, 3: False}

And

>>> {1: True, 2: True, 2: False, 3: False}
{1: True, 2: False, 3: False}

And

>>> a = {"a": True, "b": True}
>>> b = {"b": False, "c": False}
>>> dict(a, **b)
{'b': False, 'c': False, 'a': True}

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Donald Stufft writes:
Or we can choose the interpretation that has already been chosen by multiple locations within Python: keys are replaced, and the rhs wins. This is consistent with basically every location in the stdlib and Python core where two dicts get combined in some fashion, other than specialized subclasses.
Sure, nobody contests that we *can*. To what benefit? My contention is that even the strongest advocates haven't come up with anything stronger than "saves typing characters".

True, "dict = dict1 + dict2 + dict3" has a certain amount of visual appeal vs. the multiline version, but you're not going to go beyond that and write "dict = (dict1 + dict2)/2", or "dict = dict1 * dict2", certainly not with "rightmost item wins" semantics. So the real competition for typing ease and readability is

dict = merged(dict1, dict2, dict3)

and I see no advantage over the function version unless you think that everything should be written using pseudo-algebraic operators when possible. And the operator form is inefficient for large dictionaries (I admit I don't know of any applications where I'd care).
On Sat, Feb 14, 2015 at 01:17:15AM -0500, Donald Stufft wrote:
Or we can choose the interpretation that has already been chosen by multiple locations within Python: keys are replaced, and the rhs wins. This is consistent with basically every location in the stdlib and Python core where two dicts get combined in some fashion, other than specialized subclasses.
I'll vote +1 on that if you promise not to use the + operator :-) -- Steve
Some languages may be able to optimize a + b + c + d to avoid making and throwing away the intermediate dicts, but in general Python cannot. So it is going to fall into the same trap as list addition does. While it is not common for this to lead to severe performance degradation, when it does happen the cost is *so severe* that we should think long and hard before adding any more O(N**2) booby traps into the language.
You can still have both: make the + operator work for simple cases and warn people to use the function interface for more time-critical, complicated cases. And sometimes you may want to have a certain evaluation order, e.g., a + ((b + c) + d), especially if you overrode __add__ for custom behaviour (with the default update semantics on standard dicts this would give the same result, I think).

-Alexander
Chris Barker writes:
I'm curious what the aversion is to having an operator
Speaking only for myself, I see mathematical notation as appropriate for abstractions, and prefer to strengthen TOOWDTI from "preferably" to "there's *only* one obvious way to do it" in evaluating proposals to introduce such notation. This is because mathematical syntax is deliberately empty of semantics. Context must determine the interpretation, or mathematical notation is obfuscation, not clarification.
On 02/13/2015 09:34 PM, Stephen J. Turnbull wrote:
Chris Barker writes:
I'm curious what the aversion is to having an operator
Speaking only for myself, I see mathematical notation as appropriate for abstractions, and prefer to strengthen TOOWDTI from "preferably" to "there's *only* one obvious way to do it" [...]
The problem with that is that what is obvious to one may not be obvious to another. Don't get me wrong, I don't want a hundred ways to do something, obvious or not, or even ten, but two or three should be within the grasp of most folks; and if those two or three groups that each found their way "obvious" together accounted for 80%-90% of users as a whole, I think we have a win.

This-just-seems-obvious-to-me'ly yours,

--
~Ethan~
On Fri, Feb 13, 2015 at 03:27:15AM +1100, Chris Angelico wrote:
On Fri, Feb 13, 2015 at 3:21 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Feb 13, 2015 at 02:46:45AM +1100, Chris Angelico wrote:
Imagine if two people independently build shopping lists
Sorry, was that shopping *lists* or shopping *dicts*?
I've never heard anyone in the real world talk about "shopping dicts" or "shopping arrays" or "shopping collections.Sequences". It's always "shopping lists". The fact that I might choose to represent one with a dict is beside the point. :)
Sounds like you're copying Lua then :-)
- "this is the stuff we need" - and then combine them.
Why would you discard duplicates? If you need 2 loaves of bread, and I need 1 loaf of bread, and the merged shopping list has anything less than 3 loaves of bread, one of us is going to miss out.
I live in a house with a lot of people. When I was younger, Mum used to do most of the shopping, and she'd keep an eye on the fridge and usually know what needed to be replenished; but some things she didn't monitor, and relied on someone else to check them. If there's some overlap in who checks what, we're all going to notice the same need - we don't have separate requirements here.
But why is the Angelico household shopping-list model more appropriate for dict addition than the "last value applied wins" model used by dict.update? Or the multiset model, where the maximum value wins? Or the Counter model, where values are added? One more for luck: the *first* value applied wins, rather than the last. You can add new keys that aren't already set, but you cannot change the value of existing keys. (You can add new items to Mum's shopping list, but if she says 2 loaves of bread, then 2 loaves it is.)
One person notes that we're down to our last few eggs and should buy another dozen; another person also notes that we're down to our last few eggs, but thinks we should probably get two dozen. Getting *three* dozen is definitely wrong here.
Fred is making a quiche and needs a dozen eggs. (It's a big quiche.) Wilma is baking some cakes and needs two dozen eggs. If you get less than three dozen, someone is going to miss out.

The point I am making is that there are *multiple ways* to "add" two dicts, and no one model is best in all circumstances.

Sometimes reading python-ideas is like deja vu all over again. Each time this issue gets raised, people seem to be convinced that there is an obvious and sensible meaning for dict merging. And they are right. If only we could all agree on *which* obvious and sensible meaning.

Which is another way of saying that there is no one obviously correct and generally useful way to merge two dicts. Well, actually there is one, and we already have the dict.update method to do it. If + duplicates that, it's redundant, and if it doesn't, it's non-obvious and likely to be of more limited use.

Providing any of these alternate merging behaviours in your own code is very simple; a small helper function of just a few lines will do the trick (see the sketch below).

-- Steve
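A minimal sketch of such helpers, one per merge model named above (the function names are illustrative, not a proposed API):

def merge_last_wins(a, b):
    # dict.update semantics: on duplicate keys, b wins
    new = a.copy()
    new.update(b)
    return new

def merge_first_wins(a, b):
    # existing keys in a are kept; b only contributes new keys
    new = b.copy()
    new.update(a)
    return new

def merge_max(a, b):
    # multiset model: the larger value wins on duplicate keys
    new = a.copy()
    for k, v in b.items():
        new[k] = max(new.get(k, v), v)
    return new

def merge_add(a, b):
    # Counter model: values are added on duplicate keys
    new = a.copy()
    for k, v in b.items():
        new[k] = new.get(k, 0) + v
    return new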
On Feb 12, 2015, at 1:42 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Which is another way of saying that there is no one obviously correct and generally useful way to merge two dicts. Well, actually there is one, and we already have the dict.update method to do it. If + duplicates that, it's redundant, and if it doesn't, it's non-obvious and likely to be of more limited use.
The fact that we already have dict.update indicates to me that the way dict.update works is the sensible way for + and += to work. I mean, by your logic, why do we have + and += for lists? People could just use copy() and extend() if they wanted to.

Wanting to add two dictionaries is a fairly common desire, both with and without copying. If it weren't, then it wouldn't pop up with some regularity on python-ideas. The real question is what semantics you give it, which I think is a fairly silly question because we already have the semantics defined via dict.update() and the dictionary literal and the dict() constructor itself.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
Addition and updating are completely different things. An update implies that the second value is more up-to-date or more correct, so its values should be preferred in the event of a clash. Addition has no such implication, so appealing for consistency between the two doesn't make any sense to me.
On 12/02/2015 18:42, Steven D'Aprano wrote:
Sometimes reading python-ideas is like deja vu all over again. Each time this issue gets raised, people seem to be convinced that there is an obvious and sensible meaning for dict merging. And they are right. If only we could all agree on *which* obvious and sensible meaning.
To me it's blatantly obvious and I simply don't understand why this is endlessly debated. You take the first item from the rhs, the second item from the lhs, the third item from the rhs...

--
My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language.

Mark Lawrence
On 12 February 2015 at 19:14, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
To me it's blatantly obvious and I simply don't understand why this is endlessly debated. You take the first item from the rhs, the second item from the lhs, the third item from the rhs...
Unless you're left handed when you start with the lhs, obviously. Paul
Paul Moore wrote:
On 12 February 2015 at 19:14, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
You take the first item from the rhs, the second item from the lhs, the third item from the rhs...
Unless you're left handed when you start with the lhs, obviously.
Unless you live in a country where you drive on the left side of the road, in which case it's the other way around. And of course all this is reversed if you're in the southern hemisphere. -- Greg
On 2015-02-12 16:27, Chris Angelico wrote:
On Fri, Feb 13, 2015 at 3:21 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Feb 13, 2015 at 02:46:45AM +1100, Chris Angelico wrote:
Imagine if two people independently build shopping lists
Sorry, was that shopping *lists* or shopping *dicts*?
I've never heard anyone in the real world talk about "shopping dicts" or "shopping arrays" or "shopping collections.Sequences". It's always "shopping lists". The fact that I might choose to represent one with a dict is beside the point. :)
- "this is the stuff we need" - and then combine them.
Why would you discard duplicates? If you need 2 loaves of bread, and I need 1 loaf of bread, and the merged shopping list has anything less than 3 loaves of bread, one of us is going to miss out.
I live in a house with a lot of people. When I was younger, Mum used to do most of the shopping, and she'd keep an eye on the fridge and usually know what needed to be replenished; but some things she didn't monitor, and relied on someone else to check them. If there's some overlap in who checks what, we're all going to notice the same need - we don't have separate requirements here. One person notes that we're down to our last few eggs and should buy another dozen; another person also notes that we're down to our last few eggs, but thinks we should probably get two dozen. Getting *three* dozen is definitely wrong here. And definitely her views on how much we should buy would trump my own, as my experience was fairly minimal.
What if X wants a pizza and Y wants a pizza? If you get only one, someone is going to be unhappy!
(These days, most of us are adult, and matters are a bit more complicated. So I'm recalling "the simpler days of my youth" for the example.)
Chris Angelico wrote:
we already know that programming isn't the same as mathematics.
True, but both mathematicians and programmers share a desire to be able to reason easily and accurately about the things they work with.

But my main objection to an asymmetrical + for dicts is that it would be difficult to remember which way it went. Analogy with bytes + sequence would suggest that the left operand wins. Analogy with dict.update would suggest that the right operand wins.

Chris Angelico had also written:

That's even true when we're working with numbers - usually because of representational limitations - but definitely so when analogies like "addition" are extended to other data types. But there are real-world parallels here. Imagine if two people independently build shopping lists - "this is the stuff we need" - and then combine them. You'd describe that as "your list and my list", or "your list plus my list" (but not "your list or my list"; English and maths tend to get "and" and "or" backward to each other in a lot of ways), and the resulting list would basically be set union of the originals. If you have room on one of them, you could go through the other and add all the entries that aren't already present; otherwise, you grab a fresh sheet of paper, and start merging the lists. (If you're smart, you'll group similar items together as you merge. That's where the analogy breaks down, though.) It makes fine sense to take two shopping lists and combine them into a new one, discarding any duplicates.
my_list = {"eggs": 1, "sugar": 1, "milk": 3, "bacon": 5}
your_list = {"milk": 1, "eggs": 1, "sugar": 2, "coffee": 2}
(let's assume we know what units everything's in)
The "sum" of these two lists could possibly be defined as the greater of the two values, treating them as multisets; but if the assumption is that you keep track of most of what we need, and you're asking me if you've missed anything, then having your numbers trump mine makes fine sense.
I have no problem with the addition of dictionaries resulting in the set-union of their keys, and the values following them in whatever way makes the most reasonable sense. Fortunately Python actually has separate list and dict types, so we don't get the mess of PHP array merging...
ChrisA
Greg Ewing schrieb am 12.02.2015 um 22:24:
my main objection to an asymmetrical + for dicts is that it would be difficult to remember which way it went.
Analogy with bytes + sequence would suggest that the left operand wins. Analogy with dict.update would suggest that the right operand winds.
Do you really think that's an issue? Arithmetic expressions are always evaluated from left to right. It thus seems obvious to me what gets created first, and what gets added afterwards (and overwrites what's there).

Stefan
Stefan Behnel wrote:
Arithmetic expressions are always evaluated from left to right. It thus seems obvious to me what gets created first, and what gets added afterwards (and overwrites what's there).
I'm okay with "added afterwards", but the "overwrites" part is *not* obvious to me.

Suppose you're given two shopping lists with instructions to get everything that's on either list, but not double up on anything. You don't have a spare piece of paper to make a merged list, so you do it this way: go through the first list and get everything on it, then go through the second list and get everything on that.

Now one list contains "Bread (preferably white)" and the other contains "Bread (preferably wholemeal)". You haven't been told what to do in case of a conflict; it's left to your judgement. What do you do when you encounter the second entry for bread? Do you put the one you have back and grab the other one, or do you just think "I've already got bread" and move on? Which is more obvious?

--
Greg
On Sat, Feb 14, 2015 at 11:12:10AM +1300, Greg Ewing wrote:
Stefan Behnel wrote:
Arithmetic expressions are always evaluated from left to right. It thus seems obvious to me what gets created first, and what gets added afterwards (and overwrites what's there).
I'm okay with "added afterwards", but the "overwrites" part is *not* obvious to me. [snip yet another shopping list example]
I think that focusing on the question of "which wins" in the case of duplicate keys, the left or right operand, is not very productive. We can just declare that Python semantics are that the last value seen wins, which is the current behaviour for dict.update. Subclasses can override that, if they wish, but the easiest way is to just swap the order of the operands. Instead of a.update(b), use b.update(a). (If you're worried about contaminating other references to b, make a copy of it first. Whatever.)

dict.update does what it does very well. It solves 90% of the "update in place" problems. Specialist subclasses, like a multiset or a Counter, can solve the other 90% *wink*. What these requests for "dict addition" are really asking about is an easy way to solve the "copy and update" problem. There are a few answers:

(1) Not everything needs to be a one-liner or an expression. Just copy and update yourself, it's not hard.

(2) Write your own helper function. It's a three line function. Not everything needs to be a built-in:

def copy_and_update(a, b):
    new = a.copy()
    new.update(b)
    return new

Add bells and whistles to taste.

(3) Some people think it should be an operator. There are strong feelings about which operator: + | ^ << have all been suggested. Operators have some disadvantages: they can only take two arguments, so copy-and-updating a series of dicts means making repeated copies which are thrown away. There's the question of different types -- if you merge a SpamMapping and an EggsMapping and a CheeseMapping, what result do you get? If you care about the specific type, do you have to make yet another copy at the end to ensure you get the type you want?

SpamMapping(a + b + c)  # or a | b | c

We shouldn't care about small efficiencies, say, 90% of the time, but this is starting to look a little worrisome, since it may lead to near quadratic behaviour.

Using an operator means that you get augmented assignment for free, even if you don't define an __iadd__ or __ior__ method. But augmented assignment isn't entirely problem-free. If your mapping is embedded in an immutable data structure, then:

structure.mapping |= other  # or += if you prefer

may *succeed and yet raise an exception*. That's a nasty language wart. To me, that's a good reason to look for an alternative.

(4) So perhaps a method on Mapping is a better idea. That makes the return type obvious: it should be type(self). It allows implementations to be efficient in the face of multiple arguments, by avoiding the creation of temporary objects which are then copied and thrown away. Being a method, they can take keyword arguments too. But the obvious name, updated(), is uncomfortably close to update() and perhaps easily confused with it. The next most obvious name, copy_and_update(), is a little long for my tastes.

(5) How about a function? It need not be a built-in, although there is precedent with sorted() and reversed(). It could be collections.updated(). With a function, it's a little less obvious what the return type should be. Perhaps the rule should be, if the first argument is a Mapping, use the type of that first argument. Otherwise use dict.

(6) Or we can use the Mapping constructor. We're already part way there:

py> dict({'a': 1, 'b': 2, 'c': 3}, a=9999)
{'c': 3, 'b': 2, 'a': 9999}

The constructor makes a copy of the argument, and updates it with any keyword args. If it would take multiple arguments, that's the copy-and-update semantics we're after.

The downside is that a few mappings -- defaultdict immediately comes to mind -- have a constructor with a radically different signature. So you can't say:

defaultdict(a, b, c, d)

(Personally, I think that changing the constructor signature like that is a mistake. But I'm not sure what alternatives there are.)

My sense of this is that using the constructor is the right solution, and for mappings with unusual signatures, consenting adults applies. For them, you have to do it the old-fashioned way:

d = defaultdict(func)
d.update(a, b, c, d)

I don't care about solving this for every obscure mapping type in the universe. Even obscure mapping types in the standard library :-)

-- Steve
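A hedged sketch of what option (6) would amount to, emulated as a subclass today (the name multidict is purely illustrative):

class multidict(dict):
    # Emulates the proposed multi-argument constructor and update().
    def __init__(self, *mappings, **kwargs):
        super().__init__()
        self.update(*mappings, **kwargs)

    def update(self, *mappings, **kwargs):
        # Left to right, last value wins -- dict.update semantics.
        for m in mappings:
            super().update(m)
        super().update(kwargs)

# e.g. multidict({'a': 1}, {'a': 2, 'b': 3}, c=4)
#      == {'a': 2, 'b': 3, 'c': 4}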
On Feb 13, 2015, at 19:05, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, Feb 14, 2015 at 11:12:10AM +1300, Greg Ewing wrote:
Stefan Behnel wrote:
Arithmetic expressions are always evaluated from left to right. It thus seems obvious to me what gets created first, and what gets added afterwards (and overwrites what's there).
I'm okay with "added afterwards", but the "overwrites" part is *not* obvious to me. [snip yet another shopping list example]
I think that focusing on the question of "which wins" in the case of duplicate keys, the left or right operand, is not very productive. We can just declare that Python semantics are that the last value seen wins, which is the current behaviour for dict.update.
Subclasses can override that, if they wish, but the easiest way is to just swap the order of the operands. Instead of a.update(b), use b.update(a). (If you're worried about contaminating other references to b, make a copy of it first. Whatever.)
dict.update does what it does very well. It solves 90% of the "update in place" problems. Specialist subclasses, like a multiset or a Counter, can solve the other 90% *wink*. What these requests for "dict addition" are really asking about is an easy way to solve the "copy and update" problem. There are a few answers:
(1) Not everything needs to be a one-liner or an expression. Just copy and update yourself, it's not hard.
(2) Write your own helper function. It's a three line function. Not everything needs to be a built-in:
def copy_and_update(a, b):
    new = a.copy()
    new.update(b)
    return new
Add bells and whistles to taste.
(3) Some people think it should be an operator. There are strong feelings about which operator: + | ^ << have all been suggested. Operators have some disadvantages: they can only take two arguments, so copy-and-updating a series of dicts means making repeated copies which are thrown away.
There's the question of different types -- if you merge a SpamMapping and an EggsMapping and a CheeseMapping, what result do you get? If you care about the specific type, do you have to make yet another copy at the end to ensure you get the type you want?
SpamMapping(a + b + c) # or a | b | c
We shouldn't care about small efficiencies, say, 90% of the time, but this is starting to look a little worrisome, since it may lead to near quadratic behaviour.
Using an operator means that you get augmented assignment for free, even if you don't define an __iadd__ or __ior__ method. But augmented assignment isn't entirely problem-free. If your mapping is embedded in an immutable data structure, then:
structure.mapping |= other # or += if you prefer
may *succeed and yet raise an exception*. That's a nasty language wart. To me, that's a good reason to look for an alternative.
(4) So perhaps a method on Mapping is a better idea. That makes the return type obvious: it should be type(self). It allows implementations to be efficient in the face of multiple arguments, by avoiding the creation of temporary objects which are then copied and thrown away. Being a method, they can take keyword arguments too.
But the obvious name, updated(), is uncomfortably close to update() and perhaps easily confused with it. The next most obvious name, copy_and_update(), is a little long for my tastes.
(5) How about a function? It need not be a built-in, although there is precedence with sorted() and reversed(). It could be collections.updated().
With a function, it's a little less obvious what the return type should be. Perhaps the rule should be, if the first argument is a Mapping, use the type of that first argument. Otherwise use dict.
(6) Or we can use the Mapping constructor. We're already part way there:
py> dict({'a': 1, 'b': 2, 'c': 3}, a=9999)
{'c': 3, 'b': 2, 'a': 9999}
The constructor makes a copy of the argument, and updates it with any keyword args. If it would take multiple arguments, that's the copy-and-update semantics we're after.
The downside is that a few mappings -- defaultdict immediately comes to mind -- have a constructor with a radically different signature. So you can't say:
defaultdict(a, b, c, d)
Actually, that isn't a problem. defaultdict doesn't have a radically different signature at all; it just has one special parameter, followed by whatever other stuff dict wants. The docs make it clear that whatever's there will be treated exactly the same as dict, and the C implementation and the old Python-equivalent recipe both do this by effectively doing self.default_factory = args[0]; super().__init__(*args[1:], **kwargs). So there is no real problem at all with this suggestion. Which is why it's now my favorite of the bunch. Even if we get the generalized unpacking (which has other merits), I'd still like this one.

Of course it only fixes dict and a very small handful of classes (including defaultdict, but not much else) that inherit or encapsulate a dict and delegate blindly. Every other class -- OrderedDict and UserDict, all the third-party mappings, etc. -- has to be changed as well or it doesn't get the change. (And the Mapping mixin can't help here, as the collections.abc classes don't provide any construction behavior at all.) But that's fine. Implementing (Mutable)Mapping doesn't guarantee that you are 100% like a dict, only that you're a (mutable) mapping. (For example, you already don't get a copy method.) If some mappings can be constructed from multiple mapping-or-iterator arguments and some can't, so what?

That being said, I think UserDict _does_ need to be fixed as a special case, because its two purposes are (a) to exactly simulate dict as far as possible, and (b) to serve as sample code for people writing their own mappings. And since OrderedDict is pretty trivial to change, I'd probably do that too. But I wouldn't hunt through the stdlib looking for any special-purpose mappings, or add code that tries to warn on non-compliant third-party mappings, or even add anything to the documentation about the expected constructor signature (we don't currently explain how to take a single mapping or iterable-of-pairs plus keywords, or similarly for any other collections constructors, except the special case of Set._from_iterable, and only because that's needed to make set operators work).

However, if you want update too (as you seem to, given your defaultdict suggestion), I _would_ change the MutableMapping.update default implementation (which automatically fixes the other collections mappings that aren't fixed by dict). That seems like a small gain for a minuscule cost, so why not?
(Personally, I think that changing the constructor signature like that is a mistake. But I'm not sure what alternatives there are.)
My sense of this is that using the constructor is the right solution, and for mappings with unusual signatures, consenting adults applies. For them, you have to do it the old-fashioned way:
d = defaultdict(func)
d.update(a, b, c, d)
But

d = defaultdict(func, a, b, c, d)

will already just work. And if you want to preserve the first (or last, or whatever) one's default factory, that's trivial to do, and explicit without being horribly verbose:

d = defaultdict(a.default_factory, a, b, c, d)
I don't care about solving this for every obscure mapping type in the universe. Even obscure mapping types in the standard library :-)
-- Steve
Steven D'Aprano wrote:
I think that focusing on the question of "which wins" in the case of duplicate keys, the left or right operand, is not very productive. We can just declare that Python semantics are that the last value seen wins,
Sure we can, but that's something to be learned. It's not going to be obvious to everyone. -- Greg
On Sat, Feb 14, 2015 at 05:32:50PM +1300, Greg Ewing wrote:
Steven D'Aprano wrote:
I think that focusing on the question of "which wins" in the case of duplicate keys, the left or right operand, is not very productive. We can just declare that Python semantics are that the last value seen wins,
Sure we can, but that's something to be learned. It's not going to be obvious to everyone.
*shrug* You mean people have to learn what programming languages do before they can program? Say it isn't so! *wink*

dict.update() goes back to Python 1.5 and perhaps older. That horse has long since bolted. Let's just proclaim the argument about *semantics* settled, so we can concentrate on the really important part: arguing about the colour of the bike-shed. I mean, syntax/interface.

There are various semantics for merging/updating a mapping with another mapping. We've covered them repeatedly; I'm not going to repeat them here. But the most commonly used one, the most general one, is what dict.update does, and even though it is something that needs to be learned, it's pretty easy to learn because it matches the left-to-right behaviour of other parts of Python.

So let's just agree that we're looking for a not-in-place equivalent to dict.update. Anyone who wants one of the less general versions can do what Counter does, and subclass.

Do we have consensus that copy-and-update should use the same semantics as dict.update?

-- Steven
dict.update() goes back to Python 1.5 and perhaps older.
Indeed - and not only that, but update() has proved useful, and there are no methods for the other options proposed. Nor have they been proposed to be added, AFAIK.

There is Counter, but as Steven points out -- it's a subclass, partly because that behavior isn't usable in the general case of dicts with arbitrary keys.

However, this thread started with a desire for the + and += operators -- essentially syntactic sugar (which I happen to like). So is there a real desire / use case for easy-to-spell merge-into-new-dict behavior at all?

-Chris
That horse has long since bolted. Let's just proclaim the argument about *semantics* settled, so we can concentrate on the really important part: arguing about the colour of the bike-shed. I mean, syntax/interface.
There are various semantics for merging/updating a mapping with another mapping. We've covered them repeatedly, I'm not going to repeat them here. But the most commonly used one, the most general one, is what dict.update does, and even though it is something that needs to be learned, it's pretty easy to learn because it matches the left-to-right behaviour of other parts of Python.
So let's just agree that we're looking for a not-in-place equivalent to dict.update. Anyone who wants one of the less general versions can do what Counter does, and subclass.
Do we have consensus that copy-and-update should use the same semantics as dict.update?
-- Steven
On Sat, Feb 14, 2015 at 7:59 AM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
However, this thread started with a desire for the + and += operators -- essentially syntactic sugar ( which I happen to like ). So is there a real desire / use case for easy to spell merge-into-new-dict behavior at all?
-Chris
I'm pretty sure it's already been mentioned, but as many people seem to be asking for a use-case again: the primary use-case that I can see for either an operator or a merge function is doing the operation (copy-and-merge) as part of a larger expression, like so:

result = foo(bar(merged(d1, d2)))

or

result = foo(bar(d1 + d2))

There are plenty of times when succinct code is a virtue, and the lack of a function/operator for merging dictionaries requires an extra line of code, despite the concept being relatively straightforward. A more complicated example provides even better motivation IMO:

result = [merged(d1, d2) for d1 in list1 for d2 in list2]

In fact, the entire concept of generator expressions is motivated by the same logic: with sufficiently clear syntax, expressing complex operations that conform to common usage patterns and are thus easy to understand as a single line of code (or single expression) is beneficial relative to having to type out a for-loop every time. Generator expressions are a really great part of Python, because they make your code more readable (it's not just having to read fewer lines of code, but more to do with having to mentally track fewer variables IMO). A "merged" function for dictionaries would achieve a similar result, in the examples above eliminating a variable assignment and a line of code.

As for the color of the bike-shed... Arguments here have convinced me that an operator is perhaps unnecessary, so a function (either class method, builtin, or normal method) seems like the best solution. Given that, I prefer "merged" for a name, as it's more intuitive for me than "updated", although it's sad to miss out on the parallelism with sort/sorted. I do think something past-tense is good, because that's a mnemonic that greatly helped me learn when/where to use sort vs. sorted.

-Peter Mawhorter

P.S. A concrete example use case taken from code that I've actually written before:

def foo1(**kwargs):
    defaults = {"a": 1, "b": 2}
    defaults.update(kwargs)
    process(defaults)

def foo2(**kwargs):
    defaults = {"a": 1, "b": 2}
    process(merged(defaults, kwargs))

foo1 uses the existing syntax, while foo2 uses my favorite version of the new syntax. I've always felt uneasy about foo1, because if you read the code too quickly, it *looks* as though two very odd things are happening: first, you naively see the defaults being changed and wonder if there could be a bug where the defaults are retaining updates (if you later change defaults to a global variable, watch out!), and second, you see process(defaults) and wonder if there's a bug where the external input is being ignored. The alternative that clears up these code readability issues involves an extra line of code and a whole extra variable, though, so it isn't really a good option. The new syntax here does force me to pay a small performance penalty (I'm creating the extra variable internally) but that's almost never going to be an issue given the likely size of the dictionaries involved. At the same time, the new syntax is miles clearer and it's also more concise, plus it protects you in the case that you do move "defaults" out of the function.
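For completeness, a minimal sketch of the hypothetical merged() used in the examples above, with dict.update ("last value wins") semantics; process() is a stand-in so the P.S. runs:

def merged(*mappings):
    # copy-and-merge: later mappings win on duplicate keys
    new = {}
    for m in mappings:
        new.update(m)
    return new

def process(d):
    # stand-in for whatever foo1/foo2 actually do with the dict
    print(d)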
On Sat, Feb 14, 2015 at 07:59:58AM -0800, Chris Barker - NOAA Federal wrote:
dict.update() goes back to Python 1.5 and perhaps older.
Indeed - and not only that, but update() has proved useful, and there are no methods for the other options proposed. Nor have they been proposed to be added, AFAIK.
There is Counter, but as Steven points out -- it's a subclass, partly because that behavior isn't usable in the general case of dicts with arbitrary keys.
However, this thread started with a desire for the + and += operators -- essentially syntactic sugar ( which I happen to like ). So is there a real desire / use case for easy to spell merge-into-new-dict behavior at all?
I think there is, for the same reason that there is a use-case for sorted() and reversed(). (Only weaker, otherwise we'd already have it.) Sometimes you want to update in place, and sometimes you want to make a copy and update. So there's definitely a use-case. (Aside: In Ruby, there's a much stronger tradition of having two methods for operations, an in-place one and one which returns a new result.)

But is it a strong use-case? I personally don't think so. I suspect that it's more useful in theory than in practice. But others disagree. I think that "not every three line function needs to be built-in" applies to this, but we have sorted() and that's turned out to be useful. So perhaps so long as there are *some* uses for this, and there's no *harm* in providing it (and somebody provides the patch), the response is "sure, it doesn't *need* to be a built-in, but there's no real downside to making it such". (Damning it with faint praise, I know. But maybe somebody will demonstrate that there actually are strong uses for this.)

I've already explained that of all the options, extending the dict constructor (and update method) to allow multiple arguments seems like the best interface to me. As a second-best, a method or function would be okay too. I don't think this is a good match for an operator, and I especially dislike the suggestion to use plus. If it had to be an operator, I dislike | less strongly than +.

Why |? Because dict keys are sets. (Before Python had sets, we used dicts with all the values set to None as a set-like data structure.) Updating a dict is conceptually closer to set union than to concatenation:

py> a = {1: None, 2: None}
py> b = {3: None, 2: None}
py> a.update(b)
py> a
{1: None, 2: None, 3: None}

py> {1, 2} | {3, 2}
{1, 2, 3}

But:

py> [1, 2] + [3, 2]
[1, 2, 3, 2]

I'm still not *really* happy about using an operator for this -- it doesn't feel like "operator territory" to me -- but if it had to be an operator, | is the right one.

-- Steve
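A small supporting illustration: in Python 3 the dict key views are already set-like, so | works on them directly:

a = {1: None, 2: None}
b = {3: None, 2: None}
print(a.keys() | b.keys())   # {1, 2, 3} -- a plain set, the union of the key views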
However, this thread started with a desire for the + and += operators -- essentially syntactic sugar ( which I happen to like ). So is there a real desire / use case for easy to spell merge-into-new-dict behavior at all?
I think there is, for the same reason that there is a use-case for sorted() and reversed(). (Only weaker, otherwise we'd already have it.) Sometimes you want to update in place, and sometimes you want to make a copy and update. So there's definitely a use-case.
I think the issue is not that there is a weaker use case - it's as good, if not stronger - but the discussion has been more emotional, as it started off with "adding" dictionaries using some operator(s), += and +. Whereas I still like the operator syntax as shorthand, having a "merged" function, and maybe an extension of the "update" method for multiple dicts, or an "updated" function, would definitely be a good start - and the question of an operator as shorthand for these can be fought later and separately.

For me, having a "merged" function would definitely make a lot of code a lot more legible, as Peter suggested. I merge/update dictionaries a lot more often than I sort things - sorting is probably somewhere between 1 in 10 and 1 in 100 by comparison.
I'm still not *really* happy about using an operator for this -- it doesn't feel like "operator territory" to me -- but if it had to be an operator, | is the right one.
There is no "right" one. See previous post. You say it is "more like" - but it's not the same, either way, for "+" or for "|" on sets. Maybe we do want people to understand it is not the same as set combination? -Alexander
I'm +1 on constructor, +0.5 on a function (whether it's called updated or merged, whether it's in builtins or collections), +0.5 on both constructor and function, -0.5 on a method, and -1 on an operator.

Unless someone is seriously championing PEP 448 for 3.5, in which case I'm -0.5 on anything, because it looks like PEP 448 would already give us one obvious way to do it, and none of the alternatives are sufficiently nicer than that way to be worth having another.

I agree with everything in Steven's last post about syntax, so I won't repeat it. Additional considerations he didn't mention:

A method or operator raises the same return-type issue that set methods raise, as soon as you want to extend this beyond dict to other mappings. You're going to need to copy the _from_iterable classmethod documented only in the constructor idea, which is ugly -- and it won't work as well in Mapping as in Set, because there are a ton of non-compliant mappings already in the wild, and it's not even clear how some of them _should_ work.

A constructor not only makes the type even more obvious and explicit, it also answers the implementation problem: the only code that has to know about the constructor's signature is the constructor itself, which isn't a problem. Even for weird cases like defaultdict.

A function also avoids the problem, if not quite as nicely. sorted and reversed return whatever type they want, not the type you pass in, so why should updated be any different? If you really need it to be a MyMapping, you can do MyMapping(updated(a, b)), with the same verbosity and performance cost as when you have to do MySequence(sorted(t)). A function has the added advantage that it can be backported trivially for 3.4 and 2.7 code.

In fact, because the constructor is more general and powerful, but the function is more backportable, I don't see much problem adding both.

On Feb 14, 2015, at 9:46, Steven D'Aprano <steve@pearwood.info> wrote:
I've already explained that of all the options, extending the dict constructor (and update method) to allow multiple arguments seems like the best interface to me. As a second-best, a method or function would be okay too.
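For reference, a minimal sketch of the PEP 448 spelling mentioned above - the generalized-unpacking syntax proposed for Python 3.5:

a = {'x': 1, 'y': 2}
b = {'x': 3, 'z': 4}
c = {**a, **b}   # copy-and-merge in one expression; rightmost wins on duplicates
print(c)         # {'x': 3, 'y': 2, 'z': 4}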
On Feb 10, 2015, at 23:21, Ian Lee <ianlee1521@gmail.com> wrote:
I mentioned this on the python-dev list [1] originally as a +1 to someone else suggesting the idea [2]. It also came up in a response to my post that I can't seem to find in the archives, so I've quoted it below [3].
As the subject says, the idea would be to add a "+" and "+=" operator to dict
There have been other suggestions in the past for a non-mutating equivalent to dict.update, whether it's spelled + or | or a method. I personally think the idea makes sense, but it's probably worth doing a bit of digging to see why those past suggestions failed. (If you're lucky, everyone likes the idea but nobody got around to implementing it, so all you have to do is write a patch and link to the previous discussion. If you're unlucky, someone made a good case that it's an attractive nuisance or a confusing API or something, which you won't have an answer for -- but at least then you'll know what you need to answer.)
that would provide the following behavior:
{'x': 1, 'y': 2} + {'z': 3}
{'x': 1, 'y': 2, 'z': 3}
With the only potentially non obvious case I can see then is when there are duplicate keys, in which case the syntax could just be defined that last setter wins,
There's also the issue of what type you get when you add two Mappings of different types. And whether there should be an __radd__ that makes sure a legacy Mapping type plus a dict works. And whether the operator should be added to the Mapping ABC or just to the concrete class (and with a default implementation?). And whether it's worth writing a dict subclass that adds this method and putting it on PyPI as a backport (people writing 3.3+ code or 2.7/3.5 code can then just "from dict35 import dict35 as dict", but of course they still won't be able to add two dicts constructed from literals).
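A minimal sketch of that hypothetical backport subclass (dict35 is the name floated above; the semantics are "last setter wins"):

class dict35(dict):
    def __add__(self, other):
        new = dict35(self)      # copy, preserving the subclass type
        new.update(other)
        return new

    def __radd__(self, other):
        # lets a plain mapping on the left combine with a dict35 on the right
        new = dict35(other)
        new.update(self)
        return new

    def __iadd__(self, other):
        self.update(other)      # += mutates in place, like list
        return self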
e.g.:
{'x': 1, 'y': 2} + {'x': 3}
{'x': 3, 'y': 2}
Which is analogous to the example:
new_dict = dict1.copy()
new_dict.update(dict2)
With "+=" then essentially ending up being an alias for ``dict.update(...)``.
I'd be happy to champion this as a PEP if the feedback / public opinion heads in that direction.
[1] https://mail.python.org/pipermail/python-dev/2015-February/138150.html [2] https://mail.python.org/pipermail/python-dev/2015-February/138116.html [3] John Wong --
Well, looking at just list:

a + b yields a new list
a += b yields a modified a

then there is also .extend in list, etc. So do we want to follow list's footsteps? I like + because + is more natural to read. Maybe this needs to be a separate thread. I am actually amazed to remember dict + dict is not possible... there must be a reason (performance??) for this...
Cheers,
~ Ian Lee
On Wed Feb 11 2015 at 2:21:59 AM Ian Lee <ianlee1521@gmail.com> wrote:
I mentioned this on the python-dev list [1] originally as a +1 to someone else suggesting the idea [2]. It also came up in a response to my post that I can't seem to find in the archives, so I've quoted it below [3].
As the subject says, the idea would be to add a "+" and "+=" operator to dict that would provide the following behavior:
{'x': 1, 'y': 2} + {'z': 3}
{'x': 1, 'y': 2, 'z': 3}
With the only potentially non obvious case I can see then is when there are duplicate keys, in which case the syntax could just be defined that last setter wins, e.g.:
{'x': 1, 'y': 2} + {'x': 3}
{'x': 3, 'y': 2}
Which is analogous to the example:
new_dict = dict1.copy()
new_dict.update(dict2)
With "+=" then essentially ending up being an alias for ``dict.update(...)``.
I'd be happy to champion this as a PEP if the feedback / public opinion heads in that direction.
Unless Guido wants to speak up and give some historical context, I think a PEP that finally settled this idea would be good, even if it just ends up being a historical document as to why dicts don't have an __add__ method. Obviously there is a good amount of support both for and against the idea. -Brett
[1] https://mail.python.org/pipermail/python-dev/2015-February/138150.html [2] https://mail.python.org/pipermail/python-dev/2015-February/138116.html [3] John Wong --
Well, looking at just list:

a + b yields a new list
a += b yields a modified a

then there is also .extend in list, etc. So do we want to follow list's footsteps? I like + because + is more natural to read. Maybe this needs to be a separate thread. I am actually amazed to remember dict + dict is not possible... there must be a reason (performance??) for this...
Cheers,
~ Ian Lee
On Fri, Feb 13, 2015 at 10:38 AM, Brett Cannon <brett@python.org> wrote:
Unless Guido wants to speak up and give some historical context, I think a PEP that finally settled this idea would be good, even if it just ends up being a historical document as to why dicts don't have an __add__ method. Obviously there is a good amount of support both for and against the idea.
+1 -eric
participants (37)

- Alexander Belopolsky
- Alexander Heger
- Andrew Barnert
- Brett Cannon
- C Anthony Risinger
- Carl Meyer
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- Donald Stufft
- Eric Snow
- Erik Bray
- Ethan Furman
- Florian Bruhin
- Greg
- Greg Ewing
- Ian Lee
- Joshua Landau
- Juancarlo Añez
- M.-A. Lemburg
- Mark Lawrence
- Mark Young
- MRAB
- Nathan Schneider
- Neil Girdhar
- Nikolaus Rath
- Paul Moore
- Peter Mawhorter
- Petr Viktorin
- random832@fastmail.us
- Rob Cliffe
- Ron Adam
- Skip Montanaro
- Stefan Behnel
- Stephen J. Turnbull
- Steven D'Aprano
- Thomas Kluyver