On Sat, Mar 16, 2019 at 5:02 AM Gustavo Carneiro <gjcarneiro@gmail.com> wrote:
On Sat, 16 Mar 2019 at 10:33, Steven D'Aprano <steve@pearwood.info> wrote:
On Fri, Mar 15, 2019 at 10:53:31PM +0000, MRAB wrote:

> There was also the suggestion of having both << and >>.
>
> Actually, now that dicts are ordered, that would provide a use-case,
> because you would then be able to choose which values were overwritten
> whilst maintaining the order of the dict on the LHS.

Is that common enough that it needs to be built into dict itself?

If it is uncommon, then the conventional solution is to subclass dict,
overriding the merge operator to use first-seen semantics.
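
For concreteness, such a subclass might look something like this (a rough sketch; it assumes the proposed merge operator is spelled __add__, and FirstSeenDict is just an illustrative name):

    class FirstSeenDict(dict):
        """dict subclass whose merge keeps the first-seen value for duplicate keys."""
        def __add__(self, other):
            new = type(self)(self)          # copy the left operand
            for key, value in other.items():
                new.setdefault(key, value)  # left operand wins on duplicates
            return new

    d = FirstSeenDict({"a": 1}) + {"a": 99, "b": 2}
    assert d == {"a": 1, "b": 2}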

The question this PEP is trying to answer is not "can we support every
use-case imaginable for a merge operator?" but "can we support the most
typical use-case?", which I believe is a version of:

    new = a.copy()
    new.update(b)
    # do something with new

Already been said, but might have been forgotten, but the new proposed syntax:

    new = a + b

has to compete with the already existing syntax:

    new = {**a, **b}

The existing syntax is not exactly an operator in the mathematical sense (or is it?...), but my intuition is that it already triggers the visual processing part of the brain, similarly to operators.
 
The only argument for "a + b" over "{**a, **b}" is that "a + b" is easier to discover, while not many programmers are familiar with "{**a, **b}".

I wonder if this is only a matter of time, and whether programmers will become more accustomed to "{**a, **b}" over time, thereby reducing the relative benefit of "a + b"?  Especially as more and more developers migrate code bases from Python 2 to Python 3...
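
For anyone who hasn't used it much, the unpacking form merges left to right, with later dicts winning on duplicate keys:

    a = {"x": 1, "y": 2}
    b = {"y": 20, "z": 30}
    new = {**a, **b}
    # {'x': 1, 'y': 20, 'z': 30} -- b's value wins for the duplicate key 'y'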

FWIW, even as a core developer I had forgotten that the {**a, **b} syntax existed, thanks for the reminder! :)  But that's most likely because I rarely write code that needs to update and merge a dict, and when I do it still has to be 2-and-3 compatible.

Antoine said:

> If "+" is added to dicts, then we're overloading an already heavily used operator.  It makes reading code more difficult.

This really resonated with me.  Reading code gives you a feel for what possible types something could be.  The set of possibilities for + is admittedly already quite large in Python.  But making an existing core type start supporting + reduces the information given to the reader by that one line of code.  They now have more possibilities to consider and must seek hints from more surrounding code.

For type inferencers, whether humans or tools like pytype, it means we need to consider which version of Python's dict the code may be running under in order to work out what it means in context.  For tooling, that's just a flag and a matter of conditionally changing the code that defines dict, but humans have to carry the possibility of that flag with them everywhere.

We should just promote the use of {**d1, **d2} syntax for anyone who wants an inline updated copy.

Why?

(1) It already exists.  (insert zen of python quote here)

(2) Copying via the + operator encourages inefficient code (already true for bytes/str/list).  A single + is fine.  But the natural human extension, when people want to merge a bunch of things, is to chain several operators together, because that is how we're taught math.  No matter what type we're talking about, in Python this is an efficiency antipattern.

 z = a + b + c + d + e

That's four __add__ calls, each of which is a copy-plus-update, copy-plus-extend, or copy-plus-concatenate operation for a dict, list, or str respectively.

We already tell people not to do that with bytes and lists, pointing them instead to b''.join((a, b, c, d)), a series of z = []; z.extend(...) calls, or something from itertools.
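
Roughly, the idiomatic spellings look like this (a sketch with placeholder values):

    import itertools

    a, b, c, d = b"sp", b"am", b"eg", b"gs"         # placeholder bytes values
    z = b"".join((a, b, c, d))                      # one allocation instead of three

    lists = ([1], [2, 3], [4], [5, 6])              # placeholder lists
    z = list(itertools.chain.from_iterable(lists))  # single pass, no throwaway copies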

Given dict addition, it'd always be more efficient to join the "don't use tons of  + operators" club (a good lint warning) and write that as
 z = {**a, **b, **c, **d, **e}.
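
And when the number of dicts isn't known up front, a comprehension covers it (a sketch; "dicts" is just a placeholder name):

    dicts = [{"a": 1}, {"a": 2, "b": 3}, {"c": 4}]   # placeholder inputs
    z = {k: v for d in dicts for k, v in d.items()}  # later dicts win, same as {**a, **b}
    # {'a': 2, 'b': 3, 'c': 4}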

Unless the copy+update concept is an extremely common operation, having more than one way to do it feels like it'll cause more cognitive harm than good.

Now (2) could also be used as an argument that Python should detect chains of operators and allow those to be optimized.  That'd be a PEP of its own, and it is complicated to do.  Technically it is a semantic change, given how dynamic Python is: we do name lookups at time of use, and each __add__ call could potentially have side effects that change the results of future name lookups (the a+b could change the meaning of c).  Yes, that is horrible, and people writing code that does that deserve very bad things, but those are the semantics we'd be breaking if we tried to detect and support a mythical new construct like __chained_add__ being invoked when the types of all elements being added sequentially are identical (how identical? do subtypes count? see, complicated).

a + b + c:
       0 LOAD_GLOBAL              0 (a)
       2 LOAD_GLOBAL              1 (b)
       4 BINARY_ADD
       6 LOAD_GLOBAL              2 (c)
       8 BINARY_ADD
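
(That output can be reproduced with the dis module; the lambda only adds a trailing RETURN_VALUE:)

    import dis

    # a, b, and c resolve as globals inside the lambda, hence LOAD_GLOBAL.
    dis.dis(lambda: a + b + c)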

-gps