[Python-ideas] Adding "+" and "+=" operators to dict

Steven D'Aprano steve at pearwood.info
Sat Feb 14 04:05:37 CET 2015


On Sat, Feb 14, 2015 at 11:12:10AM +1300, Greg Ewing wrote:
> Stefan Behnel wrote:
> >Arithmetic expressions are always
> >evaluated from left to right. It thus seems obvious to me what gets created
> >first, and what gets added afterwards (and overwrites what's there).
> 
> I'm okay with "added afterwards", but the "overwrites"
> part is *not* obvious to me.
[snip yet another shopping list example]

I think that focusing on the question of "which wins" in the case of 
duplicate keys, the left or right operand, is not very productive. We 
can just declare that Python semantics are that the last value seen 
wins, which is the current behaviour for dict.update.

Subclasses can override that, if they wish, but the easiest way is to 
just swap the order of the operands. Instead of a.update(b), use 
b.update(a). (If you're worried about contaminating other references to 
b, make a copy of it first. Whatever.)
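
A minimal sketch of the operand swap, using made-up dict contents (the names and values below are illustrative only):

```python
# Hypothetical data, for illustration only.
a = {'eggs': 1, 'spam': 2}
b = {'spam': 99, 'cheese': 3}

# "Last value seen wins": values from b override those in a.
right_wins = a.copy()
right_wins.update(b)

# Swap the operands (working on a copy of b, so b itself is untouched)
# and the values from a win instead.
left_wins = b.copy()
left_wins.update(a)

print(right_wins['spam'])  # 99
print(left_wins['spam'])   # 2
```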

dict.update does what it does very well. It solves 90% of the "update in 
place" problems. Specialist subclasses, like a multiset or a Counter, 
can solve the other 90% *wink*. What these requests for "dict addition" 
are really asking about is an easy way to solve the "copy and update" 
problem. There are a few answers:


(1) Not everything needs to be a one-liner or an expression. Just copy 
and update yourself; it's not hard.

(2) Write your own helper function. It's a three line function. Not 
everything needs to be a built-in:

    def copy_and_update(a, b):
        new = a.copy()
        new.update(b)
        return new

Add bells and whistles to taste.
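
One possible set of bells and whistles, as a sketch only: accept any number of mappings plus keyword arguments. The extended signature is an assumption for illustration, not something proposed in this thread.

```python
def copy_and_update(first, *others, **kwargs):
    """Return a copy of `first`, updated with each later mapping in turn."""
    new = first.copy()
    for other in others:
        new.update(other)      # later mappings win on duplicate keys
    new.update(kwargs)         # keyword arguments are applied last
    return new


base = {'a': 1}
merged = copy_and_update(base, {'a': 2, 'b': 3}, {'b': 4}, c=5)
print(merged)  # {'a': 2, 'b': 4, 'c': 5}
```

Only one new dict is created, however many mappings are passed in.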

(3) Some people think it should be an operator. There are strong 
feelings about which operator: + | ^ << have all been suggested. 
Operators have some disadvantages: they can only take two arguments, so 
copy-and-updating a series of dicts means making repeated copies which 
are thrown away.

There's the question of different types -- if you merge a SpamMapping 
and an EggsMapping and a CheeseMapping, what result do you get? If you 
care about the specific type, do you have to make yet another copy at 
the end to ensure you get the type you want?

    SpamMapping(a + b + c)  # or a | b | c

We shouldn't care about small efficiencies, say, 90% of the time, but 
this is starting to look a little worrisome, since it may lead to near 
quadratic behaviour.
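
To see where the near-quadratic cost comes from, here is a rough sketch. CountingDict is a hypothetical subclass with a naive copy-and-update "+"; it only exists to count how many items get copied when many small dicts are merged pairwise.

```python
class CountingDict(dict):
    """Hypothetical dict subclass with a naive copy-and-update '+'."""
    items_copied = 0  # class-wide tally of items copied by '+'

    def __add__(self, other):
        new = CountingDict(self)              # copies every item on the left
        CountingDict.items_copied += len(self)
        new.update(other)                     # then copies the right operand
        CountingDict.items_copied += len(other)
        return new


parts = [CountingDict({i: i}) for i in range(100)]  # 100 one-item dicts

total = CountingDict()
for p in parts:
    total = total + p   # each step re-copies everything merged so far

print(len(total))                 # 100
print(CountingDict.items_copied)  # 5050
```

Merging 100 items this way copies 5050 items in total, whereas a single copy followed by repeated update() calls would copy each item once.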

Using an operator means that you get augmented assignment for free, even 
if you don't define an __iadd__ or __ior__ method. But augmented 
assignment isn't entirely problem-free. If your mapping is embedded in 
an immutable data structure, then:

    structure.mapping |= other  # or += if you prefer

may *succeed and yet raise an exception*. That's a nasty language wart. 
To me, that's a good reason to look for an alternative.
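
A sketch of that wart, under some assumptions: MergeDict is a hypothetical subclass that gives "|=" in-place update semantics, and a namedtuple stands in for the immutable structure.

```python
from collections import namedtuple


class MergeDict(dict):
    """Hypothetical dict subclass where '|=' updates in place."""

    def __ior__(self, other):
        self.update(other)
        return self


# An immutable structure (a namedtuple) holding a mutable mapping.
Structure = namedtuple('Structure', ['mapping'])
structure = Structure(MergeDict(a=1))

raised = False
try:
    # Step 1: __ior__ mutates the mapping in place (succeeds).
    # Step 2: the result is assigned back to structure.mapping (fails).
    structure.mapping |= {'b': 2}
except AttributeError:
    raised = True

print(raised)             # True
print(structure.mapping)  # {'a': 1, 'b': 2}
```

The exception is raised, yet the mapping has already been updated.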

(4) So perhaps a method on Mapping is a better idea. That makes the 
return type obvious: it should be type(self). It allows implementations 
to be efficient in the face of multiple arguments, by avoiding the 
creation of temporary objects which are then copied and thrown away. 
Being a method, it can take keyword arguments too.

But the obvious name, updated(), is uncomfortably close to update() and 
perhaps easily confused with it. The next most obvious name, 
copy_and_update(), is a little long for my tastes.
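
What such a method might look like, as a sketch on a hypothetical dict subclass (updated() is not an existing dict or Mapping method):

```python
class UpdatableDict(dict):
    """Hypothetical subclass; `updated` is not a real dict method."""

    def updated(self, *others, **kwargs):
        new = type(self)(self)     # the result has the caller's type
        for other in others:
            new.update(other)      # one result object, no temporaries
        new.update(kwargs)
        return new


a = UpdatableDict(spam=1)
result = a.updated({'eggs': 2}, {'spam': 3}, cheese=4)
print(result)        # {'spam': 3, 'eggs': 2, 'cheese': 4}
print(type(result))  # <class '__main__.UpdatableDict'>
```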


(5) How about a function? It need not be a built-in, although there is 
precedent with sorted() and reversed(). It could be collections.updated().

With a function, it's a little less obvious what the return type should 
be. Perhaps the rule should be, if the first argument is a Mapping, use 
the type of that first argument. Otherwise use dict.
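
A sketch of that return-type rule. The function is hypothetical (there is no collections.updated()), and it assumes the first argument's type can be constructed with no arguments, which will not hold for every mapping type.

```python
from collections import OrderedDict
from collections.abc import Mapping


def updated(*args, **kwargs):
    """Hypothetical helper; no such function exists in the stdlib."""
    # Rule from the text: take the result type from the first argument
    # if it is a Mapping, otherwise fall back to dict.
    if args and isinstance(args[0], Mapping):
        result_type = type(args[0])
    else:
        result_type = dict
    new = result_type()        # assumes a no-argument constructor works
    for arg in args:
        new.update(arg)        # later arguments win on duplicate keys
    new.update(kwargs)
    return new


from_mapping = updated(OrderedDict(a=1), {'b': 2}, a=3)
from_pairs = updated([('a', 1)], {'b': 2})
print(type(from_mapping).__name__)  # OrderedDict
print(type(from_pairs).__name__)    # dict
```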

(6) Or we can use the Mapping constructor. We're already part way there: 

py> dict({'a': 1, 'b': 2, 'c': 3}, a=9999)
{'c': 3, 'b': 2, 'a': 9999}


The constructor makes a copy of the argument, and updates it with any 
keyword args. If it accepted multiple positional arguments, that would 
be the copy-and-update semantics we're after.
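
The built-in dict constructor accepts at most one positional argument, so here is a sketch of the idea using a hypothetical subclass whose constructor takes several:

```python
class MergingDict(dict):
    """Hypothetical subclass whose constructor accepts several mappings."""

    def __init__(self, *mappings, **kwargs):
        super().__init__()
        for mapping in mappings:
            self.update(mapping)   # later arguments win on duplicate keys
        self.update(kwargs)        # keyword arguments are applied last


m = MergingDict({'a': 1, 'b': 2}, {'a': 9999}, c=3)
print(m)  # {'a': 9999, 'b': 2, 'c': 3}
```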

The downside is that a few mappings -- defaultdict immediately comes to 
mind -- have a constructor with a radically different signature. So you 
can't say:

    defaultdict(a, b, c, d)

(Personally, I think that changing the constructor signature like that 
is a mistake. But I'm not sure what alternatives there are.)


My sense of this is that using the constructor is the right solution, 
and for mappings with unusual signatures, the "consenting adults" 
principle applies. For them, you have to do it the old-fashioned way:

    new = defaultdict(func)
    for m in (a, b, c, d):  # update() takes only one positional argument
        new.update(m)


I don't care about solving this for every obscure mapping type in the 
universe. Even obscure mapping types in the standard library :-)


-- 
Steve

