[Python-ideas] Adding "+" and "+=" operators to dict

Sat Feb 14 04:37:30 CET 2015

On Feb 13, 2015, at 19:05, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sat, Feb 14, 2015 at 11:12:10AM +1300, Greg Ewing wrote:
>> Stefan Behnel wrote:
>>> Arithmetic expressions are always
>>> evaluated from left to right. It thus seems obvious to me what gets created
>>> first, and what gets added afterwards (and overwrites what's there).
>> 
>> I'm okay with "added afterwards", but the "overwrites"
>> part is *not* obvious to me.
> [snip yet another shopping list example]
> 
> I think that focusing on the question of "which wins" in the case of 
> duplicate keys, the left or right operand, is not very productive. We 
> can just declare that Python semantics are that the last value seen 
> wins, which is the current behaviour for dict.update.
> 
> Subclasses can override that, if they wish, but the easiest way is to 
> just swap the order of the operands. Instead of a.update(b), use 
> b.update(a). (If you're worried about contaminating other references to 
> b, make a copy of it first. Whatever.)
> 
> dict.update does what it does very well. It solves 90% of the "update in 
> place" problems. Specialist subclasses, like a multiset or a Counter, 
> can solve the other 90% *wink*. What these requests for "dict addition" 
> are really asking about is an easy way to solve the "copy and update" 
> problem. There are a few answers:
> 
> 
> (1) Not everything needs to be a one-liner or an expression. Just copy 
> and update yourself, it's not hard.
> 
> (2) Write your own helper function. It's a three line function. Not 
> everything needs to be a built-in:
> 
>    def copy_and_update(a, b):
>        new = a.copy()
>        new.update(b)
>        return new
> 
> Add bells and whistles to taste.
> 
> (3) Some people think it should be an operator. There are strong 
> feelings about which operator: + | ^ << have all been suggested. 
> Operators have some disadvantages: they can only take two arguments, so 
> copy-and-updating a series of dicts means making repeated copies which 
> are thrown away.
> 
> There's the question of different types -- if you merge a SpamMapping 
> and an EggsMapping and a CheeseMapping, what result do you get? If you 
> care about the specific type, do you have to make yet another copy at 
> the end to ensure you get the type you want?
> 
>    SpamMapping(a + b + c)  # or a | b | c
> 
> We shouldn't care about small efficiencies, say, 90% of the time, but 
> this is starting to look a little worrisome, since it may lead to near 
> quadratic behaviour.
> 
> Using an operator means that you get augmented assignment for free, even 
> if you don't define an __iadd__ or __ior__ method. But augmented 
> assignment isn't entirely problem-free. If your mapping is embedded in 
> an immutable data structure, then:
> 
>    structure.mapping |= other  # or += if you prefer
> 
> may *succeed and yet raise an exception*. That's a nasty language wart. 
> To me, that's a good reason to look for an alternative.
> 
> (4) So perhaps a method on Mapping is a better idea. That makes the 
> return type obvious: it should be type(self). It allows implementations 
> to be efficient in the face of multiple arguments, by avoiding the 
> creation of temporary objects which are then copied and thrown away. 
> Being a method, they can take keyword arguments too.
> 
> But the obvious name, updated(), is uncomfortably close to update() and 
> perhaps easily confused with it. The next most obvious name, 
> copy_and_update(), is a little long for my tastes.
> 
> 
> (5) How about a function? It need not be a built-in, although there is 
> precedent with sorted() and reversed(). It could be collections.updated().
> 
> With a function, it's a little less obvious what the return type should 
> be. Perhaps the rule should be, if the first argument is a Mapping, use 
> the type of that first argument. Otherwise use dict.
> 
> (6) Or we can use the Mapping constructor. We're already part way there: 
> 
> py> dict({'a': 1, 'b': 2, 'c': 3}, a=9999)
> {'c': 3, 'b': 2, 'a': 9999}
> 
> 
> The constructor makes a copy of the argument, and updates it with any 
> keyword args. If it would take multiple arguments, that's the 
> copy-and-update semantics we're after.
> 
> The downside is that a few mappings -- defaultdict immediately comes to 
> mind -- have a constructor with a radically different signature. So you 
> can't say:
> 
>    defaultdict(a, b, c, d)

Actually, that isn't a problem. defaultdict doesn't have a radically different signature at all; it just has one special parameter, followed by whatever other stuff dict wants. The docs make it clear that whatever's there will be treated exactly the same as dict, and the C implementation and the old Python-equivalent recipe both do this by effectively doing self.default_factory = args[0]; super().__init__(*args[1:], **kwargs).
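
To make that delegation concrete, here is a rough sketch in the spirit of the Python-equivalent recipe. The class name SketchDefaultDict is made up for illustration, and this is not the actual stdlib code; it only shows the first argument being peeled off and everything else going straight to dict.

```python
class SketchDefaultDict(dict):
    """Hypothetical stand-in for defaultdict, for illustration only."""

    def __init__(self, default_factory=None, *args, **kwargs):
        # Peel off the one special parameter...
        self.default_factory = default_factory
        # ...and hand everything else to dict unchanged.
        super().__init__(*args, **kwargs)

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


d = SketchDefaultDict(list, {'a': [1]}, b=[2])
d['c'].append(3)   # 'c' is missing, so the factory supplies a fresh list
```

A class written this way never inspects what follows the factory, so whatever positional arguments dict learns to accept, it would pick up without any change of its own.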

So there is no real problem at all with this suggestion. Which is why it's now my favorite of the bunch. Even if we get the generalized unpacking (which has other merits), I'd still like this one.
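
For reference, this is roughly the behavior I have in mind, written as a subclass so it can be tried today. MultiArgDict is a hypothetical name, and the real change would live in dict itself; treat this as a sketch of the semantics (later sources win, keywords win last), not as a proposed implementation.

```python
class MultiArgDict(dict):
    """Hypothetical: a dict whose constructor accepts several sources."""

    def __init__(self, *sources, **kwargs):
        super().__init__()
        for source in sources:
            # Each source may be a mapping or an iterable of pairs,
            # exactly as dict.update accepts today.
            self.update(source)
        # Keyword arguments are applied last, so they override everything.
        self.update(kwargs)


a = {'x': 1, 'y': 2}
b = [('y', 20), ('z', 30)]
merged = MultiArgDict(a, b, z=300)
```

The originals are left alone, which is the whole point of copy-and-update as opposed to update.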

Of course it only fixes dict and a very small handful of classes (including defaultdict, but not much else) that inherit or encapsulate a dict and delegate blindly. Every other class--OrderedDict and UserDict, all the third-party mappings, etc.--has to be changed as well or it doesn't get the new behavior. (And the Mapping mixin can't help here, as the collections.abc classes don't provide any construction behavior at all.)

But that's fine. Implementing (Mutable)Mapping doesn't guarantee that you are 100% like a dict, only that you're a (mutable) mapping. (For example, you already don't get a copy method.) If some mappings can be constructed from multiple mapping-or-iterator arguments and some can't, so what?

That being said, I think UserDict _does_ need to be fixed as a special case, because its two purposes are (a) to exactly simulate dict as far as possible, and (b) to serve as sample code for people writing their own mappings.

And since OrderedDict is pretty trivial to change, I'd probably do that too. But I wouldn't hunt through the stdlib looking for any special-purpose mappings, or add code that tries to warn on non-compliant third-party mappings, or even add anything to the documentation about the expected constructor signature (we don't currently explain how to take a single mapping or iterable-of-pairs plus keywords, or similarly for any other collections constructors, except the special case of Set._from_iterable, and only because that's needed to make set operators work).

However, if you want update too (as you seem to, given your defaultdict suggestion), I _would_ change the MutableMapping.update default implementation (which automatically fixes the other collections mappings that aren't fixed by dict). That seems like a small gain for a minuscule cost, so why not?
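
A sketch of what that could look like, done as a mixin so it runs against today's MutableMapping.update (which takes at most one positional source). MultiUpdateMixin and MultiUpdateUserDict are hypothetical names; the actual change would simply go into the default implementation.

```python
from collections import UserDict


class MultiUpdateMixin:
    """Hypothetical mixin: lets update() take any number of sources."""

    def update(self, *others, **kwargs):
        for other in others:
            # Defer to the existing one-source implementation each time.
            super().update(other)
        if kwargs:
            super().update(**kwargs)


class MultiUpdateUserDict(MultiUpdateMixin, UserDict):
    pass


d = MultiUpdateUserDict({'a': 1})
d.update({'a': 2, 'b': 3}, [('b', 4)], c=5)
```

UserDict gets its update from MutableMapping, so anything else that relies on the default implementation would behave the same way.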

> (Personally, I think that changing the constructor signature like that 
> is a mistake. But I'm not sure what alternatives there are.)
> 
> 
> My sense of this is that using the constructor is the right solution, 
> and for mappings with unusual signatures, consenting adults applies. For 
> them, you have to do it the old-fashioned way:
> 
>    d = defaultdict(func)
>    d.update(a, b, c, d)

But once dict itself accepts multiple arguments, d = defaultdict(func, a, b, c, d) will already just work, with no change to defaultdict.

And if you want to preserve the first (or last, or whatever) one's default factory, that's trivial to do, and explicit without being horribly verbose: d = defaultdict(a.default_factory, a, b, c, d).
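
For comparison, here is how that has to be spelled with the current single-source constructor. The data is made up for illustration; only the defaultdict calls are real.

```python
from collections import defaultdict

a = defaultdict(list, {'k': [1]})
b = {'k': [2], 'm': [3]}

# Today the constructor takes one source, so later ones go through update().
d = defaultdict(a.default_factory, a)
d.update(b)

# The factory carried over from a, so missing keys still get a fresh list.
d['new'].append(4)
```

With a multi-argument constructor, the two statements that build d would collapse into defaultdict(a.default_factory, a, b).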

> I don't care about solving this for every obscure mapping type in the 
> universe. Even obscure mapping types in the standard library :-)
> 
> 
> -- 
> Steve