[Python-ideas] PEP: Dict addition and subtraction

Steven D'Aprano steve at pearwood.info
Tue Mar 5 18:14:53 EST 2019


On Sun, Mar 03, 2019 at 09:28:30PM -0500, James Lu wrote:

> I propose that the + sign merge two python dictionaries such that if 
> there are conflicting keys, a KeyError is thrown.

This proposal is for a simple, operator-based equivalent to 
dict.update() which returns a new dict. dict.update has existed since 
Python 1.5 (something like a quarter of a century!) and has never grown a 
"unique keys" version.

I don't recall even seeing a request for such a feature. If such a 
unique keys version is useful, I don't expect it will be useful often.


> This way, d1 + d2 isn’t just another obvious way to do {**d1, **d2}.

One of the reasons for preferring + is that it is an obvious way to do 
something very common, while {**d1, **d2} is as far from obvious as you 
can get without becoming APL or Perl :-)

If I needed such a unique key version of update, I'd use a subclass:


class StrictDict(dict):
    def __add__(self, other):
        # Refuse to merge if the two dicts share any key.
        if isinstance(other, dict) and (self.keys() & other.keys()):
            raise KeyError('non-unique keys')
        # Assumes dict.__add__ exists, as this PEP proposes.
        return super().__add__(other)

    # and similar for __radd__.


rather than burden the entire language, and every user of it, with 
having to learn the subtle difference between the obvious + operator and 
the error-prone and unobvious trick of {*d1, *d2}.

( Did you see what I did there? *wink* )
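
For anyone who wants to try this on Python as it stands, where plain 
dicts have no __add__, here is a minimal sketch. It assumes the merged 
copy is built by hand with {**self, **other} instead of an inherited 
operator, and the sample dicts are made up for illustration. The last 
line shows what the single-star spelling actually produces.

```python
class StrictDict(dict):
    """Hypothetical dict subclass whose + refuses overlapping keys."""

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        # Dict key views support set operations, so & gives the shared keys.
        clash = self.keys() & other.keys()
        if clash:
            raise KeyError(f'non-unique keys: {clash!r}')
        # Plain dict has no __add__ here, so build the merged copy by hand.
        return type(self)({**self, **other})


d1 = StrictDict(a=1, b=2)
d2 = {'c': 3}
merged = d1 + d2

# Single stars unpack the keys only, and the braces then build a set.
keys_only = {*d1, *d2}
```

Only code that opts in to StrictDict pays for the extra check; ordinary 
dicts are left alone.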


> The second syntax makes it clear that a new dictionary is being 
> constructed and that d2 overrides keys from d1.

Only because you have learned the rule that {**d, **e} means to 
construct a new dict by merging, with the rule that in the event of 
duplicate keys, the last key seen wins. If you hadn't learned that rule, 
there is nothing in the syntax which would tell you the behaviour. We 
could have chosen any rule we liked:

- raise an exception, like you get a TypeError if you pass the 
  same keyword argument to a function twice: spam(foo=1, foo=2);

- first value seen wins;

- last value seen wins;

- random value wins;

- anything else we liked!


There is nothing "clear" about the syntax which makes it obvious which 
behaviour is implemented. We have to learn it.
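
A short sketch of how the same double-star spelling follows different 
rules depending on where it appears. The dicts d and e and the function 
spam are placeholders for illustration only.

```python
d = {'spam': 1, 'eggs': 2}
e = {'eggs': 20, 'ham': 30}

# In a dict display, the last value seen for a duplicate key wins.
last_wins = {**d, **e}

# Getting "first value seen wins" for d means unpacking it last.
first_wins = {**e, **d}


def spam(**kwargs):
    return kwargs


# In a function call, a duplicate keyword argument is an error instead.
try:
    spam(**d, **e)
except TypeError as err:
    call_error = err
else:
    call_error = None
```

Same stars, two different rules, and nothing in the syntax says which 
one applies where.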



> One can reasonably expect or imagine a situation where a section of 
> code that expects to merge two dictionaries with non-conflicting keys 
> commits a semantic error if it merges two dictionaries with 
> conflicting keys.

I can imagine it, but I don't think I've ever needed it, and I can't 
imagine wanting it often enough to wish it was not just a built-in 
function or method, but actual syntax.

Do you have some real examples of wanting an error when trying to update 
a dict if keys match?


> To better explain, imagine a program where options is a global 
> variable storing parsed values from the command line.
> 
> def verbose_options():
>  if options.quiet
>      return {'verbose': True}
> 
> def quiet_options():
>  if options.quiet:
>      return {'verbose': False}

That seems very artificial to me. Why not use a single function:

def verbose_options():  # There's more than one?
    return {'verbose': not options.quiet}

The way you have written those functions seems weird to me. You already 
have a nice options object, with named fields like "options.quiet", why 
are you turning it into not one but *two* different dicts, both 
reporting the same field?

And it's buggy: if options.quiet is True, then the key 'quiet' 
should be True, not the 'verbose' key.

Do you have *two* functions for every preference setting that takes a 
true/false flag?

What do you do for preference settings that take multiple values? Create 
a vast number of specialised functions, one for each possible value?

def A4_page_options():
    if options.page_size == 'A4':
        return {'page_size': 'A4'}

def US_Letter_page_options():
    if options.page_size == 'US Letter':
        return {'page_size': 'US Letter'}

page_size = (
            A4_page_options() + A3_page_options() + A5_page_options()
            + Foolscap_page_options() + Tabloid_page_options()
            + US_Letter_page_options() + US_Legal_page_options()
            # and about a dozen more...
            )
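
By contrast, a minimal sketch of reading the settings straight off the 
options object. SimpleNamespace stands in for whatever the command line 
parser returns, and settings_from is a made-up name.

```python
from types import SimpleNamespace

# Stand-in for the parsed command line options (hypothetical values).
options = SimpleNamespace(quiet=True, page_size='A4')


def settings_from(options):
    """Build one settings dict directly from the options object."""
    return {
        'verbose': not options.quiet,
        'page_size': options.page_size,
    }


settings = settings_from(options)
```

Each key is written exactly once, so there is nothing to conflict and 
no merge to police.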

The point is, although I might be wrong, I don't think that this example 
is a practical, realistic use-case for a unique keys version of update.

To me, your approach seems so complicated and artificial that it seems 
like it was invented specifically to justify this "unique key" operator, 
not something that we would want to write in real life.

But even if it is real code, the question is not whether it is EVER useful 
for a dict update to raise an exception on matching keys. The question 
is whether this is so often useful that this is the behaviour we want to 
make the default for dicts.


[...]
> Again, I propose that the + sign merge two python dictionaries such 
> that if there are conflicting keys, a KeyError is thrown, because such 
> “non-conflicting merge” behavior would be useful in Python.

I don't think it would be, at least not often.

If it were common enough to justify a built-in operator to do this, we 
would have had many requests for a dict.unique_update or similar by now, 
and I don't think we have.


> It gives 
> clarifying power to the + sign. The + and the {**, **} should serve 
> different roles.
> 
> In other words, explicit + is better than implicit {**, **}, unless 
> explicitly suppressed.  Here + is explicit whereas {**, **} is 
> implicitly allowing inclusive keys, 

If I had a cent for every time people misused "explicit" to mean "the 
proposal that I like", I'd be rich.

In what way is the "+" operator *explicit* about raising an exception on 
duplicate keys? These are both explicit:

    merge_but_raise_exception_if_any_duplicates(d1, d2)

    merge(d1, d2, raise_if_duplicates=True)

and these are both equally implicit:

    d1 + d2

    {**d1, **d2}

since the behaviour on duplicates is not explicitly stated in clear and 
obvious language, but implied by the rules of the language.
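
A rough sketch of what the second explicit spelling could look like as 
an ordinary function. Nothing like this exists in the standard library; 
the name merge and the raise_if_duplicates flag are taken from the 
example above, and the default of False is my own assumption.

```python
def merge(d1, d2, raise_if_duplicates=False):
    """Return a new dict with d2's items layered over d1's.

    Hypothetical helper: the name and the flag are illustrative,
    not part of any existing library.
    """
    if raise_if_duplicates:
        duplicates = d1.keys() & d2.keys()
        if duplicates:
            raise KeyError(f'duplicate keys: {duplicates!r}')
    result = dict(d1)
    result.update(d2)  # on a duplicate key, the value from d2 wins
    return result


config = merge({'host': 'localhost'}, {'port': 8080}, raise_if_duplicates=True)
```

Here the call site itself states what happens on duplicates, which is 
what "explicit" ought to mean.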


[...]
> People expect the + operator to be commutative

They are wrong to expect that, because the + operator is already not 
commutative for:

    str
    bytes
    bytearray
    list
    tuple
    array.array
    collections.deque
    collections.Counter

and possibly others.
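
A quick check for a few of the sequence types in that list. The sample 
values are arbitrary.

```python
from collections import deque

pairs = [
    ('ab', 'cd'),
    (b'ab', b'cd'),
    ([1, 2], [3, 4]),
    ((1, 2), (3, 4)),
    (deque([1, 2]), deque([3, 4])),
]

# For each pair, compute a + b and b + a so the two can be compared.
results = [(a + b, b + a) for a, b in pairs]

for forward, backward in results:
    print(repr(forward), '!=', repr(backward))
```

In every case concatenation keeps the order of the operands, so 
swapping them changes the result.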




-- 
Steven

