[Python-ideas] PEP: Dict addition and subtraction

Tue Mar 5 11:11:01 EST 2019

On Tue, Mar 05, 2019 at 08:11:29AM -0500, David Shawley wrote:

> "Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more
> mathematical view of programming language types that includes two
> distinct operations for combining dictionaries -- merge and recursive
> merge.
> 
> For two input dictionaries D1 & D2 and the output dictionary O
> 
> D1 merge D2
>     O is D1 with the of those keys of D2 that do not have keys in D1
>
> D1 recursive-merge D2
>     For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k]
>     and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].

I'm afraid I cannot understand either of those algorithms as written. I 
suspect that you've left at least one word out of the first.

Fortunately your example below is extremely clear, thank you.

[...]
> The following example uses dictionaries from "Putting
> Metaclasses to Work":
> 
> >>> d1 = {
> ...     'title': 'Structured Programming',
> ...     'authors': 'Dahl, Dijkstra, and Hoare',
> ...     'locations': {
> ...         'Dahl': 'University of Oslo',
> ...         'Dijkstra': 'University of Texas',
> ...         'Hoare': 'Oxford University',
> ...     },
> ... }
> >>>
> >>> d2 = {
> ...     'publisher': 'Academic Press',
> ...     'locations': {
> ...         'North America': 'New York',
> ...         'Europe': 'London',
> ...     },
> ... }
> >>>
> >>> o = d1.copy()
> >>> o.update(d2)
> >>> o
> {'publisher': 'Academic Press',
>  'title': 'Structured Programming',
>  'locations': {'North America': 'New York', 'Europe': 'London'},
>  'authors': 'Dahl, Dijkstra, and Hoare'}

Yes, that's the classic "update with last seen wins". That's what the 
PEP proposes as that seems to be the most frequently requested 
behaviour. It is also the only behaviour which has been deemed useful 
enough in nearly 30 years of Python's history to be added to dict as a 
method.

> >>> merge(d1, d2)
> {'publisher': 'Academic Press',
>  'title': 'Structured Programming',
>  'locations': {'Dijkstra': 'University of Texas',
>                'Hoare': 'Oxford University',
>                'Dahl': 'University of Oslo'},
>  'authors': 'Dahl, Dijkstra, and Hoare'}

That seems to be "update with first seen wins", which is easily done 
using ChainMap or the proposed dict difference operator:

    dict( ChainMap(d1, d2) )
    # or
    d1 + (d2 - d1)

or simply by swapping the order of the operands:

    d2 + d1

(These are not *identical* in effect, there are small differences with 
respect to key:value identity, and order of keys. But they ought to give 
*equal* results.)

Personally, I don't think that behaviour is as useful as the first, but 
it is certainly a legitimate kind of merge.

As far as I know, this has never been requested before. Perhaps it is 
too niche?

> >>> recursive_merge(d1, d2)
> {'publisher': 'Academic Press',
>  'title': 'Structured Programming',
>  'locations': {'North America': 'New York',
>                'Europe': 'London',
>                'Dijkstra': 'University of Texas',
>                'Hoare': 'Oxford University',
>                'Dahl': 'University of Oslo'},
>  'authors': 'Dahl, Dijkstra, and Hoare'}

That's an interesting one. I'd write it something like this:

def merge(a, b):
    new = a.copy()
    for key, value in b:
        if key not in a:
            # Add new keys.
            new[key] = value
        else:
            v = new[key]
            if isinstance(value, dict) and isinstance(v, dict):
                # If both values are dicts, merge them.
                new[key] = merge(v, value)
            else:
                # What to do if only one is a dict?
                # Or if neither is a dict?
    return new

I've seen variants of this where duplicate keys are handled by building 
a list of the values:

def merge(a, b):
    new = a.copy()
    for key, value in b:
        if key in a:
            v = new[key]
            if isinstance(v, list):
                v.append(value)
            else:
                new[key] = [v, value]
        ...

or by concatenating values, or adding them (as Counter does), etc. We 
have subclasses and operator overloading, so you can implement whatever 
behaviour you like.

The question is, is this behaviour useful enough and common enough to be 
built into dict itself?

> IMO, having more than one obvious outcome means that we should refuse
> the temptation to guess.

We're not *guessing*. We're *chosing* which behaviour we want.

Nobody says:

    When I print some strings, I can seperate them with spaces, or 
    dots, or newlines, and print a newline at the end, or suppress 
    the newline. Since all of these behaviours might be useful for 
    somebody, we should not "guess" what the user wants. Therefore 
    we should not have a print() function at all.

The behaviour of print() is not a guess as to what the user wants. We 
offer a specific behaviour, and if the user is happy with that, then 
they can use print(), and if not, they can write their own.

The same applies here: we're offering one specific behaviour that we 
think is the most important, and anyone who wants another can write 
their own.

If people don't like my choice of what I think is the most important 
(copy-and-update, with last seen wins), they can argue for whichever 
alternative they like. If they make a convincing enough case, the PEP 
can change :-)

James Lu has already tried to argue that the "raise on non-unique keys" 
is the best behaviour. I have disagreed with that, but if James makes a 
strong enough case for his idea, and it gains sufficient support, I 
could be persuaded to change my position.

Or he can write a competing PEP and the Steering Council can decide 
between the two ideas.

> If we do, then the result is only obvious
> to a subset of users and will be a surprise to the others.

Its only a surprise to those users who don't read the docs and make 
assumptions about behaviour based on their own wild guesses.

We should get away from the idea that the only behaviours we can provide 
are those which are "obvious" (intuitive?) to people who guess what it 
means without reading the docs. It's great when a function's meaning can 
be guessed or inferred from a basic understanding of English:

    len(string)  # assuming len is an abbreviation for length

but that sets the bar impossibly high. We can't guess what these do, not 
with any precision:

    print(spam, eggs)  # prints spaces between arguments or not?

    spam is eggs # that's another way of spelling == right?

    zip(spam, eggs)  # what does it do if args aren't the same length?

and who can guess what these do without reading the docs?

    property, classmethod, slice, enumerate, iter

I don't think that Python is a worse language for having specified a 
meaning for these rather than leaving them out.

The Zen's prohibition against guessing in the face of ambiguity does not 
mean that we must not add a feature to the language that requires the 
user to learn what it does first.

> It's also useful to note that I am having trouble coming up with
> another programming language that supports a "+" operator for map types.
> 
> Does anyone have an example of another programming language that
> allows for addition of dictionaries/mappings?
> 
> If so, what is the behavior there?

An excellent example, but my browser just crashed and it's after 3am 
here so I'm going to take this opportunity to go to bed :-)

-- 
Steven