[Python-ideas] PEP: Dict addition and subtraction
Steven D'Aprano
steve at pearwood.info
Tue Mar 5 11:11:01 EST 2019
On Tue, Mar 05, 2019 at 08:11:29AM -0500, David Shawley wrote:
> "Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more
> mathematical view of programming language types that includes two
> distinct operations for combining dictionaries -- merge and recursive
> merge.
>
> For two input dictionaries D1 & D2 and the output dictionary O
>
> D1 merge D2
> O is D1 with the of those keys of D2 that do not have keys in D1
>
> D1 recursive-merge D2
> For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k]
> and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].
I'm afraid I cannot understand either of those algorithms as written. I
suspect that you've left at least one word out of the first.
Fortunately your example below is extremely clear, thank you.
[...]
> The following example uses dictionaries from "Putting
> Metaclasses to Work":
>
> >>> d1 = {
> ... 'title': 'Structured Programming',
> ... 'authors': 'Dahl, Dijkstra, and Hoare',
> ... 'locations': {
> ... 'Dahl': 'University of Oslo',
> ... 'Dijkstra': 'University of Texas',
> ... 'Hoare': 'Oxford University',
> ... },
> ... }
> >>>
> >>> d2 = {
> ... 'publisher': 'Academic Press',
> ... 'locations': {
> ... 'North America': 'New York',
> ... 'Europe': 'London',
> ... },
> ... }
> >>>
> >>> o = d1.copy()
> >>> o.update(d2)
> >>> o
> {'publisher': 'Academic Press',
> 'title': 'Structured Programming',
> 'locations': {'North America': 'New York', 'Europe': 'London'},
> 'authors': 'Dahl, Dijkstra, and Hoare'}
Yes, that's the classic "update with last seen wins". That's what the
PEP proposes as that seems to be the most frequently requested
behaviour. It is also the only behaviour which has been deemed useful
enough in nearly 30 years of Python's history to be added to dict as a
method.
> >>> merge(d1, d2)
> {'publisher': 'Academic Press',
> 'title': 'Structured Programming',
> 'locations': {'Dijkstra': 'University of Texas',
> 'Hoare': 'Oxford University',
> 'Dahl': 'University of Oslo'},
> 'authors': 'Dahl, Dijkstra, and Hoare'}
That seems to be "update with first seen wins", which is easily done
using ChainMap or the proposed dict difference operator:
dict( ChainMap(d1, d2) )
# or
d1 + (d2 - d1)
or simply by swapping the order of the operands:
d2 + d1
(These are not *identical* in effect, there are small differences with
respect to key:value identity, and order of keys. But they ought to give
*equal* results.)
Personally, I don't think that behaviour is as useful as the first, but
it is certainly a legitimate kind of merge.
As far as I know, this has never been requested before. Perhaps it is
too niche?
> >>> recursive_merge(d1, d2)
> {'publisher': 'Academic Press',
> 'title': 'Structured Programming',
> 'locations': {'North America': 'New York',
> 'Europe': 'London',
> 'Dijkstra': 'University of Texas',
> 'Hoare': 'Oxford University',
> 'Dahl': 'University of Oslo'},
> 'authors': 'Dahl, Dijkstra, and Hoare'}
That's an interesting one. I'd write it something like this:
def merge(a, b):
new = a.copy()
for key, value in b:
if key not in a:
# Add new keys.
new[key] = value
else:
v = new[key]
if isinstance(value, dict) and isinstance(v, dict):
# If both values are dicts, merge them.
new[key] = merge(v, value)
else:
# What to do if only one is a dict?
# Or if neither is a dict?
return new
I've seen variants of this where duplicate keys are handled by building
a list of the values:
def merge(a, b):
new = a.copy()
for key, value in b:
if key in a:
v = new[key]
if isinstance(v, list):
v.append(value)
else:
new[key] = [v, value]
...
or by concatenating values, or adding them (as Counter does), etc. We
have subclasses and operator overloading, so you can implement whatever
behaviour you like.
The question is, is this behaviour useful enough and common enough to be
built into dict itself?
> IMO, having more than one obvious outcome means that we should refuse
> the temptation to guess.
We're not *guessing*. We're *chosing* which behaviour we want.
Nobody says:
When I print some strings, I can seperate them with spaces, or
dots, or newlines, and print a newline at the end, or suppress
the newline. Since all of these behaviours might be useful for
somebody, we should not "guess" what the user wants. Therefore
we should not have a print() function at all.
The behaviour of print() is not a guess as to what the user wants. We
offer a specific behaviour, and if the user is happy with that, then
they can use print(), and if not, they can write their own.
The same applies here: we're offering one specific behaviour that we
think is the most important, and anyone who wants another can write
their own.
If people don't like my choice of what I think is the most important
(copy-and-update, with last seen wins), they can argue for whichever
alternative they like. If they make a convincing enough case, the PEP
can change :-)
James Lu has already tried to argue that the "raise on non-unique keys"
is the best behaviour. I have disagreed with that, but if James makes a
strong enough case for his idea, and it gains sufficient support, I
could be persuaded to change my position.
Or he can write a competing PEP and the Steering Council can decide
between the two ideas.
> If we do, then the result is only obvious
> to a subset of users and will be a surprise to the others.
Its only a surprise to those users who don't read the docs and make
assumptions about behaviour based on their own wild guesses.
We should get away from the idea that the only behaviours we can provide
are those which are "obvious" (intuitive?) to people who guess what it
means without reading the docs. It's great when a function's meaning can
be guessed or inferred from a basic understanding of English:
len(string) # assuming len is an abbreviation for length
but that sets the bar impossibly high. We can't guess what these do, not
with any precision:
print(spam, eggs) # prints spaces between arguments or not?
spam is eggs # that's another way of spelling == right?
zip(spam, eggs) # what does it do if args aren't the same length?
and who can guess what these do without reading the docs?
property, classmethod, slice, enumerate, iter
I don't think that Python is a worse language for having specified a
meaning for these rather than leaving them out.
The Zen's prohibition against guessing in the face of ambiguity does not
mean that we must not add a feature to the language that requires the
user to learn what it does first.
> It's also useful to note that I am having trouble coming up with
> another programming language that supports a "+" operator for map types.
>
> Does anyone have an example of another programming language that
> allows for addition of dictionaries/mappings?
>
> If so, what is the behavior there?
An excellent example, but my browser just crashed and it's after 3am
here so I'm going to take this opportunity to go to bed :-)
--
Steven
More information about the Python-ideas
mailing list