On Tue, Mar 05, 2019 at 08:11:29AM -0500, David Shawley wrote:
"Putting Metaclasses to Work" (ISBN-13 978-0201433050) presents a more mathematical view of programming language types that includes two distinct operations for combining dictionaries -- merge and recursive merge.
For two input dictionaries D1 & D2 and the output dictionary O
D1 merge D2 O is D1 with the of those keys of D2 that do not have keys in D1
D1 recursive-merge D2 For all keys k, O[k] = D1[k] recursive merge D2[k] if both D1[k] and D2[k] are dictionaries, otherwise O[k] = (D1 merge D2)[k].
I'm afraid I cannot understand either of those algorithms as written. I suspect that you've left at least one word out of the first. Fortunately your example below is extremely clear, thank you. [...]
The following example uses dictionaries from "Putting Metaclasses to Work":
d1 = { ... 'title': 'Structured Programming', ... 'authors': 'Dahl, Dijkstra, and Hoare', ... 'locations': { ... 'Dahl': 'University of Oslo', ... 'Dijkstra': 'University of Texas', ... 'Hoare': 'Oxford University', ... }, ... }
d2 = { ... 'publisher': 'Academic Press', ... 'locations': { ... 'North America': 'New York', ... 'Europe': 'London', ... }, ... }
o = d1.copy() o.update(d2) o {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
Yes, that's the classic "update with last seen wins". That's what the PEP proposes as that seems to be the most frequently requested behaviour. It is also the only behaviour which has been deemed useful enough in nearly 30 years of Python's history to be added to dict as a method.
merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
That seems to be "update with first seen wins", which is easily done using ChainMap or the proposed dict difference operator: dict( ChainMap(d1, d2) ) # or d1 + (d2 - d1) or simply by swapping the order of the operands: d2 + d1 (These are not *identical* in effect, there are small differences with respect to key:value identity, and order of keys. But they ought to give *equal* results.) Personally, I don't think that behaviour is as useful as the first, but it is certainly a legitimate kind of merge. As far as I know, this has never been requested before. Perhaps it is too niche?
recursive_merge(d1, d2) {'publisher': 'Academic Press', 'title': 'Structured Programming', 'locations': {'North America': 'New York', 'Europe': 'London', 'Dijkstra': 'University of Texas', 'Hoare': 'Oxford University', 'Dahl': 'University of Oslo'}, 'authors': 'Dahl, Dijkstra, and Hoare'}
That's an interesting one. I'd write it something like this: def merge(a, b): new = a.copy() for key, value in b: if key not in a: # Add new keys. new[key] = value else: v = new[key] if isinstance(value, dict) and isinstance(v, dict): # If both values are dicts, merge them. new[key] = merge(v, value) else: # What to do if only one is a dict? # Or if neither is a dict? return new I've seen variants of this where duplicate keys are handled by building a list of the values: def merge(a, b): new = a.copy() for key, value in b: if key in a: v = new[key] if isinstance(v, list): v.append(value) else: new[key] = [v, value] ... or by concatenating values, or adding them (as Counter does), etc. We have subclasses and operator overloading, so you can implement whatever behaviour you like. The question is, is this behaviour useful enough and common enough to be built into dict itself?
IMO, having more than one obvious outcome means that we should refuse the temptation to guess.
We're not *guessing*. We're *chosing* which behaviour we want. Nobody says: When I print some strings, I can seperate them with spaces, or dots, or newlines, and print a newline at the end, or suppress the newline. Since all of these behaviours might be useful for somebody, we should not "guess" what the user wants. Therefore we should not have a print() function at all. The behaviour of print() is not a guess as to what the user wants. We offer a specific behaviour, and if the user is happy with that, then they can use print(), and if not, they can write their own. The same applies here: we're offering one specific behaviour that we think is the most important, and anyone who wants another can write their own. If people don't like my choice of what I think is the most important (copy-and-update, with last seen wins), they can argue for whichever alternative they like. If they make a convincing enough case, the PEP can change :-) James Lu has already tried to argue that the "raise on non-unique keys" is the best behaviour. I have disagreed with that, but if James makes a strong enough case for his idea, and it gains sufficient support, I could be persuaded to change my position. Or he can write a competing PEP and the Steering Council can decide between the two ideas.
If we do, then the result is only obvious to a subset of users and will be a surprise to the others.
Its only a surprise to those users who don't read the docs and make assumptions about behaviour based on their own wild guesses. We should get away from the idea that the only behaviours we can provide are those which are "obvious" (intuitive?) to people who guess what it means without reading the docs. It's great when a function's meaning can be guessed or inferred from a basic understanding of English: len(string) # assuming len is an abbreviation for length but that sets the bar impossibly high. We can't guess what these do, not with any precision: print(spam, eggs) # prints spaces between arguments or not? spam is eggs # that's another way of spelling == right? zip(spam, eggs) # what does it do if args aren't the same length? and who can guess what these do without reading the docs? property, classmethod, slice, enumerate, iter I don't think that Python is a worse language for having specified a meaning for these rather than leaving them out. The Zen's prohibition against guessing in the face of ambiguity does not mean that we must not add a feature to the language that requires the user to learn what it does first.
It's also useful to note that I am having trouble coming up with another programming language that supports a "+" operator for map types.
Does anyone have an example of another programming language that allows for addition of dictionaries/mappings?
If so, what is the behavior there?
An excellent example, but my browser just crashed and it's after 3am here so I'm going to take this opportunity to go to bed :-) -- Steven