====================================== PEP-xxxx Dict addition and subtraction ====================================== **DRAFT** -- This is a draft document for discussion. Abstract -------- This PEP suggests adding merge ``+`` and difference ``-`` operators to the built-in ``dict`` class. The merge operator will have the same relationship to the ``dict.update`` method as the list concatenation operator has to ``list.extend``, with dict difference being defined analogously. Examples -------- Dict addition will return a new dict containing the left operand merged with the right operand. >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3} >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'} >>> d + e {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'} >>> e + d {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2} The augmented assignment version operates in-place. >>> d += e >>> print(d) {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'} Analogously with list addition, the operator version is more restrictive, and requires that both arguments are dicts, while the augmented assignment version allows anything the ``update`` method allows, such as iterables of key/value pairs. >>> d + [('spam', 999)] Traceback (most recent call last): ... TypeError: can only merge dict (not "list") to dict >>> d += [('spam', 999)] >>> print(d) {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'} Dict difference ``-`` will return a new dict containing the items from the left operand which are not in the right operand. >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3} >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'} >>> d - e {'spam': 1, 'eggs': 2} >>> e - d {'aardvark': 'Ethel'} Augmented assignment will operate in place. >>> d -= e >>> print(d) {'spam': 1, 'eggs': 2} Like the merge operator and list concatenation, the difference operator requires both operands to be dicts, while the augmented version allows any iterable of keys. >>> d - {'spam', 'parrot'} Traceback (most recent call last): ... TypeError: cannot take the difference of dict and set >>> d -= {'spam', 'parrot'} >>> print(d) {'eggs': 2, 'cheese': 'cheddar'} >>> d -= [('spam', 999)] >>> print(d) {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'} Semantics --------- For the merge operator, if a key appears in both operands, the last-seen value (i.e. that from the right-hand operand) wins. This shows that dict addition is not commutative, in general ``d + e`` will not equal ``e + d``. This joins a number of other non-commutative addition operators among the builtins, including lists, tuples, strings and bytes. Having the last-seen value wins makes the merge operator match the semantics of the ``update`` method, so that ``d + e`` is an operator version of ``d.update(e)``. The error messages shown above are not part of the API, and may change at any time. Rejected semantics ~~~~~~~~~~~~~~~~~~ Rejected alternatives semantics for ``d + e`` include: - Add only new keys from ``e``, without overwriting existing keys in ``d``. This may be done by reversing the operands ``e + d``, or using dict difference first, ``d + (e - d)``. The later is especially useful for the in-place version ``d += (e - d)``. - Raise an exception if there are duplicate keys. This seems unnecessarily restrictive and is not likely to be useful in practice. For example, updating default configuration values with user-supplied values would most often fail under the requirement that keys are unique:: prefs = site_defaults + user_defaults + document_prefs - Add the values of d2 to the corresponding values of d1. This is the behaviour implemented by ``collections.Counter``. Syntax ------ An alternative to the ``+`` operator is the pipe ``|`` operator, which is used for set union. This suggestion did not receive much support on Python-Ideas. The ``+`` operator was strongly preferred on Python-Ideas.[1] It is more familiar than the pipe operator, matches nicely with ``-`` as a pair, and the Counter subclass already uses ``+`` for merging. Current Alternatives -------------------- To create a new dict containing the merged items of two (or more) dicts, one can currently write:: {**d1, **d2} but this is neither obvious nor easily discoverable. It is only guaranteed to work if the keys are all strings. If the keys are not strings, it currently works in CPython, but it may not work with other implementations, or future versions of CPython[2]. It is also limited to returning a built-in dict, not a subclass, unless re-written as ``MyDict(**d1, **d2)``, in which case non-string keys will raise TypeError. There is currently no way to perform dict subtraction except through a manual loop. Implementation -------------- The implementation will be in C. (The author of this PEP would like to make it known that he is not able to write the implemention.) An approximate pure-Python implementation of the merge operator will be:: def __add__(self, other): if isinstance(other, dict): new = type(self)() # May be a subclass of dict. new.update(self) new.update(other) return new return NotImplemented def __radd__(self, other): if isinstance(other, dict): new = type(other)() new.update(other) new.update(self) return new return NotImplemented Note that the result type will be the type of the left operand; in the event of matching keys, the winner is the right operand. Augmented assignment will just call the ``update`` method. This is analogous to the way ``list +=`` calls the ``extend`` method, which accepts any iterable, not just lists. def __iadd__(self, other): self.update(other) An approximate pure-Python implementation of the difference operator will be:: def __sub__(self, other): if isinstance(other, dict): new = type(self)() for k in self: if k not in other: new[k] = self[k] return new return NotImplemented def __rsub__(self, other): if isinstance(other, dict): new = type(other)() for k in other: if k not in self: new[k] = other[k] return new return NotImplemented Augmented assignment will operate on equivalent terms to ``update``. If the operand has a key method, it will be used, otherwise the operand will be iterated over:: def __isub__(self, other): if hasattr(other, 'keys'): for k in other.keys(): if k in self: del self[k] else: for k in other: if k in self: del self[k] These semantics are intended to match those of ``update`` as closely as possible. For the dict built-in itself, calling ``keys`` is redundant as iteration over a dict iterates over its keys; but for subclasses or other mappings, ``update`` prefers to use the keys method. .. attention:: The above paragraph may be inaccurate. Although the dict docstring states that ``keys`` will be called if it exists, this does not seem to be the case for dict subclasses. Bug or feature? Contra-indications ------------------ (Or when to avoid using these new operators.) For merging multiple dicts, the ``d1 + d2 + d3 + d4 + ...`` idiom will suffer from the same unfortunate O(N\*\*2) Big Oh performance as does list and tuple addition, and for similar reasons. If one expects to be merging a large number of dicts where performance is an issue, it may be better to use an explicit loop and in-place merging:: new = {} for d in many_dicts: new += d This is unlikely to be a problem in practice as most uses of the merge operator are expected to only involve a small number of dicts. Similarly, most uses of list and tuple concatenation only use a few objects. Using the dict augmented assignment operators on a dict inside a tuple (or other immutable data structure) will lead to the same problem that occurs with list concatenation[3], namely the in-place addition will succeed, but the operation will raise an exception. >>> a_tuple = ({'spam': 1, 'eggs': 2}, None) >>> a_tuple[0] += {'spam': 999} Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment >>> a_tuple[0] {'spam': 999, 'eggs': 2} Similar remarks apply to the ``-`` operator. Other discussions ----------------- `Latest discussion which motivated this PEP `_ `Ticket on the bug tracker `_ `A previous discussion `_ and `commentary on it `_. Note that the author of this PEP was skeptical of this proposal at the time. `How to merge dictionaries `_ in idiomatic Python. Open questions -------------- Should these operators be part of the ABC ``Mapping`` API? References ---------- [1] Guido's declaration that plus wins over pipe: https://mail.python.org/pipermail/python-ideas/2019-February/055519.html [2] Non-string keys: https://bugs.python.org/issue35105 and https://mail.python.org/pipermail/python-dev/2018-October/155435.html [3] Behaviour in tuples: https://docs.python.org/3/faq/programming.html#why-does-a-tuple-i-item-raise-an-exception-when-the-addition-works Copyright --------- This document has been placed in the public domain. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: