[Python-ideas] Dict joining using + and +=
Steven D'Aprano
steve at pearwood.info
Tue Mar 5 04:59:08 EST 2019
On Mon, Mar 04, 2019 at 03:33:36PM -0500, Neil Girdhar wrote:
> Maybe, but reading through the various replies, it seems that if you
> are adding "-" to be analogous to set difference, then the combination
> operator should be analogous to set union "|".
That's the purpose of this discussion, to decide whether dict merging is
more like addition/concatenation or union :-)
> And it also opens an
> opportunity to add set intersection "&".
What should intersection do in the case of matching keys?
I see the merge + operator as a kind of update, whether it makes a copy
or does it in place, so to me it is obvious that "last seen wins" should
apply just as it does for the update method.
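As a minimal sketch of what "last seen wins" means for a copying merge, assuming a hypothetical helper named `merged` (not part of any proposal in this thread) built only from `dict.copy` and `dict.update`:

```python
def merged(left, right):
    """Return a new dict; on duplicate keys the value from `right` wins."""
    new = left.copy()   # leave the left operand untouched
    new.update(right)   # update() overwrites existing keys: last seen wins
    return new


a = {"spam": 1, "eggs": 2}
b = {"eggs": 20, "ham": 30}
result = merged(a, b)
# result == {"spam": 1, "eggs": 20, "ham": 30}; a and b are unchanged
```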
But dict *intersection* is a more abstract operation than merge/update.
And that leads to the problem, what do you do with the values?
{key: "spam"} & {key: "eggs"}
# could result in any of:
{key: "spam"}
{key: "eggs"}
{key: ("spam", "eggs")}
{key: "spameggs"}
an exception
something else?
Unlike "update", I don't have any good use-cases to prefer any one of
those over the others.
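To make the ambiguity concrete, here is a rough sketch in which the caller has to supply the conflict rule. The function name `intersect` and its `resolve` parameter are hypothetical, used only for illustration:

```python
def intersect(left, right, resolve):
    """Keep only keys present in both dicts; `resolve` decides the value."""
    return {k: resolve(v, right[k]) for k, v in left.items() if k in right}


a = {"key": "spam", "other": 1}
b = {"key": "eggs"}

first_wins = intersect(a, b, lambda x, y: x)      # {"key": "spam"}
last_wins = intersect(a, b, lambda x, y: y)       # {"key": "eggs"}
as_tuple = intersect(a, b, lambda x, y: (x, y))   # {"key": ("spam", "eggs")}
combined = intersect(a, b, lambda x, y: x + y)    # {"key": "spameggs"}
```

Each rule is a one-liner, which is rather the point: none of them stands out as the obvious default for an operator.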
> After all, how do you filter a dictionary to a set of keys?
>
> >> d = {'some': 5, 'extra': 10, 'things': 55}
> >> d &= {'some', 'allowed', 'options'}
> >> d
> {'some': 5}
new = d - (d - allowed)
{k: v for (k, v) in d.items() if k in allowed}
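For reference, the comprehension form already works in current Python, as long as it iterates over `d.items()` to get key/value pairs. A small sketch using the data from the quoted example (the variable names `filtered` and `also_filtered` are mine):

```python
d = {'some': 5, 'extra': 10, 'things': 55}
allowed = {'some', 'allowed', 'options'}

# Keep only the items whose key is in the allowed set.
filtered = {k: v for k, v in d.items() if k in allowed}
# filtered == {'some': 5}

# Same result via the set-like keys view; d.keys() & allowed is a plain set.
also_filtered = {k: d[k] for k in d.keys() & allowed}
```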
> >> > * Regarding how to construct the new set in __add__, I now think this should be done like this:
> >> >
> >> > class dict:
> >> >     <other methods>
> >> >     def __add__(self, other):
> >> >         <checks that other makes sense, else return NotImplemented>
> >> >         new = self.copy()  # A subclass may or may not choose to override
> >> >         new.update(other)
> >> >         return new
> >>
> >> I like that, but it would be inefficient to do that for __sub__ since
> >> it would create elements that it might later delete.
> >>
> >>     def __sub__(self, other):
> >>         new = self.copy()
> >>         for k in other:
> >>             del new[k]
> >>         return new
> >>
> >> is less efficient than
> >>
> >>     def __sub__(self, other):
> >>         return type(self)({k: v for k, v in self.items() if k not in other})
I don't think you should be claiming what is more or less efficient
unless you've actually profiled them for speed and memory use. Often,
but not always, the two are in opposition: we make things faster by
using more memory, and save memory at the cost of speed.
Your version of __sub__ creates a temporary dict, which then has to be
copied in order to preserve the type. It's not obvious to me that that's
faster or more memory efficient than building a dict then deleting keys.
(Remember that dicts aren't lists, and deleting keys is an O(1)
operation.)
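A rough way to check rather than guess is to time both variants with `timeit`. This is only a sketch: the function names, the data sizes and the repeat counts are arbitrary choices of mine, it measures speed but not memory, and the copy-then-delete variant uses `pop(k, None)` instead of `del` so that keys missing from the dict don't raise.

```python
import timeit


def sub_copy_delete(d, other):
    """Copy the whole dict, then remove the unwanted keys."""
    new = d.copy()
    for k in other:
        new.pop(k, None)  # assumption: tolerate keys that are not in d
    return new


def sub_comprehension(d, other):
    """Build a temporary dict of the kept items, then copy it to preserve the type."""
    return type(d)({k: v for k, v in d.items() if k not in other})


d = {i: str(i) for i in range(10_000)}
other = set(range(0, 10_000, 10))  # remove one key in ten

# Both variants must agree before their timings mean anything.
assert sub_copy_delete(d, other) == sub_comprehension(d, other)

for func in (sub_copy_delete, sub_comprehension):
    best = min(timeit.repeat(lambda: func(d, other), number=20, repeat=3))
    print(f"{func.__name__}: {best:.4f}s for 20 calls")
```

The numbers will depend on how many keys are removed relative to the size of the dict, so it is worth varying `other` before drawing conclusions.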
--
Steven