[Python-ideas] Dict joining using + and +=

Tue Mar 5 04:59:08 EST 2019

On Mon, Mar 04, 2019 at 03:33:36PM -0500, Neil Girdhar wrote:

> Maybe, but reading through the various replies, it seems that if you
> are adding "-" to be analogous to set difference, then the combination
> operator should be analogous to set union "|". 

That's the purpose of this discussion, to decide whether dict merging is 
more like addition/concatenation or union :-)

> And it also opens an
> opportunity to add set intersection "&".

What should intersection do in the case of matching keys?

I see the merge + operator as a kind of update, whether it makes a copy 
or does it in place, so to me it is obvious that "last seen wins" should 
apply just as it does for the update method.
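
A minimal sketch of that reading follows. It assumes a hypothetical dict subclass named `MergeDict`, which is not part of Python. `+` copies the dict and then calls `update`, while `+=` updates in place. In both cases the right-hand operand's value wins for a shared key, exactly as with `dict.update`.

```python
class MergeDict(dict):
    """Hypothetical dict subclass sketching merge-as-update semantics."""

    def copy(self):
        # dict.copy() always returns a plain dict, so rebuild with our own type
        return type(self)(self)

    def __add__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = self.copy()      # copying merge: self is left untouched
        new.update(other)      # "last seen wins" for matching keys
        return new

    def __iadd__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        self.update(other)     # in-place merge, same key-conflict rule
        return self


a = MergeDict(key="spam", x=1)
b = {"key": "eggs"}
merged = a + b
print(merged)   # {'key': 'eggs', 'x': 1}
print(a)        # {'key': 'spam', 'x': 1}
```

The `copy` override is there because the plain `dict.copy()` would silently drop the subclass type.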

But dict *intersection* is a more abstract operation than merge/update. 
And that leads to the problem, what do you do with the values?

    {key: "spam"} & {key: "eggs"}

    # could result in any of:

    {key: "spam"}
    {key: "eggs"}
    {key: ("spam", "eggs")}
    {key: "spameggs"}
    an exception
    something else?

Unlike "update", I don't have any good use-cases to prefer any one of 
those over the others.
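
Any dict intersection has to commit to some policy for matching values. The sketch below makes that policy an explicit parameter, using a hypothetical `intersect` helper. Nothing like it exists on `dict`. Each candidate result listed above is then just a different `combine` function, and the "an exception" option becomes a policy that refuses to guess.

```python
def intersect(a, b, combine=lambda x, y: y):
    """Hypothetical dict intersection: keep only keys present in both.

    combine(x, y) picks the value for each shared key; the default mimics
    update ("last seen wins"), but nothing forces that particular choice.
    """
    return {k: combine(v, b[k]) for k, v in a.items() if k in b}


def strict(x, y):
    # Refuse to guess: matching keys must carry equal values.
    if x != y:
        raise ValueError(f"conflicting values {x!r} and {y!r}")
    return x


left = {"key": "spam", "only_left": 1}
right = {"key": "eggs", "only_right": 2}

print(intersect(left, right))                        # {'key': 'eggs'}
print(intersect(left, right, lambda x, y: x))        # {'key': 'spam'}
print(intersect(left, right, lambda x, y: (x, y)))   # {'key': ('spam', 'eggs')}
print(intersect(left, right, lambda x, y: x + y))    # {'key': 'spameggs'}
# intersect(left, right, strict) raises ValueError
```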

> After all, how do you filter a dictionary to a set of keys?
> 
> >>> d = {'some': 5, 'extra': 10, 'things': 55}
> >>> d &= {'some', 'allowed', 'options'}
> >>> d
> {'some': 5}

    new = d - (d - allowed)

    {k: v for (k, v) in d.items() if k in allowed}
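
Both spellings can be checked today if the proposed `d - keys` is replaced by a small helper. `dict_sub` below is hypothetical and stands in for an operator that `dict` does not currently have. The comprehension must iterate over `d.items()`, because iterating over a dict yields only its keys.

```python
def dict_sub(d, keys):
    """Hypothetical stand-in for the proposed `d - keys`: drop the given keys."""
    return {k: v for k, v in d.items() if k not in keys}


d = {'some': 5, 'extra': 10, 'things': 55}
allowed = {'some', 'allowed', 'options'}

# new = d - (d - allowed), spelled with the helper:
via_subtraction = dict_sub(d, dict_sub(d, allowed))

# The comprehension works with no new operator at all:
via_comprehension = {k: v for k, v in d.items() if k in allowed}

print(via_subtraction)     # {'some': 5}
print(via_comprehension)   # {'some': 5}
```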

> >> > * Regarding how to construct the new dict in __add__, I now think this should be done like this:
> >> >
> >> > class dict:
> >> >     <other methods>
> >> >     def __add__(self, other):
> >> >         <checks that other makes sense, else return NotImplemented>
> >> >         new = self.copy()  # A subclass may or may not choose to override
> >> >         new.update(other)
> >> >         return new
> >>
> >> I like that, but it would be inefficient to do that for __sub__ since
> >> it would create elements that it might later delete.
> >>
> >> def __sub__(self, other):
> >>     new = self.copy()
> >>     for k in other:
> >>         del new[k]
> >>     return new
> >>
> >> is less efficient than
> >>
> >> def __sub__(self, other):
> >>     return type(self)({k: v for k, v in self.items() if k not in other})

I don't think you should be claiming what is more or less efficient 
unless you've actually profiled them for speed and memory use. Often, 
but not always, the two are in opposition: we make things faster by 
using more memory, and save memory at the cost of speed.

Your version of __sub__ creates a temporary dict, which then has to be 
copied in order to preserve the type. It's not obvious to me that that's 
faster or more memory-efficient than copying the dict and then deleting keys.

(Remember that dicts aren't lists, and deleting keys is an O(1) 
operation.)
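
Instead of arguing the point either way, both versions can be measured. Below is a rough sketch that uses the standard `timeit` and `tracemalloc` modules. The function names and the 100,000-key test data are arbitrary choices made for illustration. Results will depend on the Python version, the dict size and the fraction of keys removed, so no numbers are claimed here. The delete version uses `pop(k, None)` rather than `del`, so keys in `other` that are absent from the dict are skipped instead of raising `KeyError`. That matches what the comprehension version does.

```python
import timeit
import tracemalloc


def sub_by_delete(d, other):
    # Copy everything, then remove unwanted keys (each removal is O(1) on average).
    new = d.copy()
    for k in other:
        new.pop(k, None)  # tolerate keys in `other` that are not in `d`
    return new


def sub_by_comprehension(d, other):
    # Build a temporary dict, then copy it again to get type(d).
    return type(d)({k: v for k, v in d.items() if k not in other})


def peak_bytes(func, *args):
    # Peak traced allocation (in bytes) while func runs.
    tracemalloc.start()
    try:
        func(*args)
        return tracemalloc.get_traced_memory()[1]
    finally:
        tracemalloc.stop()


d = {i: str(i) for i in range(100_000)}
other = set(range(0, 100_000, 2))   # drop half of the keys

assert sub_by_delete(d, other) == sub_by_comprehension(d, other)

for func in (sub_by_delete, sub_by_comprehension):
    seconds = timeit.timeit(lambda: func(d, other), number=10)
    print(f"{func.__name__}: {seconds:.3f}s for 10 runs, "
          f"peak {peak_bytes(func, d, other):,} bytes")
```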

-- 
Steven