Let's start with a quick quiz: what is the result of each of the following (on Python 3.5.x)?
{}.keys() | set() # 1
set() | [] # 2
{}.keys() | [] # 3
set().union(set()) # 4
set().union([]) # 5
{}.keys().union(set()) # 6
If your answer was set(), TypeError, set(), set(), set(), AttributeError, then you were correct. That, to me, is incredibly unintuitive.
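For anyone who wants to check the first six answers rather than take them on faith, here is a minimal sketch. The helper name `outcome` is made up for this example; it just reports either the value of an expression or the name of the exception it raises.

```python
def outcome(thunk):
    """Return the value of thunk(), or the name of the exception it raises."""
    try:
        return thunk()
    except Exception as exc:
        return type(exc).__name__


results = [
    outcome(lambda: {}.keys() | set()),       # 1
    outcome(lambda: set() | []),              # 2
    outcome(lambda: {}.keys() | []),          # 3
    outcome(lambda: set().union(set())),      # 4
    outcome(lambda: set().union([])),         # 5
    outcome(lambda: {}.keys().union(set())),  # 6
]

for number, result in enumerate(results, start=1):
    print(number, repr(result))
```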
Next up:
{}.keys() == {}.keys() # 7
{}.items() == {}.items() # 8
{}.values() == {}.values() # 9
d = {}; d.values() == d.values() # 10
True, True, False, False.
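The same four comparisons, as a small sketch that can be pasted into an interpreter. The extra `same_object` check is my addition, not part of the quiz: it shows that a values view still compares equal to itself, which is consistent with the comparison falling back to identity.

```python
d = {}

checks = {
    7: {}.keys() == {}.keys(),      # keys views compare like sets
    8: {}.items() == {}.items(),    # items views compare like sets
    9: {}.values() == {}.values(),  # values views compare by identity
    10: d.values() == d.values(),   # each call creates a new view object
}

# One view object compared with itself is equal.
v = d.values()
same_object = (v == v)

print(checks, same_object)
```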
Numbers 1, 2, 4, and 5 are expected behavior. 3 and 6 are not, and 7-10 are up for debate.[1]
First things first: the behavior exhibited by #3 is a bug (or at least it probably should be), and one I'd be happy to fix. However, before doing that I felt it would be good to pose some questions and suggestions for how that might be done.
There are, as far as I can tell, two reasons to use a MappingView: memory efficiency or auto-updating (a view will continue to mirror changes in the underlying object). I'll focus on the first because the second can conceivably be solved in other ways.
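To make the auto-updating point concrete, here is a short sketch contrasting a view with a copy. The variable names are just for illustration.

```python
d = {"a": 1}

keys = d.keys()      # a view: no copy of the keys is made
snapshot = set(d)    # an independent copy of the keys

d["b"] = 2           # mutate the dictionary afterwards

print(sorted(keys))      # the view reflects the new key
print(sorted(snapshot))  # the copy does not
```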
Currently, if I want to union a dictionary's keys with a non-set iterable, I can do `{}.keys() | []`. This is a bug[2]: the set "or" operator (`|`) should only work on another set. That said, fixing this bug removes the ability to efficiently "or" keys with another iterable, leaving `set({}.keys()).update([])` for efficiency, or `set({}.keys()).union([])` for clarity.
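Here is a rough side-by-side of the three spellings, using a small made-up dictionary. Note that `update()` mutates the set in place and returns None, so it needs a statement of its own rather than being used as an expression.

```python
d = {"a": 1, "b": 2}
extra = ["b", "c"]  # a non-set iterable

# Current behavior: the view's | operator accepts any iterable.
via_operator = d.keys() | extra

# Alternative 1: copy the keys into a set, then use the named method.
via_union = set(d.keys()).union(extra)

# Alternative 2: copy the keys into a set, then update it in place.
via_update = set(d.keys())
via_update.update(extra)

print(via_operator == via_union == via_update)
```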
Fixing this is simply a matter of adding a `.union` method to the KeysView (and possibly to the Set ABC as well), although that may not be something that is wanted. The issue, as far as I can tell, is whether we want people converting from MappingViews to "primitives" as soon as possible, or whether we want to encourage people to use the views for as long as possible.
There are arguments on both sides: encouraging people to use the views causes these objects to become more complex, introducing more surface area for bugs, more code to be maintained, etc. Further, there is currently one obvious way to do things: convert to a primitive if you're doing any complex action. On the other hand, making MappingViews more like the primitives they represent has positives for performance, simplifies user code, and would probably make testing these objects easier, since many tests could be stolen from test_set.py.
My opinion is that the operators on MappingViews should be no more permissive than their primitive counterparts. A KeysView is in various ways more restrictive than a set, so having it also be occasionally less restrictive is surprising and, in my opinion, bad. Tightening the operators causes the loss of an efficient way to union a dict's keys with a list (among other operations), so I'd then add .union, .intersection, etc. to remedy this.
This solution would bring the existing objects more in line with their primitive counterparts, while still allowing efficient actions on large dictionaries.
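As a rough illustration of what that could look like, here is a sketch built on `collections.abc.KeysView`. The class name `StrictKeysView` is hypothetical, and this is not a patch against the real dict views (which are implemented in C). It only shows the intended semantics: `|` accepts sets only, while the named methods accept any iterable. Only `|` is overridden here; `&`, `-`, and `^` would follow the same pattern.

```python
from collections.abc import KeysView, Set


class StrictKeysView(KeysView):
    """Hypothetical keys view: strict operators, permissive named methods."""

    def __or__(self, other):
        # Mirror set.__or__: refuse anything that is not a set-like object.
        if not isinstance(other, Set):
            return NotImplemented
        return super().__or__(other)

    __ror__ = __or__  # union is symmetric, so the reflected form is the same

    def union(self, *others):
        # Like set.union: accepts any number of arbitrary iterables.
        result = set(self)
        result.update(*others)
        return result

    def intersection(self, *others):
        # Like set.intersection: accepts any number of arbitrary iterables.
        return set(self).intersection(*others)


d = {"a": 1, "b": 2}
view = StrictKeysView(d)

with_set = view | {"c"}              # allowed: the other operand is a set
with_list = view.union(["c"])        # allowed: named method takes any iterable
common = view.intersection(["a", "z"])

try:
    view | ["c"]                     # rejected, like set() | []
    operator_rejected_list = False
except TypeError:
    operator_rejected_list = True
```

With this shape, the view never has to be copied by hand before combining it with a list, and the operator behaves the same way as it does on a plain set.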
In short:
- Is #3 intended behavior?
- Should it (and the others) be?
- As a related aside, should .union and other frozen ops be added to the Set interface?
- If so, should the fix solely be a bugfix, should I do what I proposed, or something else entirely?
- More generally, should there be a guiding principle when it comes to MappingViews and similar special case objects?
[1]: There's some good conversation on this issue in this prior thread:
https://mail.python.org/pipermail/python-ideas/2015-December/037472.html. The consensus seemed to be that making ValuesViews comparable by value is technically infeasible (O(n^2) worst case), while making them comparable based on the underlying dictionary is a possibility. This would be for OrderedDict, although many of the same arguments apply for a normal dictionary.
Thanks, I'm looking forward to the feedback,
Josh