Dictionary views are not entirely 'set like'
Let's start with a quick quiz: what is the result of each of the following (on Python3.5.x)? {}.keys() | set() # 1 set() | [] # 2 {}.keys() | [] # 3 set().union(set()) # 4 set().union([]) # 5 {}.keys().union(set()) # 6 If your answer was set([]), TypeError, set([]), set([]), set([]), AttributeError, then you were correct. That, to me, is incredibly unintuitive. Next up: {}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10 True, True, False, False. Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is up for debate.[1] First thing first, the behavior exhibited by #3 is a bug (or at least it probably should be, and one I'd be happy to fix. However, before doing that I felt it would be good to propose some questions and suggestions for how that might be done. There are, as far as I can tell, two reasons to use a MappingView, memory efficiency or auto-updating (a view will continue to mirror changes in the underlying object). I'll focus on the first because the second can conceivably be solved in other ways. Currently, if I want to union a dictionaries keys with a non-set iterable, I can do `{}.keys() | []`. This is a bug[2], the set or operator should only work on another set. That said, fixing this bug removes the ability to efficiently or keys with another iterable, `set({}.keys()).update([])` for efficiency, or `set({}.keys()).union([])` for clarity. Fixing this is simply a matter of adding a `.union` method to the KeysView (and possibly to the Set abc as well). Although that may not be something that is wanted. The issue, as far as I can tell, is whether we want people converting from MappingViews to "primitives" as soon as possible, or if we want to encourage people people to use the views for as long as possible. There are arguments on both sides: encouraging people to use the views causes these objects to become more complex, introducing more surface area for bugs, more code to be maintained, etc. Further, there's there is currently one obvious way to do things, convert to a primitive, if you're doing any complex action. On the other hand, making MappingViews more like the primitives they represent has positives for performance, simplifies user code, and would probably make testing these objects easier, since many tests could be stolen from test_set.py. My opinion is that the operators on MappingViews should be no more permissive than their primitive counterparts. A KeysView is in various ways more restrictive than a set, so having it be also occasionally less restrictive is surprising and in my opinion bad. This causes the loss of an efficient way to union a dict's keys with a list (among other methods). I'd then add .union, .intersection, etc. to remedy this. This solution would bring the existing objects more in line with their primitive counterparts, while still allowing efficient actions on large dictionaries. In short: - Is #3 intended behavior? - Should it (and the others be)? - As a related aside, should .union and other frozen ops be added to the Set interface? - If so, should the fix solely be a bugfix, should I do what I proposed, or something else entirely? - More generally, should there be a guiding principle when it comes to MappingViews and similar special case objects? [1]: There's some good conversation in this prior thread on this issue https://mail.python.org/pipermail/python-ideas/2015-December/037472.html. The consensus seemed to be that making ValuesViews comparable by value is technically infeasible (O(n^2) worst case), while making it comparable based on the underlying dictionary is a possibility. This would be for OrderedDict, although many of the same arguments apply for a normal dictionary. [2]: Well, it probably should be a bug, its explicitly tested for ( https://github.com/python/cpython/blob/master/Lib/test/test_dictviews.py#L10...), whereas sets are explicitly tested for the opposite functionality ( https://github.com/python/cpython/blob/master/Lib/test/test_set.py#L92) Thanks, I'm looking forward to the feedback, Josh
On Wed, Apr 6, 2016, at 05:38, Joshua Morton wrote:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is up for debate.[1]
Last time this came up, the conclusion was that making values views comparable was intractable due to the fact that they're unordered but the values themselves aren't hashable. Then the discussion got sidetracked into a discussion of whether the justification for not having them be hashable (Java does just fine with everything being hashable and content-based hashes for mutable objects) makes sense in a "consenting-adults" world.
On Wed, Apr 6, 2016 at 6:49 AM, Random832 <random832@fastmail.com> wrote:
Last time this came up, the conclusion was that making values views comparable was intractable due to the fact that they're unordered but the values themselves aren't hashable. Then the discussion got sidetracked into a discussion of whether the justification for not having them be hashable (Java does just fine with everything being hashable and content-based hashes for mutable objects) makes sense in a "consenting-adults" world.
At risk of repeating the derailment: while that's true, I'm also under the impression that interest in deeply immutable objects is growing in the Java community.
Indeed, I noted that (in a not-obvious endnote). Perhaps I shouldn't have mentioned the issue at all as its relatively secondary, but on the other hand while my main points have to do specifically with KeysView, it would be worth bundling all of the inconsistencies in all MappingViews at once, if only for pragmatic reasons. In the prior discussion, guido also seemed open to making values equivalent on dictionary identity equality (#10), which I think makes more sense than current behavior and doesn't suffer performance issues. In any case, I would consider that a secondary concern. On Wed, Apr 6, 2016 at 8:50 AM Random832 <random832@fastmail.com> wrote:
On Wed, Apr 6, 2016, at 05:38, Joshua Morton wrote:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is up for debate.[1]
Last time this came up, the conclusion was that making values views comparable was intractable due to the fact that they're unordered but the values themselves aren't hashable. Then the discussion got sidetracked into a discussion of whether the justification for not having them be hashable (Java does just fine with everything being hashable and content-based hashes for mutable objects) makes sense in a "consenting-adults" world. _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Wed, Apr 6, 2016, at 13:25, Joshua Morton wrote:
In the prior discussion, guido also seemed open to making values equivalent on dictionary identity equality (#10), which I think makes more sense than current behavior and doesn't suffer performance issues. In any case, I would consider that a secondary concern.
You could probably handle a lot of common cases by: - Eliminating any items which are present in both collections by identity. - Attempting to sort the lists of remaining items. But, yeah, the way to do it in O(N) requires an "ephemeral hash" operation which Python doesn't have and can't grow _now_, no matter what the justification for not always having had it. That he won't change it even after he builds the time machine doesn't really mean anything when it comes down to it.
On Apr 6, 2016 7:50 AM, "Random832" <random832@fastmail.com> wrote:
On Wed, Apr 6, 2016, at 05:38, Joshua Morton wrote:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is
up
for debate.[1]
Last time this came up, the conclusion was that making values views comparable was intractable due to the fact that they're unordered but the values themselves aren't hashable. Then the discussion got sidetracked into a discussion of whether the justification for not having them be hashable (Java does just fine with everything being hashable and content-based hashes for mutable objects) makes sense in a "consenting-adults" world.
here's a related discussion: [Python-ideas] Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected? https://mail.python.org/pipermail/python-ideas/2015-December/037472.html
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Apr 6, 2016 7:50 AM, "Random832" <random832@fastmail.com> wrote:
On Wed, Apr 6, 2016, at 05:38, Joshua Morton wrote:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is
up
for debate.[1]
Last time this came up, the conclusion was that making values views comparable was intractable due to the fact that they're unordered but the values themselves aren't hashable. Then the discussion got sidetracked into a discussion of whether the justification for not having them be hashable (Java does just fine with everything being hashable and content-based hashes for mutable objects) makes sense in a "consenting-adults" world.
With e.g. OrderedDict, one practical solution is to subclass and wrap .keys() in a set and .values() in a list.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Something has changed then, because in Python 2.7.10 I get ##1,3 TypeError ##9,10 True I can't see anything wrong with the actual behaviour in any of the examples. Rob Cliffe On 06/04/2016 10:38, Joshua Morton wrote:
Let's start with a quick quiz: what is the result of each of the following (on Python3.5.x)?
{}.keys() | set() # 1 set() | [] # 2 {}.keys() | [] # 3 set().union(set()) # 4 set().union([]) # 5 {}.keys().union(set()) # 6
If your answer was set([]), TypeError, set([]), set([]), set([]), AttributeError, then you were correct. That, to me, is incredibly unintuitive.
Next up:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is up for debate.[1]
On Wed, Apr 6, 2016 at 11:03 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Something has changed then, because in Python 2.7.10 I get ##1,3 TypeError ##9,10 True I can't see anything wrong with the actual behaviour in any of the examples.
Python 2.7 has very different handling of .keys() and .values() on dictionaries - they return lists instead of views. You can ignore 2.7 for the purposes of this discussion - it's not "set-like" in the way that's being considered here. ChrisA
On Apr 6, 2016 4:39 AM, "Joshua Morton" <joshua.morton13@gmail.com> wrote:
Let's start with a quick quiz: what is the result of each of the
following (on Python3.5.x)?
{}.keys() | set() # 1 set() | [] # 2 {}.keys() | [] # 3 set().union(set()) # 4 set().union([]) # 5 {}.keys().union(set()) # 6
If your answer was set([]), TypeError, set([]), set([]), set([]),
AttributeError, then you were correct. That, to me, is incredibly unintuitive.
Next up:
{}.keys() == {}.keys() # 7 {}.items() == {}.items() # 8 {}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
True, True, False, False.
Numbers 1, 2, 4, 5 are expected behavior. 3 and 6 are not, and 7-10 is up
for debate.[1]
First thing first, the behavior exhibited by #3 is a bug (or at least it
probably should be, and one I'd be happy to fix. However, before doing that I felt it would be good to propose some questions and suggestions for how that might be done.
There are, as far as I can tell, two reasons to use a MappingView, memory
I'll focus on the first because the second can conceivably be solved in other ways.
Currently, if I want to union a dictionaries keys with a non-set iterable, I can do `{}.keys() | []`. This is a bug[2], the set or operator should only work on another set. That said, fixing this bug removes the ability to efficiently or keys with another iterable, `set({}.keys()).update([])` for efficiency, or `set({}.keys()).union([])` for clarity.
Fixing this is simply a matter of adding a `.union` method to the KeysView (and possibly to the Set abc as well). Although that may not be something that is wanted. The issue, as far as I can tell, is whether we want people converting from MappingViews to "primitives" as soon as
efficiency or auto-updating (a view will continue to mirror changes in the underlying object). a third reason: Mapping and MutableMapping do not assume that all of the data is buffered into RAM. * e.g. on top of a DB * https://docs.python.org/3/library/collections.abc.html#collections.abc.Mutab... possible, or if we want to encourage people people to use the views for as long as possible.
There are arguments on both sides: encouraging people to use the views
causes these objects to become more complex, introducing more surface area for bugs, more code to be maintained, etc. Further, there's there is currently one obvious way to do things, convert to a primitive, if you're doing any complex action. On the other hand, making MappingViews more like the primitives they represent has positives for performance, simplifies user code, and would probably make testing these objects easier, since many tests could be stolen from test_set.py.
My opinion is that the operators on MappingViews should be no more
permissive than their primitive counterparts. A KeysView is in various ways more restrictive than a set, so having it be also occasionally less restrictive is surprising and in my opinion bad. This causes the loss of an efficient way to union a dict's keys with a list (among other methods). I'd then add .union, .intersection, etc. to remedy this.
This solution would bring the existing objects more in line with their
primitive counterparts, while still allowing efficient actions on large dictionaries.
In short:
- Is #3 intended behavior? - Should it (and the others be)? - As a related aside, should .union and other frozen ops be added to
- If so, should the fix solely be a bugfix, should I do what I proposed, or something else entirely? - More generally, should there be a guiding principle when it comes to MappingViews and similar special case objects?
[1]: There's some good conversation in this prior thread on this issue https://mail.python.org/pipermail/python-ideas/2015-December/037472.html. The consensus seemed to be that making ValuesViews comparable by value is technically infeasible (O(n^2) worst case), while making it comparable
the Set interface? based on the underlying dictionary is a possibility. This would be for OrderedDict, although many of the same arguments apply for a normal dictionary.
[2]: Well, it probably should be a bug, its explicitly tested for (
https://github.com/python/cpython/blob/master/Lib/test/test_dictviews.py#L10...), whereas sets are explicitly tested for the opposite functionality ( https://github.com/python/cpython/blob/master/Lib/test/test_set.py#L92)
Thanks, I'm looking forward to the feedback, Josh
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Apr 6, 2016, at 10:38 AM, Joshua Morton <joshua.morton13@gmail.com> wrote: set() | [] # 2 {}.keys() | [] # 3
Looks like this should be standardized. Either both raise TypeError, or both return a set. My preference would be TypeError, but that might be worse for backwards-compatibility.
{}.keys().union(set()) # 6
Seems to me that the pipe operator is staying on MappingView, so it's reasonable to add a corresponding ``.union`` to mimic sets. And intersection, etc.
{}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
It's weird, but float('nan') != float('nan'). I'm not particularly bothered by this.
These were exactly my thoughts. I wanted to bump this, since it was drowned out by more important things like Paths and invertible booleans and almost no discussion was had on the main issue, that of dict views not acting like sets. Since things have died down, is that behavior that should be remedied, and if so should it be backwards compatible (make set more permissive), treat it as a bugfix (make the views raise errors)? And should it include union/other methods to keep performance on the usecase that now throws a bug? -Josh On Thu, Apr 7, 2016 at 1:53 PM Michael Selik <mike@selik.org> wrote:
On Apr 6, 2016, at 10:38 AM, Joshua Morton <joshua.morton13@gmail.com> wrote: set() | [] # 2 {}.keys() | [] # 3
Looks like this should be standardized. Either both raise TypeError, or both return a set. My preference would be TypeError, but that might be worse for backwards-compatibility.
{}.keys().union(set()) # 6
Seems to me that the pipe operator is staying on MappingView, so it's reasonable to add a corresponding ``.union`` to mimic sets. And intersection, etc.
{}.values() == {}.values() # 9 d = {}; d.values() == d.values() # 10
It's weird, but float('nan') != float('nan'). I'm not particularly bothered by this.
participants (7)
-
Chris Angelico
-
Ian Kelly
-
Joshua Morton
-
Michael Selik
-
Random832
-
Rob Cliffe
-
Wes Turner