Comparing dict.values()
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just `False` seems a bit misleading.
What are your thoughts on the issue?
Best regards,
Kristian Klette
[0]: https://bugs.python.org/issue37585
[1]: https://github.com/python/cpython/pull/14737
On 2019-07-23 21:59, Kristian Klette wrote:
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just `False` seems a bit misleading.
What are your thoughts on the issue?
Best regards, Kristian Klette
[0]: https://bugs.python.org/issue37585 [1]: https://github.com/python/cpython/pull/14737
Well, the keys can function as a set, and, in fact, can equal a set:
>>> {'a': 1234}.keys() == {'a'}
True
so the keys of 2 dicts can be compared efficiently. The items of 2 dicts can also be compared efficiently because you still have the keys, so you can check the key efficiently and then check the value. However, when comparing the values you have a problem: you have 2 collections of objects that might contain duplicates, might not be hashable, and might not be sortable, so comparing them could be inefficient, and you can't refer back to their keys like in the case of comparing the items as above because the 2 dicts might have different keys. Unless someone can come up with an efficient solution, I'd probably go with raising an exception.
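The asymmetry described above can be checked directly. The following is a small sketch (the dicts `d1` and `d2` and all variable names are made up for illustration): keys and items views are set-like, while a values view is not, and its contents may be neither hashable nor sortable.

```python
from collections.abc import Set

# Hypothetical example data: the values are lists, which are unhashable.
d1 = {"a": [1, 2], "b": [3]}
d2 = {"b": [3], "a": [1, 2]}

# keys() and items() views implement the Set interface; values() does not.
keys_are_setlike = isinstance(d1.keys(), Set)
items_are_setlike = isinstance(d1.items(), Set)
values_are_setlike = isinstance(d1.values(), Set)

# items() can still be compared: each key is looked up in the other dict
# and only then are the two values compared with ==.
items_equal = d1.items() == d2.items()

# Turning the values into a set fails because lists cannot be hashed.
try:
    set(d1.values())
    hashing_error = None
except TypeError as exc:
    hashing_error = exc

# Sorting fails when the values are not mutually orderable.
try:
    sorted({"a": 1, "b": "one"}.values())
    sorting_error = None
except TypeError as exc:
    sorting_error = exc
```

Neither of the two obvious shortcuts (hashing or sorting) is available in general, which is the crux of the efficiency concern.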
On Tue, 23 Jul 2019 23:44:35 +0100 MRAB <python@mrabarnett.plus.com> wrote:
However, when comparing the values you have a problem: you have 2 collections of objects that might contain duplicates, might not be hashable, and might not be sortable, so comparing them could be inefficient, and you can't refer back to their keys like in the case of comparing the items as above because the 2 dicts might have different keys. Unless someone can come up with an efficient solution, I'd probably go with raising an exception.
Equality comparisons should never raise. Regards Antoine.
I find myself in agreement with Inada (https://bugs.python.org/issue12445), in that comparing the values view between two dictionaries by itself would not be particularly useful for enough people to warrant implementing the comparison. In most situations when using the data structure, it is only useful to either compare the keys and values with ``d0.items() == d1.items()`` or just the keys with ``d0.keys() == d1.keys()``.
The values are generally not particularly useful without the corresponding keys, so I'm actually somewhat curious as to the motivation of creating the function ``dict.values()``. But, if for any reason someone actually had to compare only the values (I can't imagine the reason), they could compare them by converting them to a list: ``list(d0.values()) == list(d1.values())``. It adds an extra step, but I don't think enough people would make use of something like this to justify adding the direct comparison with ``d0.values() == d1.values()``.
However, I agree that the current behavior of just returning ``False`` is quite misleading, regardless of whether or not implementing an accurate comparison between the values views would be worthwhile. I'm not sure as to what the most appropriate behavior would be, but since it's using ``__eq__``, [NotImplemented](https://docs.python.org/3/library/constants.html#NotImplemented) seems appropriate. Another alternative would be to return ``None``. A note in the docs for [NotImplementedError](https://docs.python.org/3/library/exceptions.html#NotImplementedError) states "It [NotImplementedError] should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to None".
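As a rough illustration of the list workaround mentioned above (the dict names `d0` and `d1` follow the message; the other variable names are made up), note that converting to lists makes the result depend on insertion order:

```python
# Two dicts with the same items, built in a different insertion order.
d0 = {"x": 1, "y": 2}
d1 = {"y": 2, "x": 1}

# The dicts themselves compare equal regardless of insertion order.
same_dicts = d0 == d1

# The list workaround compares position by position, so order matters here.
same_value_lists = list(d0.values()) == list(d1.values())

# Comparing the views directly falls back on identity of the view objects,
# and every call to .values() creates a new view object.
same_views = d0.values() == d0.values()

# A single view object does compare equal to itself.
v = d0.values()
same_view_object = v == v
```

So the workaround answers the question "same values in the same order?", which may or may not be what the caller wants.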
On Tue, Jul 23, 2019 at 08:59:09PM -0000, Kristian Klette wrote:
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
It seems to be doing an identity test on the "dict_values" view object itself:
    py> d = {'a': 1}
    py> a = b = d.values()
    py> a == b
    True
which I expect is probably the default __eq__ inherited from object. Each time you call d.values() you get a distinct object, hence the False.
I agree that this is a little surprising. Given that they are *views* of an underlying dict, I would expect that two views of the same dict ought to compare equal:
    assert d.values() == d.values()
So at the least, we ought to have dict.values() comparison return True if the underlying dicts are identical. In pseudocode:
    def __eq__(self, other):
        if self is other:
            # Same object implies equality.
            return True
        if isinstance(other, Dict_Values_View_Type):
            if self.owner_dict is other.owner_dict:
                # Two views into the same dict are always equal.
                return True
        return NotImplemented
I think that's the minimal behaviour that makes sense for a view.
Beyond that, we start getting complicated, and potentially expensive. But I can suggest at least one useful invariant. If a, b are two dicts:
    a.items() == b.items()
ought to be equivalent to:
    (a.keys() == b.keys()) and (a.values() == b.values())
That implies something like this pseudo-code:
    def __eq__(self, other):
        if self is other:
            # Same object implies equality.
            return True
        if isinstance(other, Dict_Values_View_Type):
            a = self.owner_dict  # dict we are taking a view of
            b = other.owner_dict
            if a is b:
                # Two views into the same dict are always equal.
                return True
            if len(a) != len(b):
                # Unequal lengths implies the values cannot be equal.
                return False
            if a.items() == b.items():
                # (key,value) pairs are equal implies values are equal.
                return True
            elif a.keys() == b.keys():
                # keys are equal but items are not equal implies
                # that the values must be different.
                return False
            # Fall back on value by value comparison?
            return list(self) == list(other)
        return NotImplemented
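To make the pseudocode above concrete, here is a runnable sketch. It assumes a hypothetical wrapper class, `ValuesView`, that keeps a reference to its owner dict in `_owner`; the real `dict_values` type is implemented in C and this is not how CPython behaves today. The logic is transcribed from the pseudocode as written, including the order-dependent fallback.

```python
class ValuesView:
    """Hypothetical stand-in for dict_values that remembers its owner dict."""

    def __init__(self, owner):
        self._owner = owner  # the dict this object is a view of

    def __iter__(self):
        return iter(self._owner.values())

    def __len__(self):
        return len(self._owner)

    def __eq__(self, other):
        if self is other:
            # Same object implies equality.
            return True
        if isinstance(other, ValuesView):
            a, b = self._owner, other._owner
            if a is b:
                # Two views into the same dict are always equal.
                return True
            if len(a) != len(b):
                # Unequal lengths imply the values cannot be equal.
                return False
            if a.items() == b.items():
                # Equal (key, value) pairs imply equal values.
                return True
            if a.keys() == b.keys():
                # Same keys but different items: treated as unequal,
                # exactly as in the pseudocode.
                return False
            # Fallback from the pseudocode: position-by-position comparison,
            # which depends on insertion order.
            return list(self) == list(other)
        return NotImplemented


d = {"a": 1}
same_dict = ValuesView(d) == ValuesView(d)
equal_dicts = ValuesView({"a": 1, "b": 2}) == ValuesView({"b": 2, "a": 1})
different_values = ValuesView({"a": 1}) == ValuesView({"a": 2})
different_keys = ValuesView({"a": 1}) == ValuesView({"b": 1})
```

Returning `NotImplemented` for foreign types lets Python try the other operand and finally fall back on identity, so comparing a view with an unrelated object yields `False` rather than an error.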
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception.
Equality tests should (almost?) never raise an exception. It should be safe to test "are these two objects equal?" without guarding it in a try...except block. The definition of "equal" may not always be obvious, but it shouldn't raise unless the __eq__ method is buggy.
In Python, it's quite common for __eq__ to fall back on ``is``, e.g.:
    py> a = lambda x: x+1
    py> a == a
    True
    py> a == (lambda x: x+1)
    False
but I think in the case of views, we should at least fall back on identity of the underlying dicts, even if we decide the more complex tests are not worth the trouble.
-- Steven
On 7/23/2019 8:27 PM, Steven D'Aprano wrote:
On Tue, Jul 23, 2019 at 08:59:09PM -0000, Kristian Klette wrote:
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
It seems to be doing an identity test on the "dict_values" view object itself:
py> d = {'a': 1}
py> a = b = d.values()
py> a == b
True
which I expect is probably the default __eq__ inherited from object. Each time you call d.values() you get a distinct object, hence the False.
I agree that this is a little surprising.
If one has not learned the default meaning of '==' in Python. Perhaps this should be given more emphasis in beginner courses. "What does it mean for two objects to be 'equal'?" It is not a trivial question.
Given that they are *views* of an underlying dict, I would expect that two views of the same dict ought to compare equal:
assert d.values() == d.values()
So at the least, we ought to have dict.values() comparison return True if the underlying dicts are identical. In pseudocode:
    def __eq__(self, other):
        if self is other:
            # Same object implies equality.
            return True
        if isinstance(other, Dict_Values_View_Type):
            if self.owner_dict is other.owner_dict:
                # Two views into the same dict are always equal.
                return True
        return NotImplemented
I think that's the minimal behaviour that makes sense for a view.
Beyond that, we start getting complicated, and potentially expensive. But I can suggest at least one useful invariant. If a, b are two dicts:
a.items() == b.items()
ought to be equivalent to:
(a.keys() == b.keys()) and (a.values() == b.values())
That implies something like this pseudo-code:
    def __eq__(self, other):
        if self is other:
            # Same object implies equality.
            return True
        if isinstance(other, Dict_Values_View_Type):
            a = self.owner_dict  # dict we are taking a view of
            b = other.owner_dict
            if a is b:
                # Two views into the same dict are always equal.
                return True
            if len(a) != len(b):
                # Unequal lengths implies the values cannot be equal.
                return False
            if a.items() == b.items():
                # (key,value) pairs are equal implies values are equal.
                return True
            elif a.keys() == b.keys():
                # keys are equal but items are not equal implies
                # that the values must be different.
                return False
Makes sense, up to here.
            # Fall back on value by value comparison?
            return list(self) == list(other)
This seems wrong. Creating two lists raises the cost and the comparison will depend on insertion order.
return NotImplemented
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception.
Equality tests should (almost?) never raise an exception. It should be safe to test "are these two objects equal?" without guarding it in a try...except block. The definition of "equal" may not always be obvious, but it shouldn't raise unless the __eq__ method is buggy.
I strongly agree, with almost? likely omitted. Testing before acting and recovering from failure after acting should be alternatives. -- Terry Jan Reedy
On Tue, Jul 23, 2019 at 10:02:34PM -0400, Terry Reedy wrote: [...]
If one has not learned the default meaning of '==' in Python. Perhaps this should be given more emphasis in beginner courses. "What does it mean for two objects to be 'equal'?" It is not a trivial question.
No, it is not trivial, but the default meaning of equality in Python does not match the common idea that two things are equal if they are the same, equivalent, interchangeable, etc. By default, no two objects are ever equal, even if their values are equal. Equality only holds between an object and itself. (In fairness, it is hard to think of another, more useful, definition of equality suitable as the default.) In any case, the default semantics of equality inherited from object may explain the current behaviour, but that doesn't mean that the current behaviour is useful. When it is meaningful to say that "the value of these two distinct objects are equal", that is a strong hint that equality ought to be based on the value, rather than the identity, of the objects. [...]
Makes sense, up to here.
I'm glad to hear it, because I think that's a strong invariant: if the items are equal, so should the values be equal. Beyond that, I'm not certain what is the right behaviour.
            # Fall back on value by value comparison?
            return list(self) == list(other)
This seems wrong. Creating two lists raises the cost and the comparison will depend on insertion order.
You're probably right. I did say that this was potentially expensive. We should be able to avoid the needless creation of two lists:
    return all(x==y for x, y in zip(self, other))
but that still leaves the insertion order problem.
How does this seem to you? Two dict.values objects are equal if:
- they are in fact the same object (identity test on the views);
- they are both views of the same dict (identity test on the dicts);
- they are views of distinct, but equal, dicts;
- or there is a 1:1 correspondence between values (possibly not unique) in the two views.
The first three tests should be straight-forward. The last is likely to be slow, but something like this ought to work:
    a = list(self)
    b = list(other)
    return len(a) == len(b) and all(a.count(x) == b.count(x) for x in a)
Given that we cannot rely on the values being hashable or even sortable, I don't know how to make it more efficient in the general case.
-- Steven
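The count-based test above can be wrapped in a standalone function to try it out. This is only a sketch: the function name `values_equal_unordered` is made up, and the approach is quadratic in the number of values because each `count()` call scans a whole list.

```python
def values_equal_unordered(view_a, view_b):
    """Compare two collections of values as multisets using only ==.

    Works for unhashable and unorderable values, at O(n**2) cost.
    """
    a = list(view_a)
    b = list(view_b)
    # Same length, and every value occurs equally often on both sides.
    return len(a) == len(b) and all(a.count(x) == b.count(x) for x in a)


# Unhashable values, different keys, different order: still equal.
unordered_match = values_equal_unordered(
    {"a": [1], "b": [2]}.values(),
    {"x": [2], "y": [1]}.values(),
)

# Same distinct values but different multiplicities: not equal.
multiplicity_mismatch = values_equal_unordered(
    {"a": 1, "b": 2, "c": 1}.values(),
    {"a": 1, "b": 2, "c": 2}.values(),
)
```

Because only `==` is used, the result inherits any quirks of the values' own equality (for instance, values that do not compare equal to themselves never match).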
On 24/07/2019 10:31:46, Steven D'Aprano wrote:
How does this seem to you? Two dict.values objects are equal if:
- they are in fact the same object (identity test on the views);
- they are both views of the same dict (identity test on the dicts);
- they are views of distinct, but equal, dicts;
Naive question: Is there a way (in Python) to get at the underlying dict from a dict.values object, or more generally from any dict view object?
>>> dir({}.values())
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
Rob Cliffe
24.07.19 13:33, Rob Cliffe via Python-Dev wrote:
Naive question: Is there a way (in Python) to get at the underlying dict from a dict.values object, or more generally from any dict view object?
No, there is not. Likewise, there is no way to get at the underlying list, tuple, or dict from the corresponding iterators.
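For readers on newer interpreters: since Python 3.10 (released after this thread), dict views expose a `mapping` attribute that returns a read-only `types.MappingProxyType` wrapping the original dict. The sketch below uses `getattr` with a default so that it also runs on older versions, where `owner` is simply `None`; the variable names are illustrative.

```python
d = {"a": 1}
view = d.values()

# On Python 3.10+ this is a read-only proxy of d; on older versions it is None.
owner = getattr(view, "mapping", None)

# The proxy is live: later changes to d are visible through it.
d["b"] = 2
reflects_update = owner is not None and owner["b"] == 2
```

The proxy compares equal to the dict it wraps, but it does not hand back the dict object itself, so it is not a direct way to test whether two views share the same owner.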
On 24 Jul 2019, at 02:27, Steven D'Aprano <steve@pearwood.info> wrote:
But I can suggest at least one useful invariant. If a, b are two dicts:
a.items() == b.items()
ought to be equivalent to:
(a.keys() == b.keys()) and (a.values() == b.values())
I don’t think this invariant holds unless comparison is order dependent. {1:2, 3:4} and {1:4, 3:2} have the same keys and values, but not the same items. Ronald
On Wed, Jul 24, 2019 at 12:36:29PM +0200, Ronald Oussoren wrote:
On 24 Jul 2019, at 02:27, Steven D'Aprano <steve@pearwood.info> wrote:
But I can suggest at least one useful invariant. If a, b are two dicts:
a.items() == b.items()
ought to be equivalent to:
(a.keys() == b.keys()) and (a.values() == b.values())
I don’t think this invariant holds unless comparison is order dependent. {1:2, 3:4} and {1:4, 3:2} have the same keys and values, but not the same items.
You are right, they aren't equivalent, that was the wrong term to use. But not because of dependency on order. They aren't equivalent because you can have two dicts where the keys are equal and the values are equal, but the items are not:
    a = {1: 'a', 2: 'b', 3: 'c'}
    b = {1: 'a', 2: 'c', 3: 'b'}
Both dicts have keys {1, 2, 3} and hence equal keys; both have values {'a', 'b', 'c'} and hence equal values; but the items are not equal.
So the invariant only goes one way, not both:
(1) items equal implies values (and keys) are also equal;
(2) but values and keys equal doesn't imply items are equal.
But for our purposes, we don't care about case (2) and it doesn't matter that it doesn't hold.
-- Steven
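The one-way implication can be checked with the two dicts from the message. Since views cannot currently be compared by value, this sketch compares the values via `sorted()`, which is an assumption that happens to work here because the values are strings.

```python
a = {1: 'a', 2: 'b', 3: 'c'}
b = {1: 'a', 2: 'c', 3: 'b'}

keys_equal = a.keys() == b.keys()
# Compare the values as unordered collections (strings are sortable).
values_equal = sorted(a.values()) == sorted(b.values())
items_equal = a.items() == b.items()

# Case (1): a copy has equal items, and therefore equal keys and values.
c = dict(a)
items_imply_values = (
    c.items() == a.items()
    and c.keys() == a.keys()
    and sorted(c.values()) == sorted(a.values())
)
```

Here keys and values match while the items do not, which is case (2) failing; case (1) still holds.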
On 23/07/2019 21:59, Kristian Klette wrote:
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
The request was to establish a consensus on a reasonable semantic. I don't think that can be adequately addressed by such a simple example and the criterion "works as expected". What is expected of:
>>> x = dict(a=1, b=2)
>>> y = dict(b=2, a=1)
>>> x == y
True
Two superficially reasonable semantics are to compare the list or the set of the values:
>>> set(x.values()) == set(y.values())
True
>>> list(x.values()) == list(y.values())
False
Terry points out some implementation and definitional problems (unhashable values) with set semantics. Steven proposes (essentially) list semantics, but isn't it surprising that equal dictionaries should not have equal .values()?
Jeff Allen
23.07.19 23:59, Kristian Klette wrote:
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
Is it expected to you that `iter(d) == iter(d)` returns False? By default the equality operator returns True when and only when operands are identical. It is expected. Some objects (like numbers or strings) can override the default behavior and implement different rules for equality.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just `False` seems a bit misleading.
What rule for equality do you propose? What is the use case for it?
If you want to compare dict value views as ordered sequences, it can be surprising that `d1.values() != d2.values()` when `d1 == d2`. It will be inconsistent with the orderless comparison of `keys()` and `items()`.
If you want to compare them as unordered sequences, the computational complexity of the operation will be quadratic.
Note also that while in Python 2 `d.values() == d.values()` is always true, it is possible that `d1.keys() != d2.keys()` and `d1.values() != d2.values()` when `d1 == d2`. Python 3 is more consistent.
Serhiy Storchaka wrote:
23.07.19 23:59, Kristian Klette wrote:
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from .values() of two dicts. Currently the following works as expected:
d = {'a': 1234}
d.keys() == d.keys()
d.items() == d.items()
but d.values() == d.values() does not return the expected results. It always returns False. The symmetry is a bit off.
Is it expected to you that iter(d) == iter(d) returns False? By default the equality operator returns True when and only when operands are identical. It is expected. Some objects (like numbers or strings) can override the default behavior and implement different rules for equality.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just False seems a bit misleading.
What rule for equality do you propose? What is the use case for it? If you want to compare dict value views as ordered sequences, it can be surprising that d1.values() != d2.values() when d1 == d2. It will be inconsistent with the orderless comparison of keys() and items(). If you want to compare them as unordered sequences, the computational complexity of the operation will be quadratic. Note also that while in Python 2 d.values() == d.values() is always true, it is possible that d1.keys() != d2.keys() and d1.values() != d2.values() when d1 == d2. Python 3 is more consistent.
When I saw this I thought, "it should be like `set(d1.values()) == set(d2.values())`", but as has been pointed out there's no guarantee that all values will be hashable. After that I have no expectation since order isn't guaranteed.
I think this is one of those cases where it's superficially surprising when you don't think about all the ramifications, but once you understand the complexity of the problem then it becomes more clear that it isn't straight-forward.
To me a doc update for dict.values() stating that the iterator can't be compared and a brief mention as to why would be the best solution for this.
On 7/24/2019 1:30 PM, Brett Cannon wrote:
Serhiy Storchaka wrote:
What rule for equality do you propose? What is the use case for it? If you want to compare dict value views as ordered sequences, it can be surprising that d1.values() != d2.values() when d1 == d2. It will be inconsistent with the orderless comparison of keys() and items(). If you want to compare them as unordered sequences, the computational complexity of the operation will be quadratic. Note also that while in Python 2 d.values() == d.values() is always true, it is possible that d1.keys() != d2.keys() and d1.values() != d2.values() when d1 == d2. Python 3 is more consistent.
When I saw this I thought, "it should be like `set(d1.values()) == set(d2.values())`", but has been pointed out there's no guarantee that all values will be hashable. After that I have no expectation since order isn't guaranteed.
I think this is one of those cases where it's superficially surprising when you don't think about all the ramifications, but once you understand the complexity of the problem then it becomes more clear that it isn't straight-forward.
To me a doc update for dict.values() stating that the iterator can't be compared and a brief mention as to why would be the best solution for this.
(I hope my quoting is correct.) I agree with Brett: let's just document this and not make any code changes. If someone really has a use case, which I haven't seen, then they can write their own comparison using constraints specific to their data: perhaps their values are hashable, for example. Eric
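As one example of a comparison tailored to the data, here is a sketch for the case where all values are known to be hashable. The helper name `values_equal_hashable` is made up; `collections.Counter` is used as a multiset so that duplicates are respected and insertion order is ignored.

```python
from collections import Counter


def values_equal_hashable(d1, d2):
    """Compare the values of two dicts as multisets.

    Assumes every value is hashable; raises TypeError otherwise.
    Runs in roughly linear time, unlike a count()-based comparison.
    """
    return Counter(d1.values()) == Counter(d2.values())


# Different keys and different order, same values: equal.
same_values = values_equal_hashable({"a": 1, "b": 2}, {"x": 2, "y": 1})

# Same distinct values, different multiplicities: not equal.
different_counts = values_equal_hashable(
    {"a": 1, "b": 2, "c": 1},
    {"a": 1, "b": 2, "c": 2},
)
```

Whether ignoring order and keys is the right semantic is exactly the decision that is left to the caller here.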
On Wed, Jul 24, 2019 at 05:30:19PM -0000, Brett Cannon wrote:
When I saw this I thought, "it should be like `set(d1.values()) == set(d2.values())`", but has been pointed out there's no guarantee that all values will be hashable.
The hashability requirement for sets is, in a sense, an implementation detail. It might be a requirement for sets in Python the language, but it's not a requirement for abstract "sets of values". E.g. Java includes a standard TreeSet which doesn't require hashability:
https://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html
In this case, they need to be multisets, since
    {'a': 1, 'b': 2, 'c': 1}.values() != {'a': 1, 'b': 2, 'c': 2}.values()
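A short sketch of both points, using the dicts from the message plus two made-up dicts `p` and `q`: plain sets conflate the two value collections, while an ordering-based comparison (loosely analogous to a tree-based set, and here simply `sorted()`) respects multiplicity and needs orderable rather than hashable values.

```python
x = {'a': 1, 'b': 2, 'c': 1}
y = {'a': 1, 'b': 2, 'c': 2}

# As plain sets both collapse to {1, 2}, hiding the difference.
equal_as_sets = set(x.values()) == set(y.values())

# Sorting keeps duplicates: [1, 1, 2] versus [1, 2, 2].
equal_as_multisets = sorted(x.values()) == sorted(y.values())

# Lists are unhashable but orderable, so sorting still works for them.
p = {'k1': [2, 0], 'k2': [1, 5]}
q = {'k9': [1, 5], 'k8': [2, 0]}
unhashable_values_equal = sorted(p.values()) == sorted(q.values())
```

Sorting costs O(n log n) comparisons and fails with `TypeError` when the values are not mutually orderable, so it is no more general than hashing; it just relies on a different property of the values.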
After that I have no expectation since order isn't guaranteed.
I think this is one of those cases where it's superficially surprising when you don't think about all the ramifications, but once you understand the complexity of the problem then it becomes more clear that it isn't straight-forward.
Nobody said it was straight-forward, particularly if we want guaranteed efficient comparisons in both time and space at the same time.
Brett, I feel that you are dismissing this thread as "not thinking through the ramifications" without reading it through, because I'm pretty sure that we have thought through the ramifications in a lot more detail than your dismissal justifies.
Let's start with the minimal change we have suggested: that two views should be considered equal if they both belong to the same dict.
    assert d.values() == d.values()
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
(By correct I mean in the technical sense that if we were writing a functional spec for views, we would actively desire two views of the same dict to be unequal.)
To me a doc update for dict.values() stating that the iterator can't be compared and a brief mention as to why would be the best solution for this.
We're not talking about comparing iterators. We're talking about comparing views into a dict. That's a difference that makes all the difference. -- Steven
Steven D'Aprano wrote:
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
What I'm getting from this thread is that there are a variety of possible behaviours for dict values comparison, any of which could be considered "correct" depending on what the programmer is trying to do. I know there are good reasons for the guideline that equality comparisons should never raise exceptions, but this seems like a situation where Python really should slap you on the ear and make you specify exactly what you want. -- Greg
I agree with Greg.
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
This really feels like a good case for raising a descriptive exception. If someone wants to compare `set(d.values())` that's great. If they want `list(d.values())`, also a sensible question. But the programmer should spell it explicitly.
This feels similar to NumPy arrays, that also will not compare for equality in bare form. But they offer .any(), and .all() and other means of expressing the comparison you actually want in a situation.
On Wed, Jul 24, 2019, 6:32 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Steven D'Aprano wrote:
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
What I'm getting from this thread is that there are a variety of possible behaviours for dict values comparison, any of which could be considered "correct" depending on what the programmer is trying to do.
I know there are good reasons for the guideline that equality comparisons should never raise exceptions, but this seems like a situation where Python really should slap you on the ear and make you specify exactly what you want.
-- Greg _______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WR2JGENQ...
On 25/07/2019 00:09:37, David Mertz wrote:
I agree with Greg.
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
+1
This really feels like a good case for raising a descriptive exception. If someone wants to compare `set(d.values())` that's great. If they want `list(d.values())`, also a sensible question. But the programmer should spell it explicitly.
So, a helpful error message including something like "Cannot compare dict.values directly, consider converting to sets / lists / sorted lists before comparing" ?
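Purely to illustrate what this suggestion would look like in practice, here is a sketch using a hypothetical wrapper class, `StrictValues`. Nothing like it exists in the standard library, and other participants in this thread argue that equality tests should never raise.

```python
class StrictValues:
    """Hypothetical wrapper around dict.values() that refuses == comparison."""

    def __init__(self, mapping):
        self._values = mapping.values()

    def __iter__(self):
        return iter(self._values)

    def __eq__(self, other):
        # Wording taken from the suggestion above.
        raise TypeError(
            "Cannot compare dict.values directly, consider converting to "
            "sets / lists / sorted lists before comparing"
        )


v1 = StrictValues({"a": 1, "b": 2})
v2 = StrictValues({"b": 2, "a": 1})

try:
    v1 == v2
    message = None
except TypeError as exc:
    message = str(exc)

# Spelling out the intended semantics works as usual.
equal_as_sets = set(v1) == set(v2)
equal_as_lists = list(v1) == list(v2)
```

The cost of this design is that any generic code that compares arbitrary objects with `==` (for example `x in some_list`) can now raise unexpectedly.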
Exactly! That was my thought: the exception message could hint at likely approaches. The NumPy example seems to have a good pattern:
arr1 == arr2
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
On Wed, Jul 24, 2019, 8:06 PM Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
On 25/07/2019 00:09:37, David Mertz wrote:
I agree with Greg.
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
+1
This really feels like a good case for raising a descriptive exception. If someone wants to compare `set(d.values())` that's great. If they want `list(d.values())`, also a sensible question. But the programmer should spell it explicitly.
So, a helpful error message including something like "Cannot compare dict.values directly, consider converting to sets / lists / sorted lists before comparing" ?
On 7/25/19 2:25 AM, David Mertz wrote:
Exactly! that was my thought that the exception message could hint at likely approaches. The NumPy example seems to have a good pattern:
arr1 == arr2
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
It's not the equality operator that errors: `==` means element-wise comparison in Numpy. The error would come from a conversion of the array to bool:
>>> numpy.array([1, 2, 3]) == numpy.array([1, 3, 4])
array([ True, False, False])
>>> if numpy.array([ True, False, False]):
...     print('Same!')
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Numpy currently returns False when `==` “doesn't make sense”, but apparently has plans to change that:
>>> numpy.array([1, 2, 3]) == numpy.array([1, 2])
__main__:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
False
>>> numpy.array([1, 2, 3]) == numpy.array(['a', 'b'])
__main__:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
False
>>> numpy.__version__
'1.16.4'
On Wed, Jul 24, 2019, 8:06 PM Rob Cliffe via Python-Dev <python-dev@python.org> wrote:
On 25/07/2019 00:09:37, David Mertz wrote:
> I agree with Greg.
>
> There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
+1
> This really feels like a good case for raising a descriptive exception. If someone wants to compare `set(d.values())` that's great. If they want `list(d.values())`, also a sensible question. But the programmer should spell it explicitly.
So, a helpful error message including something like "Cannot compare dict.values directly, consider converting to sets / lists / sorted lists before comparing" ?
On Wed, Jul 24, 2019 at 08:25:31PM -0400, David Mertz wrote:
Exactly! that was my thought that the exception message could hint at likely approaches. The NumPy example seems to have a good pattern:
arr1 == arr2
ValueError: The truth value of an array with more than one element is ambiguous.
That's not actually what numpy does:

    py> numpy.array([1, 2]) == numpy.array([1, 2])
    array([ True, True], dtype=bool)

In any case, we should not allow numpy's (mis)feature into builtins. It might (perhaps...) be okay for third-party objects to break the law of excluded middle, and implement de-facto multi-valued logic (where an exception == Maybe), but we shouldn't have builtins do that.

-- Steven
I considered an alternative: return True if the underlying dicts were identical or equal, and raise an Exception otherwise. But I soon decided that this was a terrible idea: it could hide a bug by making faulty code work intermittently. Apologies for doubtless belabouring the blindingly obvious (but then again, if I don't mention this possibility, maybe someone even more idiotic than me will suggest it ). On 25/07/2019 00:49:56, Rob Cliffe via Python-Dev wrote:
On 25/07/2019 00:09:37, David Mertz wrote:
I agree with Greg.
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in. +1
This really feels like a good case for raising a descriptive exception. If someone wants to compare `set(d.values())` that's great. If they want `list(d.values())`, also a sensible question. But the programmer should spell it explicitly.
So, a helpful error message including something like "Cannot compare dict.values directly, consider converting to sets / lists / sorted lists before comparing" ?
On Thu, Jul 25, 2019 at 4:01 AM Rob Cliffe via Python-Dev < python-dev@python.org> wrote:
I considered an alternative: return True if the underlying dicts were identical or equal, and raise an Exception otherwise. But I soon decided that this was a terrible idea: it could hide a bug by making faulty code work intermittently. Apologies for doubtless belabouring the blindingly obvious (but then again, if I don't mention this possibility, maybe someone even more idiotic than me will suggest it ).
Whatever made you think I'd do that?
On Wed, 24 Jul 2019 19:09:37 -0400 David Mertz <mertz@gnosis.cx> wrote:
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
Why? Does the following make no sense to you?
iter(()) == iter(()) False
Python deliberately allows you to compare everything with everything, at least for equality. Perhaps it shouldn't, but it's too late to design the language differently.
This feels similar to NumPy arrays, that also will not compare for equality in bare form.
They will, but then they return an array of booleans.
    >>> a = np.array([1,2])
    >>> b = np.array([1,3])
    >>> a == b
    array([ True, False])
Regards Antoine.
On Thu, Jul 25, 2019 at 05:53:47PM +0200, Antoine Pitrou wrote:
On Wed, 24 Jul 2019 19:09:37 -0400 David Mertz <mertz@gnosis.cx> wrote:
There are various possible behaviors that might make sense, but having `d.values() != d.values()` is about the only one I can see no sense in.
Why? Does the following make no sense to you?
iter(()) == iter(()) False
Views are not iterators, and the analogy is a poor one.

In their own way, iterators are almost as weird as NANs. (Not *quite* as weird, since NANs break reflexivity too: x != x when x is a NAN.) But having two iterators which clearly yield the same values in the same order compare as unequal is weird. ("Same values in same order" includes the pair of exhausted iterator case.)

The behaviour of iterators can be justified, and I'm not going to argue that it should be changed. For starters iterators are not *containers*, they are conceptually more of a process (yielding values one at a time). But we shouldn't emulate iterator behaviour in objects which aren't like iterators.

Views are collections. They are sized, containers (support ``in``), and iterable:

    py> v = {'a': 1, 'b': 2}.values()
    py> len(v)
    2
    py> 2 in v
    True
    py> list(v)
    [1, 2]

and unlike iterators, iterating over a view doesn't exhaust it.

Conceptually, equality of two values view objects should be easy (if we don't care about efficiency of implementation). Two views are equal if they have the same length, and each value occurs the same number of times. Value views don't currently support the .count() method, but if they did, we could say two value views a, b were equal if:

    len(a) == len(b) and all(a.count(x) == b.count(x) for x in a)

The .count method could be implemented like this:

    def count(self, x):
        n = 0
        for a in self:
            if a == x:
                n += 1
        return n

So there are no conceptual problems in defining equality for value views. Putting aside efficiency, this is easy to solve.

-- Steven
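The definition above can be tried out without touching dict itself. What follows is only a minimal sketch, assuming free-standing helpers (`count` and `values_equal` are illustrative names, not existing CPython APIs). It relies on `==` alone, so unhashable values such as lists work, at the cost of quadratic time:

```python
def count(values, x):
    """Count how many elements of `values` compare equal to x."""
    n = 0
    for a in values:
        if a == x:
            n += 1
    return n


def values_equal(a, b):
    """Multiset equality of two values views, using only == (O(n**2))."""
    return len(a) == len(b) and all(count(a, x) == count(b, x) for x in a)


d1 = {'a': 1, 'b': [], 'c': 1}
d2 = {'x': [], 'y': 1, 'z': 1}
d3 = {'x': [], 'y': 1, 'z': 2}

print(values_equal(d1.values(), d2.values()))  # True
print(values_equal(d1.values(), d3.values()))  # False
```

The length check is what makes iterating over only `a` sufficient: if every value of `a` has a matching count in `b` and the lengths agree, `b` has no room for extra values.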
On Fri, 26 Jul 2019 20:28:05 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
So there are no conceptual problems in defining equality for value views. Putting aside efficiency, this is easy to solve.
Right. It's just waiting for someone's PR. However, that doesn't mean that the current behaviour is senseless. It's just less desirable than one might like. Regards Antoine.
On Sun, Jul 28, 2019 at 10:18:56PM +0200, Antoine Pitrou wrote:
On Fri, 26 Jul 2019 20:28:05 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
So there are no conceptual problems in defining equality for value views. Putting aside efficiency, this is easy to solve.
Right. It's just waiting for someone's PR. However, that doesn't mean that the current behaviour is senseless. It's just less desirable than one might like.
Acknowledged: the current behaviour seems to be inherited from the default behaviour of equality used by object, which is (I think) the only sensible default. -- Steven
David Mertz wrote:
This feels similar to NumPy arrays, that also will not compare for equality in bare form.
Not quite the same -- comparing numpy arrays doesn't raise an exception, it returns an array of booleans. What raises an exception is trying to use the resulting array in a boolean context. But it is an example of something raising an exception that one would normally expect to always succeed. In the case of dict.values() == dict.values(), raising an exception is probably the least bad thing. Yes, it can lead to code blowing up unexpectedly, but I think it's better than having code appear to work while doing something subtly different from what you wanted. -- Greg
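For anyone who wants to see what the "raise an exception" option would feel like in practice, here is a rough sketch. `StrictValues` is a hypothetical wrapper invented for illustration, not something in CPython, and the message text is the one suggested earlier in the thread:

```python
class StrictValues:
    """Hypothetical wrapper around dict.values() that refuses == and !=."""

    __hash__ = None  # equality is undefined, so keep instances unhashable

    def __init__(self, mapping):
        self._view = mapping.values()

    def __iter__(self):
        return iter(self._view)

    def __len__(self):
        return len(self._view)

    def __eq__(self, other):
        # The default __ne__ delegates to __eq__, so != raises as well.
        raise TypeError(
            "Cannot compare dict.values directly, consider converting to "
            "sets / lists / sorted lists before comparing"
        )


d = {'a': 1234}
try:
    StrictValues(d) == StrictValues(d)
except TypeError as exc:
    print(exc)

# Spelling the intent explicitly still works.
print(list(StrictValues(d)) == list(StrictValues(d)))  # True
```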
25.07.19 01:15, Greg Ewing пише:
Steven D'Aprano wrote:
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
What I'm getting from this thread is that there are a variety of possible behaviours for dict values comparison, any of which could be considered "correct" depending on what the programmer is trying to do.
I know there are good reasons for the guideline that equality comparisons should never raise exceptions, but this seems like a situation where Python really should slap you on the ear and make you specify exactly what you want.
Is there any precedent for raising an exception in the equality comparison? Does 3 == "3" returning False make more sense to you?
Serhiy Storchaka wrote:
Is there any precedent for raising an exception in the equality comparison? Does 3 == "3" returning False make more sense to you?
Personally, I don't find ``3 == "3"`` to be an equivalent comparison to ``d0.values() == d1.values()``. Generally, it makes sense when comparing two items of different types, they are not going to be equivalent (except in cases such as ``3 == 3.0``, but in that case they are both subtypes of numeric). I don't know that an exception would be the best behavior to suit this situation (or for anything using ``__eq__`` for that matter), but returning ``False`` seems to be a bit misleading. Instead, I think that either returning the ``NotImplemented`` constant or ``None`` would provide far more useful information to the user, without the hindrance of causing an exception. I'm leaning more favorably towards ``NotImplemented`` because it explicitly tells the user "Hey, that equality comparison isn't implemented".
On 7/25/2019 4:27 AM, Kyle Stanley wrote:
Serhiy Storchaka wrote:
Is there any precedent for raising an exception in the equality comparison? Does 3 == "3" returning False make more sense to you?
Personally, I don't find ``3 == "3"`` to be an equivalent comparison to ``d0.values() == d1.values()``. Generally, it makes sense when comparing two items of different types, they are not going to be equivalent (except in cases such as ``3 == 3.0``, but in that case they are both subtypes of numeric).
I don't know that an exception would be the best behavior to suit this situation (or for anything using ``__eq__`` for that matter), but returning ``False`` seems to be a bit misleading. Instead, I think that either returning the ``NotImplemented`` constant or ``None`` would provide far more useful information to the user, without the hindrance of causing an exception. I'm leaning more favorably towards ``NotImplemented`` because it explicitly tells the user "Hey, that equality comparison isn't implemented".
That makes things worse. Now the comparison is always true in a boolean context. And presumably you'd want __ne__ to also return NotImplemented, so then both __eq__ and __ne__ would be true, since bool(NotImplemented) is True. Eric
7/25/2019 6:00 AM, Eric V. Smith wrote:
On 7/25/2019 4:27 AM, Kyle Stanley wrote:
Serhiy Storchaka wrote:
Is there any precedent for raising an exception in the equality comparison? Does 3 == "3" returning False make more sense to you?
Personally, I don't find ``3 == "3"`` to be an equivalent comparison to ``d0.values() == d1.values()``. Generally, it makes sense when comparing two items of different types, they are not going to be equivalent (except in cases such as ``3 == 3.0``, but in that case they are both subtypes of numeric).
I don't know that an exception would be the best behavior to suit this situation (or for anything using ``__eq__`` for that matter), but returning ``False`` seems to be a bit misleading. Instead, I think that either returning the ``NotImplemented`` constant or ``None`` would provide far more useful information to the user, without the hindrance of causing an exception. I'm leaning more favorably towards ``NotImplemented`` because it explicitly tells the user "Hey, that equality comparison isn't implemented".
That makes things worse. Now the comparison is always true in a boolean context. And presumably you'd want __ne__ to also return NotImplemented, so then both __eq__ and __ne__ would be true, since bool(NotImplemented) is True.
I might have to take that back. I hadn't factored in what the == and != machinery does, beyond calling __eq__ or __ne__. Eric
Eric V. Smith wrote:
That makes things worse. Now the comparison is always true in a boolean context. And presumably you'd want __ne__ to also return NotImplemented, so then both __eq__ and __ne__ would be true, since bool(NotImplemented) is True.
On 7/25/2019 6:00 AM, Eric V. Smith wrote:
I might have to take that back. I hadn't factored in what the == and != machinery does, beyond calling __eq__ or __ne__.
Based on the behavior in this example class, it looks like this would still function appropriately, despite the value of bool(NotImplemented):
    >>> class A:
    ...     def __eq__(self, other):
    ...         return NotImplemented
    ...
    >>> a = A()
    >>> b = A()
    >>> a == b
    False
    >>> a != b
    True
As you said, I believe it has to do with the underlying behavior of __eq__ and __ne__. However, this does somewhat lead to the same surface level issue of a False ultimately being returned. The NotImplemented makes the intention a bit more obvious if someone were to look at the __eq__ method for dict.values(), but otherwise it might still be the same issue. I'm somewhat curious as to why `a == b` doesn't directly return NotImplemented instead of False. Perhaps the underlying behavior makes it a pain to return anything other than True, False, or None (that's purely speculation on my part, I haven't looked further into it).
On Thu, Jul 25, 2019 at 10:15:15AM +1200, Greg Ewing wrote:
What I'm getting from this thread is that there are a variety of possible behaviours for dict values comparison, any of which could be considered "correct" depending on what the programmer is trying to do.
Can you elaborate on these varieties of behaviour? Aside from "change nothing" and "raise an exception".

Speaking for myself, it's taken a few iterations to nail down *precisely* how equality ought to work in detail. But the basic semantics hasn't really changed: two (multi)sets of values are equal if they have the same individual values, regardless of order.

The values {1, 2, 2, [], "abc", 3} and {2, 1, "abc", 3, [], 2} are equal since each has the same elements and counts: 1 occurs once in both; 2 occurs twice in both; 3 occurs once in both; [] occurs once in both; "abc" occurs once in both. So there's a 1:1 correspondence of elements in one values view to elements in the other.

(Ignore the fact that lists are unhashable so cannot be inserted into efficient, hash-based Python sets. I'm talking abstract multisets.)

I'll admit it took me a few attempts to get the details right (assuming they are right now...); one of my earliest attempts included a fallback to comparing lists, which was a bug.

If there is any other behaviour[1] that makes sense, I haven't seen anyone suggest it.

[1] Again, setting aside the current behaviour inherited from object, and raising an exception.

-- Steven
Steven D'Aprano wrote:
But the basic semantics hasn't really changed: two (multi)sets of values are equal if they have the same individual values, regardless of order.
Why regardless of order? Dicts have an ordering nowadays. Why shouldn't that ordering be reflected in the algorithm for comparing their values()? -- Greg
On 7/26/2019 8:24 AM, Greg Ewing wrote:
Steven D'Aprano wrote:
But the basic semantics hasn't really changed: two (multi)sets of values are equal if they have the same individual values, regardless of order.
Why regardless of order? Dicts have an ordering nowadays. Why shouldn't that ordering be reflected in the algorithm for comparing their values()?
Because it's already the case that order doesn't matter when comparing dicts and their keys (and presumably items, but I didn't check):
    >>> {1:2,2:3} == {2:3,1:2}
    True
    >>> list({1:2,2:3}.keys())
    [1, 2]
    >>> list({2:3,1:2}.keys())
    [2, 1]
    >>> {2:3,1:2}.keys() == {1:2,2:3}.keys()
    True
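The items case that was left unchecked above can be confirmed the same way. A short sketch (the names `d1` and `d2` are just for illustration), assuming Python 3.7 or later so that insertion order is guaranteed:

```python
d1 = {1: 2, 2: 3}
d2 = {2: 3, 1: 2}

# Iteration order differs because insertion order differs...
print(list(d1.items()))          # [(1, 2), (2, 3)]
print(list(d2.items()))          # [(2, 3), (1, 2)]

# ...but equality of dicts, keys views and items views ignores order.
print(d1 == d2)                  # True
print(d1.keys() == d2.keys())    # True
print(d1.items() == d2.items())  # True
```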
Steven D'Aprano wrote:
On Wed, Jul 24, 2019 at 05:30:19PM -0000, Brett Cannon wrote:
When I saw this I thought, "it should be like set(d1.values()) == set(d2.values())", but has been pointed out there's no guarantee that all values will be hashable.

The hashability requirement for sets is, in a sense, an implementation detail. It might be a requirement for sets in Python the language, but it's not a requirement for abstract "sets of values". E.g. Java includes a standard TreeSet which doesn't require hashability https://docs.oracle.com/javase/7/docs/api/java/util/TreeSet.html

In this case, they need to be multisets, since {'a': 1, 'b': 2, 'c': 1}.values() != {'a': 1, 'b': 2, 'c': 2}.values()

After that I have no expectation since order isn't guaranteed. I think this is one of those cases where it's superficially surprising when you don't think about all the ramifications, but once you understand the complexity of the problem then it becomes more clear that it isn't straight-forward.

Nobody said it was straight-forward, particularly if we want guaranteed efficient comparisons in both time and space at the same time. Brett, I feel that you are dismissing this thread as "not thinking through the ramifications" without reading it through, because I'm pretty sure that we have thought through the ramifications in a lot more detail than your dismissal justifies.
Sorry, I didn't explicitly state the perspective to read that statement from. I'm not saying the people participating here don't understand the ramifications which have been brought up (which I have read top-to-bottom). My point is people who are **not** reading this thread may be surprised if they try this (i.e. users out in the wild which was the perspective I meant to convey), but **if** they are brought to understand the complexity required to make their assumption work then I would hope they would understand why things work the way they do.
Let's start with the minimal change we have suggested: that two views should be considered equal if they both belong to the same dict. assert d.values() == d.values() Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is correct?
Yes I do if we aren't going to make this work regardless of actual dict objects because if you don't make this work for `d1.values() == d2.values()` then I think that would make `d.values() == d.values()` more surprising as to why it works in one case where the values were all the same but not in another.
(By correct I mean in the technical sense that if we were writing a functional spec for views, we would actively desire two views of the same dict to be unequal.)
To me a doc update for dict.values() stating that the iterator can't be compared and a brief mention as to why would be the best solution for this.

We're not talking about comparing iterators. We're talking about comparing views into a dict. That's a difference that makes all the difference.
You're correct that I misspoke, but I personally still think a doc change is the best solution.
Brett Cannon wrote:
You're correct that I misspoke, but I personally still think a doc change is the best solution.
I would agree that a doc change should occur if it is decided that the current behavior is appropriate, but I would like to mention that in the current [documentation for `object.__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), it states: "A rich comparison method may return the singleton `NotImplemented` if it does not implement the operation for a given pair of arguments". Wouldn't returning `NotImplemented` be far more explicit to the user, in terms to directly telling them that the equality assessment between two dictionary views is not implemented? In general, I find this to be far more informative than simply returning False. At a surface level, users may assume that False would imply that there was an actual assessment of equality being performed. This may not be an established precedent for other similar equality assessments, but I don't think the `NotImplemented` constant is utilized as much as it could be. It seems to be particularly well suited for addressing this situation.
25.07.19 22:05, Kyle Stanley пише:
I would agree that a doc change should occur if it is decided that the current behavior is appropriate, but I would like to mention that in the current [documentation for `object.__eq__()`](https://docs.python.org/3/reference/datamodel.html#object.__eq__), it states: "A rich comparison method may return the singleton `NotImplemented` if it does not implement the operation for a given pair of arguments".
Wouldn't returning `NotImplemented` be far more explicit to the user, in terms to directly telling them that the equality assessment between two dictionary views is not implemented? In general, I find this to be far more informative than simply returning False. At a surface level, users may assume that False would imply that there was an actual assessment of equality being performed.
This may not be an established precedent for other similar equality assessments, but I don't think the `NotImplemented` constant is utilized as much as it could be. It seems to be particularly well suited for addressing this situation.
NotImplemented is returned by the `__eq__` method, but the `==` operator returns False. Do not confuse them.
    >>> {}.values().__eq__({}.values())
    NotImplemented
    >>> {}.values() == {}.values()
    False
Actually, the `==` operator cannot return NotImplemented.
Serhiy Storchaka wrote:
Actually, the == operator cannot return NotImplemented.
Thanks for the clarification. What is the reason for this limitation and is it only possible for the `==` operator to return one of `None`, `False`, or `True`? It seems like it would be useful for it to be able to return `NotImplemented` in situations such as this.

Also, I think that I may have had some misconceptions with regards to the relationship between the `__eq__()` method and the `==` operator. I know they are not the same, but isn't the result of the `==` operator based on a transformation of the result from `__eq__()`?

As far as I can tell, the equality of two dictionary views is assessed using [`PyObject dictrich_compare`](https://github.com/python/cpython/blob/544fa15ea1b7b73068319bdb217b684e2fd7b...). Wouldn't it be possible to perform a conditional check if the view on the left side of the comparison is a values view and if so, use `Py_RETURN_NOTIMPLEMENTED`?

Apologies if I'm completely off base here, my experience and understanding of the underlying C-API is quite limited. I've been programming with Python for quite some time, but I only started learning the C-API once I became interested in contributing to CPython.
On 7/25/2019 4:19 PM, Kyle Stanley wrote:
Serhiy Storchaka wrote:
Actually, the == operator cannot return NotImplemented.
Thanks for the clarification. What is the reason for this limitation and is it only possible for the `==` operator to return one of `None`, `False`, or `True`? It seems like it would be useful for it to be able to return `NotImplemented` in situations such as this.
Because no one is testing for it. And just using it in a boolean context will return True.
Also, I think that I may have had some misconceptions with regards to the relationship between the `__eq__()` method and the `==` operator. I know they are not the same, but isn't the result of the `==` operator based on a transformation of the result from `__eq__()`?
"Based on", yes. If all of the options return NotImplemented, it falls back on identity comparison. I can't find this in the Python 3 docs, but it's no doubt somewhere. Eric
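The identity fallback is easy to observe directly. A minimal sketch, where `Opaque` is a made-up class whose `__eq__` always declines to answer:

```python
class Opaque:
    """Illustrative class whose __eq__ never gives an answer."""

    def __eq__(self, other):
        return NotImplemented


a = Opaque()
b = Opaque()

# With no __eq__ willing to answer, == falls back to identity.
print(a == a)  # True
print(a == b)  # False

# Values views behave the same way: one view equals itself,
# but two separately created views of the same dict do not.
d = {'a': 1234}
v = d.values()
print(v == v)                    # True
print(d.values() == d.values())  # False
```

This is why `d.values() == d.values()` is False: each call to `.values()` creates a new view object, and identity is all that is compared.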
25.07.19 23:19, Kyle Stanley пише:
Serhiy Storchaka wrote:
Actually, the == operator cannot return NotImplemented.
Thanks for the clarification. What is the reason for this limitation and is it only possible for the `==` operator to return one of `None`, `False`, or `True`?
The `==` operator can return any value except NotImplemented. But all implementations in the stdlib return only booleans. NumPy is the one famous exception.
It seems like it would be useful for it to be able to return `NotImplemented` in situations such as this.
It is the purpose of NotImplemented. It signals "ignore me, use other way to evaluate a result".
Also, I think that I may have had some misconceptions with regards to the relationship between the `__eq__()` method and the `==` operator. I know they are not the same, but isn't the result of the `==` operator based on a transformation of the result from `__eq__()`?
Yes, it is. And NotImplemented means that the result of `__eq__()` (or other dunder methods) should be ignored.
As far as I can tell, the equality of two dictionary views is assessed using [`PyObject dictrich_compare`](https://github.com/python/cpython/blob/544fa15ea1b7b73068319bdb217b684e2fd7b...). Wouldn't it be possible to perform a conditional check if the view on the left side of the comparison is a values view and if so, use `Py_RETURN_NOTIMPLEMENTED`?
How does it differ from the default implementation (`object.__eq__`)?
Kyle Stanley wrote:
Serhiy Storchaka wrote:
Actually, the == operator cannot return NotImplemented.
What is the reason for this limitation
It's not a limitation, it's a consequence of the way the operator machinery works. NotImplemented is used by operator methods to signal to the interpreter that it should take some alternative action. In this case, it will first try the other operand's __eq__ method, and if that returns NotImplemented as well, it assumes that the operands are not equal and returns False.
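The order in which the interpreter consults the operands can be made visible with a small sketch. `Left`, `Right` and the `calls` list are invented here purely for illustration:

```python
calls = []


class Left:
    def __eq__(self, other):
        calls.append("Left.__eq__")
        return NotImplemented  # "ignore me, try something else"


class Right:
    def __eq__(self, other):
        calls.append("Right.__eq__")
        return NotImplemented


result = Left() == Right()

print(calls)   # ['Left.__eq__', 'Right.__eq__']
print(result)  # False
```

Both methods are tried, both decline, and the `==` expression itself still evaluates to a plain bool rather than `NotImplemented`.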
and is it only possible for the `==` operator to return one of `None`, `False`, or `True`?
No, it's possible for == to return almost anything (numpy arrays return an array of booleans, for example). It just happens that NotImplemented can't be returned, because it has a special meaning to the interpreter. -- Greg
On 7/25/2019 2:46 PM, Brett Cannon wrote:
You're correct that I misspoke, but I personally still think a doc change is the best solution.
Given the absence of a consensus on when values() views should be considered equal, I strongly agree. I strongly oppose raising an exception. -- Terry Jan Reedy
Terry Reedy wrote:
Given the absence of a consensus on when values() views should be considered equal, I strongly agree. I strongly oppose raising an exception.
I am with you regarding the strong opposition regarding the raising of an exception. I don't think that the `==` operator should raise an exception, doing so is excessively obstructive to the user. I'm not certain that returning `False` is the best behavior, but based on what I've gathered from the discussion so far there has been nothing suggested that would be a viable alternative. I had initially proposed returning `NotImplemented`, but upon further assessment, that would still end up returning `False` when using the `==` operator. As a result, leaving it as is and addressing the behavior in the docs seems to be the most appropriate solution.
Steven D'Aprano writes:
The hashability requirement for sets is, in a sense, an implementation detail. It might be a requirement for sets in Python the language, but it's not a requirement for abstract "sets of values".
In this case, they need to be multisets, since
{'a': 1, 'b': 2, 'c': 1}.values() != {'a': 1, 'b': 2, 'c': 2}.values()
They don't *need* to be multisets. I would want a comparison of values views to be a comparison of images as sets in many cases. On the other hand, if I'm asking if two random variables have the same distribution, I would want a comparison of multisets. And for stochastic processes, I'd want a list, not a multiset. (Sorry for the technical jargon, there are probably similar examples from other, more realistic domains.) So I think I'm in David's camp (from __future__ import <CAEbHw4aZ--0t32ORbzVYb4PgYjFNN2=P9ooW_XPDxp-Yv=sY2w@mail.gmail.com>): we should inherit __eq__, and if we do anything more, we should provide functions that either do the comparisons correctly (i.e., generalizing set and multiset to non-hashable values), or very efficiently.
Let's start with the minimal change we have suggested: that two views should be considered equal if they both belong to the same dict.
assert d.values() == d.values()
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
No, I don't. Do you think the proposed behavior (extending equality to views of the same dict, and only that) is *useful*? Steve
On Fri, Jul 26, 2019 at 08:41:40PM +0900, Stephen J. Turnbull wrote:
Steven D'Aprano writes:
The hashability requirement for sets is, in a sense, an implementation detail. It might be a requirement for sets in Python the language, but it's not a requirement for abstract "sets of values".
In this case, they need to be multisets, since
{'a': 1, 'b': 2, 'c': 1}.values() != {'a': 1, 'b': 2, 'c': 2}.values()
They don't *need* to be multisets. I would want a comparison of values views to be a comparison of images as sets in many cases.
Under what circumstances would you expect two unordered collections of values: {1, 2, 3, 1, 1, 1} {1, 2, 3, 2, 2, 2} to compare equal? And do you really want that to be the default behaviour that everyone gets? Remember, too, we don't want the behaviour of values views to be too different from the behaviour of dicts, keys and items. I'm not saying that there's no possible scenario where we might want that. But it's probably going to be pretty specialised, and probably not suitable as the general purpose default behaviour. Ideally the average Python programmer should say "yeah, that behaviour makes sense", without having to follow it up with "... provided you have a degree in quantum chromodynamics and the data you are comparing represents solitons in a quark-gluon plasma".[1] Analogy: sometimes I want to do clock arithmetic, where 15 == 3, but that doesn't mean that I want int.__eq__ to default to clock arithmetic for my (occasional) benefit and everyone else's inconvenience.
On the other hand, if I'm asking if two random variables have the same distribution, I would want a comparison of multisets. And for stochastic processes, I'd want a list, not a multiset. (Sorry for the technical jargon, there are probably similar examples from other, more realistic domains.)
Comparisons as ordered sequences are easy: list(d1.values()) == list(d2.values()) Sets are trickier, because the values might not be hashable, but depending on your data this could work: set(d1.values()) == set(d2.values()) I don't think it adds much insight to the problem to discuss all the wide variety of specialist comparisons we might want to do in narrow circumstances. [...]
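To make the difference between these explicit spellings concrete, here is a small sketch. The `collections.Counter` line is an addition not mentioned above; like `set`, it assumes all values are hashable. The dicts are made up for the example:

```python
from collections import Counter

d1 = {'a': 1, 'b': 2, 'c': 1}
d2 = {'x': 1, 'y': 1, 'z': 2}
d3 = {'x': 1, 'y': 2, 'z': 2}

# As ordered sequences: insertion order matters.
print(list(d1.values()) == list(d2.values()))        # False

# As sets: only which values occur matters, not how often.
print(set(d1.values()) == set(d3.values()))          # True

# As multisets: how often each value occurs matters, order does not.
print(Counter(d1.values()) == Counter(d2.values()))  # True
print(Counter(d1.values()) == Counter(d3.values()))  # False
```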
Let's start with the minimal change we have suggested: that two views should be considered equal if they both belong to the same dict.
assert d.values() == d.values()
Currently that assertion fails. Should it? Putting aside the convenience of "do nothing, just inherit the object.__eq__ behaviour" do you think that the current behaviour is *correct*?
No, I don't. Do you think the proposed behavior (extending equality to views of the same dict, and only that) is *useful*?
It's *less wrong* than the current behaviour i.e. it gets the comparison correct more often, even if it too sometimes returns False for values which people would expect to compare equal. [1] Any relationship between what I said and real physics is purely a coincidence :-)
Steve
Steven D'Aprano writes:
Under what circumstances would you expect two unordered collections of values:
{1, 2, 3, 1, 1, 1} {1, 2, 3, 2, 2, 2}
to compare equal?
As you've pointed out yourself, I believe, here we are not interested in generic unordered collections. We have views into a dictionary. A dictionary is one way to implement a function, and it's often interesting to know what the image of a function is. I would guess this is one reason why set(d.view()) came up early in the discussion. I hardly think that the range of values in a dict is something that Python programmers would need a PhD to understand, at least not if you present it as "set(d.values())", and optionally "but works if values aren't hashable". I'm -1 on any of the possibilities proposed to be anointed as "==" for dict.values without data on how often they are used in practice, and how often each is used in inner loops. All are easy to spell since dict.values is iterable (if you accept the restriction for set that all dict values be hashable). I would argue that if anything is to be implemented in built ins, it should be the set comparison, since that is the only one that doesn't have a simple spelling for generic dicts. (That doesn't mean I think it should be "==".)
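For reference, a set-style ("same image") comparison that also works for unhashable values might be sketched like this. `same_image` is a hypothetical helper name, and the approach is quadratic since it relies on `==` only:

```python
def same_image(a, b):
    """True if every value in a occurs in b and vice versa (ignores counts)."""
    a, b = list(a), list(b)
    return all(x in b for x in a) and all(y in a for y in b)


d1 = {'a': [1], 'b': [2], 'c': [1]}
d2 = {'x': [2], 'y': [1]}
d3 = {'x': [2], 'y': [3]}

print(same_image(d1.values(), d2.values()))  # True
print(same_image(d1.values(), d3.values()))  # False
```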
No, I don't. Do you think the proposed behavior (extending equality to views of the same dict, and only that) is *useful*?
It's *less wrong* than the current behaviour i.e. it gets the comparison correct more often, even if it too sometimes returns False for values which people would expect to compare equal.
I already knew you cared that the minute hand of a stopped clock is right 24 times a day. I don't, and that's not what I asked. Steve
On 23.07.2019 23:59, Kristian Klette wrote:
Hi!
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}

d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just `False` seems a bit misleading.
What are your thoughts on the issue?
Since a hash table is an unordered container and keys(), items() and values() are iterators over it, *I would expect the results of any of the comparisons to be undefined.* While Python 3 dicts preserve order, it's _insertion_ order, so there's no guarantee that two equal dicts will iterate over the items in the same order. *Comparing the sequences that iterators produce goes beyond the job of iterators* (i.e. requires iterating from start to finish under the hood and constructing temporary containers), so it shouldn't be a part of their functionality. The implemented comparisons were specifically intended as convenience methods that go beyond the job of an iterator AFAICS but they fell short since in the case of `values()`, the task turned out to have too much computational complexity. So *the comparison logic, if kept at all, should be reduced to comparing things that are intrinsic to iterators themselves:* whether they point to the same object and have the same internal state. If someone needs to specifically compare the resulting sequences, they should use separate logic dedicated to comparing sequences -- with the associated quadratic complexity et al. if they need to compare regardless of order. *The current logic encourages using iterators for things they aren't designed for so it's actively confusing and harmful.*
[0]: https://bugs.python.org/issue37585 [1]: https://github.com/python/cpython/pull/14737
-- Regards, Ivan
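The point about insertion order can be checked with a small example. This is only an illustration of the ordering argument, not a proposal; the dict names are arbitrary, and the `sorted` comparison assumes the values are mutually orderable.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}   # same items, inserted in the opposite order

assert d1 == d2                                      # dict equality ignores order
assert d1.keys() == d2.keys()                        # set-like comparison
assert d1.items() == d2.items()                      # set-like comparison
assert list(d1.values()) != list(d2.values())        # [1, 2] vs [2, 1]
assert sorted(d1.values()) == sorted(d2.values())    # explicit order-insensitive check

# The comparison this thread is about; reported above as always False.
print(d1.values() == d1.values())
```

Spelling the comparison out with `list(...)` or `sorted(...)` makes the intended notion of equality explicit at the call site.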
On Fri, Jul 26, 2019, at 00:22, Ivan Pozdeev via Python-Dev wrote:
Since a hash table is an unordered container and keys(), items() and values() are iterators over it, *I would expect the results of any of the comparisons to be undefined.*
keys, items, and values are not iterators. They are view objects, and keys and items implement the entire set API (in the same sense that range implements the entire sequence API). Values is the odd one out in that it can contain multiple instances of each object, so it can't really be considered as a set. Items also sometimes contains unhashable types, and some methods simply fail in that case. I suggest that this precedent provides a way forward - implement the entire intuitive "contains the same amount of each value" algorithm [more or less Counter(obj1) == Counter(obj2)], and have this fail naturally, throwing e.g. an exception "TypeError: unhashable type: 'list'" if any of the values are unhashable in the same way that trying to perform certain set operations on an items view does.
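A short sketch of the precedent described above: keys and items views support set operations, those operations fail on unhashable values, and a `Counter`-based comparison fails the same way. `same_value_counts` is a hypothetical helper name, not an existing API.

```python
from collections import Counter

d = {'a': 1, 'b': 2}

# Keys and items views behave like sets.
assert d.keys() == {'a', 'b'}
assert d.keys() & {'a', 'z'} == {'a'}
assert d.items() == {('a', 1), ('b', 2)}


def same_value_counts(d1, d2):
    """Hypothetical helper: multiset equality of values, roughly Counter(a) == Counter(b)."""
    return Counter(d1.values()) == Counter(d2.values())


assert same_value_counts({'a': 1, 'b': 1, 'c': 2}, {'x': 2, 'y': 1, 'z': 1})
assert not same_value_counts({'a': 1, 'b': 1}, {'a': 1, 'b': 2})

# With unhashable values, both a set operation on the items view and the
# Counter-based comparison raise TypeError.
unhashable = {'a': [1, 2]}
for operation in (lambda: unhashable.items() | unhashable.items(),
                  lambda: same_value_counts(unhashable, unhashable)):
    try:
        operation()
    except TypeError as exc:
        print("TypeError:", exc)
```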
On Fri, Jul 26, 2019 at 2:03 PM Random832 <random832@fastmail.com> wrote:
Items also sometimes contains unhashable types, and some methods simply fail in that case. I suggest that this precedent provides a way forward - implement the entire intuitive "contains the same amount of each value" algorithm [more or less Counter(obj1) == Counter(obj2)], and have this fail naturally, throwing e.g. an exception "TypeError: unhashable type: 'list'" if any of the values are unhashable in the same way that trying to perform certain set operations on an items view does.
-1. What is the motivation of this? In this case, I don't think "I found missing parts so I want to implement it for consistency" is enough reason to implement it. I want a real-world application which requires it. Without a strong use case, I think the discussion is just wasting time. Regards, -- Inada Naoki <songofacandy@gmail.com>
I want a real-world application which requires it. Without a strong use case, I think the discussion is just wasting time.
I would have to agree. Initially I was in support of changing the behavior, but upon reading the responses of several core developers and further consideration, the most appropriate course of action seems to be updating the docs. I have not seen any relevant applications where it would be useful to compare the values view between dictionaries, but I agree that the behavior of returning `False` might be confusing without any mention of it. I [opened a PR](https://github.com/python/cpython/pull/14954) which mentions this behavior in the relevant documentation, but further explanation of why this occurs might be appropriate. I'm not certain as to whether or not further explanation is needed in this situation though.
26.07.19 08:27, Inada Naoki wrote:
On Fri, Jul 26, 2019 at 2:03 PM Random832 <random832@fastmail.com> wrote:
Items also sometimes contains unhashable types, and some methods simply fail in that case. I suggest that this precedent provides a way forward - implement the entire intuitive "contains the same amount of each value" algorithm [more or less Counter(obj1) == Counter(obj2)], and have this fail naturally, throwing e.g. an exception "TypeError: unhashable type: 'list'" if any of the values are unhashable in the same way that trying to perform certain set operations on an items view does.
-1. What is the motivation of this? In this case, I don't think "I found missing parts so I want to implement it for consistency" is enough reason to implement it.
I want a real-world application which requires it. Without a strong use case, I think the discussion is just wasting time.
Completely agreed.
On Fri, Jul 26, 2019 at 12:57:42AM -0400, Random832 wrote:
On Fri, Jul 26, 2019, at 00:22, Ivan Pozdeev via Python-Dev wrote:
Since a hash table is an unordered container and keys(), items() and values() are iterators over it, *I would expect the results of any of the comparisons to be undefined.*
keys, items, and values are not iterators. They are view objects, and keys and items implement the entire set API (in the same sense that range implements the entire sequence API). Values is the odd one out in that it can contain multiple instances of each object, so it can't really be considered as a set.
But it can be considered a multiset. In plain English, a set can contain duplicates. If I have a set of some collectable item (say, trading cards), or a dinner set, they can contain duplicates. We shouldn't make too much of the fact that Python sets collapse multiples of a value down to one. If we wanted a multiset, we could get one. collections.Counter is already a multiset of sorts. Nor should we make too much of the fact that Python sets require elements to be hashable. As Java TreeSet demonstrates, we could get an efficient set of unhashable items if we required orderability; and we can get sets of unhashable, unorderable items if we're willing to compromise on efficiency.
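The efficiency trade-offs above can be sketched as a multiset comparison that does not require hashable values. `values_equal_as_multiset` is a hypothetical helper; the sorted path assumes that `<` defines a total order on the values (it would be unreliable for partially ordered values), and the final path uses nothing but `==`.

```python
def values_equal_as_multiset(d1, d2):
    """Hypothetical helper: multiset equality of values without hashing."""
    v1, v2 = list(d1.values()), list(d2.values())
    if len(v1) != len(v2):
        return False
    try:
        # Orderable values: O(n log n), similar in spirit to a tree-based set.
        return sorted(v1) == sorted(v2)
    except TypeError:
        pass
    # Neither hashable nor orderable: O(n^2) matching using only ==.
    remaining = v2
    for x in v1:
        try:
            remaining.remove(x)   # removes the first element equal to x
        except ValueError:
            return False
    return True


# Lists are unhashable but orderable.
assert values_equal_as_multiset({'a': [1], 'b': [2]}, {'x': [2], 'y': [1]})
# Dicts are neither hashable nor orderable, so the quadratic path is used.
assert values_equal_as_multiset({'a': {'k': 1}, 'b': {'k': 2}},
                                {'x': {'k': 2}, 'y': {'k': 1}})
assert not values_equal_as_multiset({'a': {'k': 1}, 'b': {'k': 1}},
                                    {'x': {'k': 1}, 'y': {'k': 2}})
```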
Items also sometimes contains unhashable types, and some methods simply fail in that case. I suggest that this precedent provides a way forward - implement the entire intuitive "contains the same amount of each value" algorithm [more or less Counter(obj1) == Counter(obj2)], and have this fail naturally, throwing e.g. an exception "TypeError: unhashable type: 'list'" if any of the values are unhashable in the same way that trying to perform certain set operations on an items view does.
Equality tests really ought not to fail. If they do fail, it should be considered a bug in the __eq__ method, not an intentional result. To allow == tests to fail is just a way of sneaking in a three-value logic into the language, only using an extremely inconvenient API:

    try:
        if a == b:
            print(True)
        else:
            print(False)
    except Exception:
        print(Maybe)  # or undecidable, unknown, mu, etc.

Multi-value logics usually model the complexities of the real world much better than boolean logic, but the assumption of boolean logic and the law of the excluded middle is too prevalent to mess with in the builtins.
http://mathworld.wolfram.com/LawoftheExcludedMiddle.html
https://en.wikipedia.org/wiki/Three-valued_logic
-- Steven
Steven D'Aprano wrote:
Equality tests really ought not to fail. If they do fail, it should be considered a bug in the __eq__ method, not an intentional result.
To allow == tests to fail is just a way of sneaking in a three-value logic into the language, only using an extremely inconvenient API:
In the case being considered here, I would argue that attempting to compare dict.values() results is a symptom of a bug in the code performing that comparison, or at least a smell suggesting that the programmer hasn't thought something through properly. The remedy is to re-write that code to be explicit about what is really wanted. There is no three-valued logic involved here. -- Greg
On Fri, Jul 26, 2019, at 05:45, Steven D'Aprano wrote:
Nor should we make too much of the fact that Python sets require elements to be hashable. As Java TreeSet demonstrates, we could get an efficient set of unhashable items if we required orderability; and we can get sets of unhashable, unorderable items if we're willing to compromise on efficiency.
And think of what we could do if we were willing to compromise on immutability of hashable objects (Java does, in general, make the opposite decision there).
Random832 wrote:
implement the entire intuitive "contains the same amount of each value" algorithm [more or less Counter(obj1) == Counter(obj2)],
But then we'd be guessing that this particular interpretation of "dict values equality", out of several plausible ones, is the one the programmer intended. And we know what the Zen has to say about guessing. -- Greg
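To illustrate why choosing one interpretation would be a guess, here is a small sketch in which three plausible notions of "values equality" disagree on the same inputs. The dicts and the `compare` helper are made up for the example.

```python
from collections import Counter

a = {'p': 1, 'q': 1, 'r': 2}   # values 1, 1, 2
b = {'p': 2, 'q': 1, 'r': 1}   # same values, different order
c = {'p': 1, 'q': 2, 'r': 2}   # same distinct values, different counts


def compare(x, y):
    """Evaluate three candidate meanings of 'the values are equal'."""
    vx, vy = list(x.values()), list(y.values())
    return {
        'sequence': vx == vy,                    # same values in the same order
        'multiset': Counter(vx) == Counter(vy),  # same values with the same counts
        'set': set(vx) == set(vy),               # same distinct values
    }


assert compare(a, b) == {'sequence': False, 'multiset': True, 'set': True}
assert compare(a, c) == {'sequence': False, 'multiset': False, 'set': True}
```

Each spelling is a one-liner for hashable values, so code that needs one of them can state which one it means.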
Kristian Klette wrote on 23.07.19 at 22:59:
During the sprints after EuroPython, I made an attempt at adding support for comparing the results from `.values()` of two dicts.
Currently the following works as expected:
```
d = {'a': 1234}

d.keys() == d.keys()
d.items() == d.items()
```
but `d.values() == d.values()` does not return the expected results. It always returns `False`. The symmetry is a bit off.
In the bug trackers[0] and the Github PR[1], I was asked to raise the issue on the python-dev mailing list to find a consensus on what comparing `.values()` should do.
I'd argue that Python should compare the values as expected here, or if we don't want to encourage that behaviour, maybe we should consider raising an exception. Returning just `False` seems a bit misleading.
What are your thoughts on the issue?
FWIW, after reading most of this thread, I do not like the idea of raising an exception for an innocent comparison. Just think of a list of arbitrary objects, including a dict values view for some reason, and you're looking for the right object in the list. Maybe in some kind of generic tool, decorator, iter-helper, or whatever, something that has to deal with arbitrary objects provided by random users, which uses "in" instead of a loop with "is" comparisons. I also kind-of like the idea of having d.values() == d.values() return True and otherwise let the comparison return False for everything else. This seems to be the only reasonable behaviour that might(!) have a use case, maybe in the same line as the argument above. I can't really see a reason for implementing anything more than that. Stefan
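The concern about generic code can be sketched as follows. `RaisesOnEq` is a hypothetical class standing in for a values view whose `==` raised an exception; no such behaviour exists in CPython today.

```python
class RaisesOnEq:
    """Hypothetical object whose == raises, as under the 'raise an exception' idea."""

    def __eq__(self, other):
        raise TypeError("cannot compare")


target = object()

# `in` on a list falls back to == for every element that is not the needle
# itself, so one such object breaks the search before the match is reached.
haystack = [RaisesOnEq(), target]
try:
    found = target in haystack
except TypeError as exc:
    print("membership test failed:", exc)

# With the current behaviour a values view simply compares unequal to other
# objects, and the search succeeds.
assert target in [{'a': 1}.values(), target]
```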
participants (21)
- Antoine Pitrou
- Brett Cannon
- David Mertz
- Eric V. Smith
- Greg Ewing
- Inada Naoki
- Ivan Pozdeev
- Jeff Allen
- Kristian Klette
- Kyle Stanley
- MRAB
- Petr Viktorin
- Random832
- Rob Cliffe
- Ronald Oussoren
- Serhiy Storchaka
- Stefan Behnel
- Stephen J. Turnbull
- Steve Holden
- Steven D'Aprano
- Terry Reedy