Fwd: Why do equality tests between OrderedDict keys/values views behave not as expected?
data:image/s3,"s3://crabby-images/e2dbc/e2dbc0e63f46a6542a6bbf89e4a128181cbc29dc" alt=""
Hi, I recently compared OrderedDict instances while writing unit tests, and discovered an interesting behavior. If I create two ordered dictionaries with the same keys/values in the same order, I observe that their values are not equal when I compare them. I recently asked a question about this on Stackoverflow: http://stackoverflow.com/questions/34312674/why-values-of-an-ordereddict-are... Moreover, another user observed that keys of ordered dictionaries are compared in an order insensitive way: http://stackoverflow.com/questions/34320600/why-does-the-ordereddict-keys-vi... Are there any reasons for such implementation choices? As it appears disturbing for many people, would it be possible to update these behaviors? Best Regards, Alexandre.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
So the issues are: 1. OrderedDict().values() does not implement __eq__. It uses object equality, which means identity. 1a. dict().values() does not implement __eq__. 2. OrderedDict().keys().__eq__ does not respect order. I'd argue that keys() should be ordered comparisons, and values() could be. As a plus, this could be more efficient than unordered comparisons, since it's just return all(x == y for x, y in zip(self.keys(), other.keys())) instead of packing each into a set and comparing the sets. But what would be the point of comparing values views? On the other hand, I guess dict().values().__eq__ should stay unimplemented. What would it mean? MultiSet comparison? Values in general aren't even hashable, so they can't be stuck into a hash-based set structure. Maybe explicitly make it NotImplemented. On Thu, Dec 17, 2015 at 4:40 AM, Alexandre Figura <ipipomme+python@gmail.com> wrote:
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Dec 17, 2015, at 03:19, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
So the issues are:
I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.) As it stands, you can guess: * since all mapping keys and items act like sets (and are Sets), they probably compare like sets * since there is no OrderedSet for OrderedDict's keys and values to act like, they probably don't compare like those and instead just compare like sets * since values are just generic collections, they probably just use generic identity comparison. But, even if that guess may happen to be right for all builtin and stdlib types, and all user types that rely on the ABCs as mixins, in all of the major implementations, it's still just a guess, not something I'd want to rely on in portable code.
So what happens when you compare the keys, items, or values view from an OrderedDict against the view from another mapping type? Or, for keys and items, against another set type? If you leave that up to the whichever one is on the left, you get cases where a==b and b!=a. If you leave it up to the most derived type (by the usual __rspam__-like rules), that doesn't help anything, since dict_keys_view and odict_keys_view are unrelated except in sharing an abstract base class. And, worst of all, even if you contrive a way for one or the other to always win consistently, you get cases where a==b and b==c and a!=c. Also, overriding equality for ordered dict views might make sense, but what about ordering? I think it's pretty reasonable to expect that something that acts like a set and inherits from Set use < to mean subset, not lexicographical comparison. But then the interactions with other mapping views get even crazier. Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement. Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons? Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code?
I think you want zip_longest(self.keys(), other.keys(), fill=object()) or equivalent; otherwise {1, 2} will be equal to {1, 2, 3} . Also, you don't need to pack self.keys() into a set; it already is a Set, and implements all of the immutable set operations as-is, without needing any conversions. But really, who cares which of two implementations is more efficient when they implement two completely different things? If set comparison is right, it doesn't matter that sequence comparison is faster, you still can't use it, and vice-versa.
data:image/s3,"s3://crabby-images/f3aca/f3aca73bf3f35ba204b73202269569bd49cd2b1e" alt=""
On Thu, Dec 17, 2015 at 8:57 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.)
The views are documented as "set-like" and tied to the Set ABC. [1] So comparision operations should match the same semantics as for sets. [snip]
Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons?
Yep. If you want to work with a sequence then you have to convert the OrderedDict to a list or other sequence. The same goes for the views, which *do* preserve order during iteration (e.g. "list(od.keys())").
Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code?
I've spoken with folks that have use cases for OrderedDict-as-a-sequence, but they were always able to use something else that met their needs more directly anyway. -eric [1] https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
* since all mapping keys and items act like sets (and are Sets), they probably compare like sets
.items() aren't like sets. Or something is very wrong.
If OrderedDict views raised NotImplemented, I believe the other view will then have the chance to try its own comparison.
Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement.
Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons?
OrderedDict itself does everything that you might not want its views to do. OrderedDict implements order-sensitive comparison. It breaks all the rules you can think of that order-sensitive comparisons can break. Transitivity is already a problem. "Who's comparing?" is already a problem, and it's made its choice ("Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries."). The question now is, should views be made consistent with the OrderedDict itself? There are three options: 1. Deprecate the OrderedDict ordered comparison, to make it consistent with the views. 2. Make the views consistent with the OrderedDict. 3. Do nothing.
I actually want return len(self.keys()) == len(other.keys()) and all(x == y for x, y in zip(self.keys(), other.keys())) This should be more efficient than set comparisons. It's a bonus, not a reason.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
So I think that not using the __eq__ method of the keys or values is wrong (dicts do use it), but there's a philosophical question: if two OrderedDicts have the same key/value pairs in a different order, should they be considered equal or not? (Only when we've answered this can we answer the question about keys/values/items). I'm not a frequent user of OrderedDict, so I don't have a good intuition, unfortunately. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
(Note: This is already the case. https://docs.python.org/3/library/collections.html#collections.OrderedDict """ Equality tests between OrderedDict objects are order-sensitive and are implemented as list(od1.items())==list(od2.items()). Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries. This allows OrderedDict objects to be substituted anywhere a regular dictionary is used. """ So you're asking whether to deprecate this behavior?) On Thu, Dec 17, 2015 at 2:55 PM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/a2433/a2433046d4b9f61fe876ee9fd7ae3c7add11c90b" alt=""
I may or may not have gotten this right, but I think he's asking "should that be the case?", not "is that the case?" - It is the case, but should it? I believe so. For one of my projects, I have a more complex mapping type, which under the hood is just an ODict that I do various operations on, and sometimes delegate to - for example, for equality tests against another instance of the same class, I just compare the underlying ODs. In that last case, I most certainly don't want two non-equal classes to compare equal! I'll ask the obvious question instead: "Should two lists compare unequal if they have the same items but not the same order?" However, the answer you want is "If I care about insertion and iteration order, there's a good chance I also care about that order when comparing, too" If you explicitly don't care about the order, it's quite easy to do so:
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Thursday, December 17, 2015 11:50 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Yes they are. Look at the library documentation for dict and dict views in stdtypes, and in collections.abc. An items view is supposed to be set-like, and to be actually usable as a set if the values are hashable. (If the values aren't hashable, obviously most set operations are just going to raise.) And, as far as I can tell, nothing is very wrong.
Yes, but the proposal here is for OrderedDict and its views to implement something sequence-like, not to raise NotImplemented, so why is that relevant?
Which is something equivalent--the only way that can return different results is if self.values() contains elements that declare themselves to be equal to everything, at which point they already break too many invariants to safely put in a container.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Thu, Dec 17, 2015 at 4:34 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Hm. >>> x = {0: []} >>> y = {0: []} >>> x == y True >>> x.items() == y.items() True That last one doesn't seem set-like to me. But it seems I misunderstood what you were saying. Looking at the source code for ItemsView, "containment" is defined as "other[0] in self and self[other[0]] == other[1]". So yes, it's set-like, in that it checks for containment. I've just never thought about "containment of a key => value mapping". (It's funny, 'cause I've tried to develop the exact same idea in a different subject.)
I mean for them to raise NotImplemented in the case of "the other dict is not an instance of OrderedDict". Anyway, this is all a moot point. *If* they were to do something different from dict's views, then they should follow OrderedDict.__eq__. PS: To extend "is an OrderedDict a sequence?" in the wrong direction, I'll point out that OrderedDict.sort(key=None) has a clear and natural meaning, and the implementation should be obvious (relative to a given ODict implementation), for either a linkedlist or array ordering. And to go even further: OrderedDict.items().sort(key=None) also makes some sense. (Though why you would want to compare with respect to values...)
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Thursday, December 17, 2015 2:19 PM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Sure it's set-like
Yeah, that's the point: items views are effectively sets of items, which are key-value pairs. What else would they be sets of? Because values aren't necessarily hashable, it's unavoidable that some set operations may raise a TypeError (if the values aren't hashable) instead of doing what you'd want. But that doesn't mean all set operations that could possibly raise a TypeError always do so; that only happens when it's unavoidable. And here, it's avoidable. I suppose you could argue that this is a bit too clever to just assume without documenting, and another Python implementation might well have a dict items view that just directly tried hash(self) and raised here, so you really can't write any code that depends on this behavior. Maybe that's true. But that still doesn't seem like a problem with CPython's implementation; the question would just be how to change the docs (whether to require other implementations to do the same thing or to explicitly allow them to raise).
1. OrderedDict().values() does not implement __eq__. It uses
Sure; any other option is terrible. In fact, both of these options are terrible--one breaks consistency with other mappings, and with the basic rules of comparison; the other breaks consistency with the other related types. Ick. Then again, they both have compelling arguments for them, and they're both pretty simple. I doubt anyone has any critical code that relies on the current behavior, but then I doubt anyone would write any critical code that relied on the other behavior if it were changed. To me, it looks like the best deciding factor is inertia. If it's worked the same way for years, it might as well keep working that way. Maybe add a note in the docs saying you shouldn't compare the things, especially to other mapping types' views, and what will happen if you do?
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 02:38:53AM +0000, Andrew Barnert via Python-ideas wrote: [...]
This thread has wandered over a fair bit of ground, so I've lost track of *precisely* what these options are. I think we're still debating the fact that OrderedDict *values* compare by ID (like arbitrary objects), rather than by value like items and keys. For example: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = a.copy() py> a.keys() == b.keys() True py> a.items() == b.items() True So KeysView and ItemsView compare by the value of the view. But ValuesView compare by ID, not value: py> a.values() == b.values() False This makes no sense to me. The same counter-intuitive behaviour occurs for dict ValuesView as well: py> dict(a).values() == dict(b).values() False I think that if a and b are the same Mapping type (say, both dicts, or both OrderedDicts), then there is an obvious and logical invariant(s). The following three expressions should be equivalent: (1) a == b (2) a.items() == b.items() (3) (a.keys() == b.keys()) and (a.values() == b.values()) ignoring the usual suspects like NANs and other "funny stuff". Am I missing something? Is there a rationale for ValuesView to compare by identity instead of value? It's not just equality that behaves strangely with ValuesView. Even when the values are unique and hash-like, they don't behave very "set-like": # KeysView and ItemsView are set-like py> a.keys() & b.keys() {'b', 'a'} py> a.items() | b.items() {('b', 2), ('a', 1)} # values are hashable, but ValuesViews are not set-like py> set(a.values()) | set(b.values()) {1, 2} py> a.values() | b.values() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for |: 'ValuesView' and 'ValuesView' This is *especially* weird when one realises that ItemsView actually manages to be set-like even when the values are not hashable: py> c = odict(x=[]) py> c.items() & {} set() yet ValuesView isn't set-like when they are! -- Steve
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
ValuesView is not a set because there may be duplicates. But the identity thing feels odd. (Even though I designed this myself.) Maybe because values may not be comparable? --Guido (mobile)
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 08:37:09AM -0800, Guido van Rossum wrote:
Right, that makes sense now, and it's even documented that value views are not treated as sets: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects I'm not sure what you mean by "values may not be comparable"? Since we're only talking about equality, aren't all values comparable? -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 9:34 PM, Steven D'Aprano <steve@pearwood.info> wrote:
See my example in the other email (https://mail.python.org/pipermail/python-ideas/2015-December/037498.html). That's a case where the order of comparison matters, so you can't do a conceptual "unordered comparison" without, in the worst case, comparing everything to everything else. This is due to custom __eq__ (by OrderedDict, for irony's sake): a == b and b == c does not mean a == c.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 10:28:52PM -0500, Franklin? Lee wrote:
I don't know what Guido means by "values might not be comparable", but your example is lack of transitivity. Mathematical equality is transitive: if a == b, and b == c, then a == c. But that doesn't extend to non-numeric concepts of equality, e.g. preference ordering, or other forms of ranking. Since Python __eq__ can be overridden, we cannot assume that equality of arbitrary objects is necessarily transitive. And indeed, even with Mappings they are not: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = dict(a) py> c = odict([('b', 2), ('a', 1)]) py> a == b == c True py> a == c False -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 10:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Well, that's my point AND my example. If this were lists, then a lack of transitivity of elements would mean a lack of transitivity for lists of those elements. You get what you put in. But since dict views are unordered collections, a lack of transitivity for elements would mean INCONSISTENCY: the comparison of two views as multisets would depend on the exact order of comparison. Unless you are willing to compare everything to everything else in the worst case.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 8:07 PM, Franklin? Lee < leewangzhong+python@gmail.com> wrote:
Honestly, I think it's fine if an operation like == for a collection uses an algorithm that just assumes the items' comparison is transitive, and if it isn't, you still get what you put it (i.e. it's still like a sewer :-). It's the same with sort() -- if the comparison is unreasonable the sort still terminates, just not with the items in sorted order. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 7:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I don't know what Guido means by "values might not be comparable", but your example is lack of transitivity.
I mean cases where == actually raises an exception. In Python 2 we did this for comparing certain 8-bit strings to unicode strings. While we've fixed that particular issue (in 3, bytes(...) == str(...) always returns False), it's still possible, and in fact even some stdlib modules do this -- I know certain comparisons of naive and tz-aware datetimes do this (which is not the same as returning NotImplemented). However for full disclosure I should add that until just now I had misunderstood the complaint about values() -- it doesn't compare the values by identity, values views themselves are only compared by identity. But (how's this for a nice recovery :-) that's how comparing dict.values() works, and it's reasonable given how expensive it would be to make it work otherwise (the corresponding ABCs behave the same way). The real oddity is that an OrderedDict's keys and items views don't take order into account, even though comparing the OrderedDict objects themselves does use the order. This seems to be laziness of implementation (just inheriting most of the view implementation from dict) rather than based on some careful design consideration. Unless I missed the consideration (I wasn't involved in the design of OrderedDict at all TBH). (And FWIW if we do fix this I might be amenable to changing the way values views are compared, but *not* by taking the keys into account. It's either by identity of the view object, or by comparing the values in order.) -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Steven D'Aprano writes:
I don't know what Guido means by "values might not be comparable",
Guido evidently meant something more pragmatic, but as I see it, values of different types are in principle incomparable, and the odict == dict == odict example shows why: satisfying the equality definitions of different types simultaneously is often impossible. But if you transform them to a common type or unify them in a union type, I think it's reasonable to expect that the common type will implement equality as an equivalence relation. (As you indirectly mentioned yourself, that's why we accept objects like NaN only as a last resort in preserving compatibility with a truly important standard we can't change.) In Python, the TOOWTDI common type for fallback is object, and that works well enough as a default implementation of __eq__ (ie, "is") that allows for equality comparison across types, which can be useful.
but your example is lack of transitivity.
Indifference (the notion of "equality" that applies to preference, at least in economics) in practice is always an equivalence; all use cases I know of require this (many strengthen it to actual equality, ie, requiring antisymmetry). I can't think of a case where I would consider any equality relation deliberately defined as not an equivalence to be anything but perverse. So I'm not sure what you're trying to say here. I guess you mean that as a pragmatic matter, a programming language may allow perverse definitions, and of course there may be artifacts of particular definitions such that objects of types that should always be considered unequal might compare equal. I suppose there are "consenting adults" cases where it's convenient for a particular use case to use a perverse definition. And yes, I consider "fall back to the less restrictive definition of equality", as done by OrderedDict and dict in the example, to be perverse.
data:image/s3,"s3://crabby-images/f3aca/f3aca73bf3f35ba204b73202269569bd49cd2b1e" alt=""
On Fri, Dec 18, 2015 at 3:07 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This makes no sense to me. The same counter-intuitive behaviour occurs for dict ValuesView as well:
OrderedDict re-uses dict's view types (but overrides __iter__). So it had better be the same behavior! :) -eric
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Dec 18, 2015, at 03:07, Steven D'Aprano <steve@pearwood.info> wrote:
It's not just equality that behaves strangely with ValuesView. Even when the values are unique and hash-like, they don't behave very "set-like":
But values views are inherently multisets, not sets. Do you really want to say that the multisets {2, 3, 3} and {2, 2, 3} are equal because they have the same elements, even though they have different counts? And is {1, 1, 2} & {1, 1, 3} really just {1} rather than {1, 1}? For some uses of multisets, those rules make sense, but in general, they don't. (Check out how Counter handles the same questions.) If we had a general notion of multisets in Python, it might make sense to define values views and their behavior in terms of multisets. But defining them in terms of sets because that's sort of close and we have them doesn't make any sense. If you're thinking we could define what multisets should do, despite not having a standard multiset type or an ABC for them, and apply that to values views, the next question is how to do that in better than quadratic time for non-hashable values. (And you can't assume ordering here, either.) Would having a values view hang for 30 seconds and then come back with the answer you intuitively wanted instead of giving the wrong answer in 20 millis be an improvement? (Either way, you're going to learn the same lesson: don't compare values views. I'd rather learn that in 20 millis.)
Try this: c = {1: []} d = {1: [], 2: []} c.items() < d.items() It can tell that one "virtual set" of key-value pairs is a subset even though they can't actually be represented as sets. The items views take "as set-like as possible" very seriously.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 5:03 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Counter would require hashable values. Any efficient multibag concept, in fact, would. Quadratic multibag comparisons would run into trouble with custom equality. # Pretending that kwargs is ordered. a0 = dict(x=0, y=1) a1 = a0 b0 = OrderedDict(x=0, y=1) b1 = OrderedDict(y=1, x=0) d0 = {'foo': a0, 'bar': b0} d1 = {'foo': b1, 'bar': a1} If we compare a0 == a1 and b0 == b1, then it fails. If we compare a0 == b1 and b0 == a1, then it passes. The order of comparisons matter. I see two options: - comparison is explicitly NotImplemented. Any code that used it should've used `is`. - comparison respects keys. OrderedDict values() comparison makes some sense, but its options would be - comparison is sequential. - comparison respects keys.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 05:59:39PM -0500, Franklin? Lee wrote:
We're still talking about equality between Mapping.values() views, correct? I strongly dislike that option. I don't know of any std lib or built-in object which tries to prohibit equality comparisons. Of course your own custom classes can do anything they like, including rather silly things: class Silly: def __eq__(self, other): x = random.choice([0, 1, 2]) if x == 2: raise ValueError return bool(x) but I think that people expect that equality tests should always succeed, even if it falls back on the default object behaviour (namely identity comparison).
- comparison respects keys.
I'm not sure I understand what this means. If we have two dicts: a = {1: None, 2: None} b = {1: None, 3: None} are you suggesting that `a.values() == b.values()` should return False because the KEYS {1, 2} and {1, 3} are different? "ValuesViews implement equality by comparing identity, because everything else is too hard" might not be useful, but at least it makes sense as an explanation. Whereas "ValuesViews implement equality by comparing keys" sounds like something I might have read in "PHP, A Fractal Of Bad Design" :-) -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 10:43:24PM -0500, Franklin? Lee wrote:
Franklin, could you please try to be a little bit more explicit about what "it" and "this" is when you describe something? I find it very hard to understand what *specific* thing you are referring to when you refer to it using short-hand. You say that "this" is a silent error waiting to happen. Does "this" refer to your suggestion that "comparison is explicitly NotImplemented", or something else? I can't tell.
Second, "NotImplemented" allows the other side to try its __eq__.
Okay, that makes better sense. I tought you meant that Mapping.ValuesView *did not implement* an __eq__ method, so that it raised an exception if you tried to compare them: # What I thought you meant a.values() == b.values() => raises an exception Now I understand that you mean they should return NotImplemented instead. But in practice, that doesn't actually change the behaviour that much: if both sides return NotImplemented, Python will fall back to the default behaviour, which is identity. py> class A: ... def __eq__(self, other): ... return NotImplemented ... py> a = A() py> b = A() py> a == a True py> a == b False Returning NotImplemented just allows the other argument a chance to be called, but that already happens! py> class Spam: ... def __eq__(self, other): ... print("calling Spam.__eq__") ... return True ... py> a = odict() py> b = Spam() py> a == b calling Spam.__eq__ True py> b == a calling Spam.__eq__ True So no change there. We already have that behaviour. -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 11:10 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Sorry, bad habit. I see dict().values().__eq__ as an error.
I didn't realize this. Since explicitly raising NotImplemented would cause it NOT to try the reflected method, I withdraw the suggestion of making the comparison fail in any way that it doesn't already.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 11:58 PM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Sorry, bad habit. I see dict().values().__eq__ as an error.
I mean that I see comparison of dict values views as an error. That it shouldn't have been done. But I can see where it would be used. So I withdraw that, too.
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Andrew Barnert via Python-ideas writes:
I don't think this is an appropriate argument. I don't check every computation my Python programs do. There's a good chance it would take years to learn not to compare values views because it's sometimes wrong (eg, if in my use case most of the time view_a is view_b or not equal according to my desired definition). OTOH, I do check my watch many times a day, and am very protective of my time; I would notice a 30-second hang in that case. Speaking of 30-second hangs, I really wish you'd trim (the alternative is to stop reading your posts, and I really don't want to do that either). If you won't trim, please top-post, at least for those posts which have S/N ratios on the order of 10^-2 (which have been frequent recently).
data:image/s3,"s3://crabby-images/980d1/980d1e4a110b86a06fe535e4d8377768d2e2398b" alt=""
Andrew Barnert writes:
Why aren't all values hashable? When I think about it, it's a bit strange that "hashable" is so wrapped up in immutability for no other reason than the idea that it's not safe to use a mutable object as a dictionary key if it is changed while it is in the dictionary. Java doesn't have this restriction. And while the reasoning is certainly defensible in isolation, it sort of goes against the "consenting adults" principle that is used to justify all the _other_ dangerous/questionable things that Python doesn't bother putting technical obstacles in front of. Why not demand that all objects (except NaN?) can be hashed, and that their hash shall match the equality relationship they define, and that all objects can safely be used as set members and dictionary keys so long as they are not in fact changed while in such a position? Certainly we can't put technical restrictions in the way of defining a __hash__ method that raises an exception, but we can say that if they try to use such an object in a dictionary and compare their values views they're on their own. For a somewhat more conservative path forward, define a new method __vhash__ which will always return a value (by default based on the object identity), and __hash__ shall return either the same number as __vhash__, or raise an exception if the object is not guaranteed as "immutable" for the purpose of equality comparison.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 10:22 PM, Random832 <random832@fastmail.com> wrote:
The link between hashing and immutability is because objects whose hash would change are common, e.g. lists, and using them as dict keys would be very hard to debug for users most likely to make this mistake. The issue is that the dict implementation makes it impossible to find back keys whose hash has changed, other than by linear search, which is unacceptable -- but that's exactly what users will try to debug such issues, i.e., print the dict and notice that the missing key is indeed present. The consenting adults rule typically applies to things that are well hidden or marked (e.g. using __dunder__ names). There are plenty of things that Python could allow but doesn't, not because they are hard to implement or would violate an invariant of the interpreter, but because they could trip over naive users. Note that you are turning things upside down: the question "why aren't all things hashable" came about because Andrew was considering making a hash table of the values of a dict. But the real question here isn't "why aren't all things hashable" but "why can't you put mutable values into a set". The answer to the latter is because when we put a value into a container, and later the value changes, we can't tell the container, so if the container has any sort of lookup scheme other than linear search, it would lose track of the value. Hashing comes into play because all of Python's common data structures use hashing to optimize lookup -- but if we used a different data structure, e.g. something based on sorting the keys, we'd still have the mutability problem. And we'd have worse problems, because values would have to be sortable, which is a stricter condition than being immutable. In any case, you can't solve this problem by making all values hashable. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/f576b/f576b43f4d61067f7f8aeb439fbe2fadf3a357c6" alt=""
Guido van Rossum <guido@python.org> writes:
That was a great explanation; you answered several points on which I was vague, and you addressed some things I didn't even know were problems. I'd love to see that edited to a blog post we can reference in a single article, if you have the time. -- \ “I went to the museum where they had all the heads and arms | `\ from the statues that are in all the other museums.” —Steven | _o__) Wright | Ben Finney
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Hah, I never have time to blog any more. :-( However you can just link to the mailman archives if you want to reference it. On Sat, Dec 19, 2015 at 12:16 AM, Ben Finney <ben+python@benfinney.id.au> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/980d1/980d1e4a110b86a06fe535e4d8377768d2e2398b" alt=""
Guido van Rossum writes:
Java doesn't seem to have this problem. Python uses dicts more heavily as part of its core architecture, sure, but those dicts use strings as their keys.
The consenting adults rule typically applies to things that are well hidden or marked (e.g. using __dunder__ names).
The ability to e.g. replace a class or module's functions, or values intended as constants, is not especially well-hidden.
Well, sure, but that's a reasonable way (if the ability to do so were present) to implement the operation being discussed under the performance constraints he specified.
Yes, but you're fine as long as the value doesn't change. What do you think about my __vhash__ idea? Someone would only make sets/dicts that use __vhash__ rather than __hash__ if they can guarantee the object won't change in the lifetime of its presence in the container (something that's no problem for the short-lived container that would be used for this operation)
Sure I can. Normal dict values: def __eq__(self, b): return Counter(self) == Counter(b) #or e.g. Counter(map(self, make_vhash_key)) ... OrderedDict values: def __eq__(self, b): if isinstance(b, OrderedDict) return List(self) == List(b) else: return super().__eq__(b) # Yes, this isn't transitive, i.e. maybe: # a == c and b == c where a != b # but the same is true today for the dicts.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sat, Dec 19, 2015 at 5:46 AM, Random832 <random832@fastmail.com> wrote:
You can solve this without adding warts to the language by using a wrapper object.
I don't see what that bit of code proves (or most other things you wrote above). -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Seems like we dropped the ball... Is there any action item here? -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/e2dbc/e2dbc0e63f46a6542a6bbf89e4a128181cbc29dc" alt=""
If we put technical considerations aside, maybe we should just ask to ourselves what behavior do we expect when doing equality tests between ordered dictionaries. As a reminder:
So, it appears that: 1. equality tests between odict_values use objects identity and not equality, 2. equality tests between odict_keys do not respect order. If it is not technically possible to change the current implementation, maybe all we can do is just add a warning about current behavior in the documentation? On Mon, Jan 11, 2016 at 4:17 AM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/2dd36/2dd36bc2d30d53161737124e2d8ace2b4b4ce052" alt=""
* DOC: OrderDict.values() comparisons in Python 3 * Src: https://hg.python.org/cpython/file/f2a0a4a45292/Doc/library/collections.rst#... What should it say? ```rst .. `< https://hg.python.org/cpython/file/f2a0a4a45292/Doc/library/collections.rst#...
`__
collections.OrderedDict.values().__eq__ * "is suprising" * Python 3 has `dict views`_ * :class:`OrderedDict` matches the dict interface in Python 2.7 and Python 3. * https://docs.python.org/2/library/collections.html#collections.OrderedDict * https://docs.python.org/3/library/collections.html#collections.OrderedDict * Python 2 dict interface: * dict.viewkeys(), dict.viewvalues(), dict.viewitems() * dict.keys(), dict.values(), dict.items() * https://docs.python.org/2/library/stdtypes.html#dict * https://docs.python.org/2/library/stdtypes.html#dict.values * https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects * Python 3 dict interface (:ref:`dictionary view objects`: * dict.keys(), dict.values(), dict.items() * list(dict.keys()), list(dict.values()), list(dict.items()) * https://docs.python.org/3/library/stdtypes.html#dict * https://docs.python.org/3/library/stdtypes.html#dict.values * https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects * In order to compare OrderedDict.values() **by value** you must either: * Cast values() to a sequence (e.g. a list) before comparison * Subclass :class:`OrderedDict` and wrap `values()` .. code:: python from collections import OrderedDict a = 'a' x = 1 y = x ab = [( a, x), ('b', 2)] ba = [('b', 2), (a, y)] ab_odict = OrderedDict(ab) ab_odict_ = OrderedDict(ab) ab_odict_copy = OrderedDict(ab.copy()) ba_odict = OrderedDict(ba) ab_dict = dict(ab) ba_dict = dict(ba) # In Python 3, # OrderedDict.values.__eq__ does not compare by value: assert ( ab_odict.values() == ab_odict_.values()) is False assert (list(ab_odict.values()) == list(ab_odict_.values()) is True # In Python 2.7 and 3, # OrderedDict.__eq__ compares ordered sequences assert (ab_odict == ab_odict_) is True assert (ab_odict == ab_odict_copy) is True assert (ab_odict == ba_odict) is False assert (ab_dict == ba_dict) is True # - [ ] How to explain the x, y part? # - in terms of references, __eq__, id(obj), __hash__ ``` On Wed, Jan 20, 2016 at 12:39 PM, Sven R. Kunze <srkunze@mail.de> wrote:
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
So the issues are: 1. OrderedDict().values() does not implement __eq__. It uses object equality, which means identity. 1a. dict().values() does not implement __eq__. 2. OrderedDict().keys().__eq__ does not respect order. I'd argue that keys() should be ordered comparisons, and values() could be. As a plus, this could be more efficient than unordered comparisons, since it's just return all(x == y for x, y in zip(self.keys(), other.keys())) instead of packing each into a set and comparing the sets. But what would be the point of comparing values views? On the other hand, I guess dict().values().__eq__ should stay unimplemented. What would it mean? MultiSet comparison? Values in general aren't even hashable, so they can't be stuck into a hash-based set structure. Maybe explicitly make it NotImplemented. On Thu, Dec 17, 2015 at 4:40 AM, Alexandre Figura <ipipomme+python@gmail.com> wrote:
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Dec 17, 2015, at 03:19, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
So the issues are:
I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.) As it stands, you can guess: * since all mapping keys and items act like sets (and are Sets), they probably compare like sets * since there is no OrderedSet for OrderedDict's keys and values to act like, they probably don't compare like those and instead just compare like sets * since values are just generic collections, they probably just use generic identity comparison. But, even if that guess may happen to be right for all builtin and stdlib types, and all user types that rely on the ABCs as mixins, in all of the major implementations, it's still just a guess, not something I'd want to rely on in portable code.
So what happens when you compare the keys, items, or values view from an OrderedDict against the view from another mapping type? Or, for keys and items, against another set type? If you leave that up to the whichever one is on the left, you get cases where a==b and b!=a. If you leave it up to the most derived type (by the usual __rspam__-like rules), that doesn't help anything, since dict_keys_view and odict_keys_view are unrelated except in sharing an abstract base class. And, worst of all, even if you contrive a way for one or the other to always win consistently, you get cases where a==b and b==c and a!=c. Also, overriding equality for ordered dict views might make sense, but what about ordering? I think it's pretty reasonable to expect that something that acts like a set and inherits from Set use < to mean subset, not lexicographical comparison. But then the interactions with other mapping views get even crazier. Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement. Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons? Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code?
I think you want zip_longest(self.keys(), other.keys(), fill=object()) or equivalent; otherwise {1, 2} will be equal to {1, 2, 3} . Also, you don't need to pack self.keys() into a set; it already is a Set, and implements all of the immutable set operations as-is, without needing any conversions. But really, who cares which of two implementations is more efficient when they implement two completely different things? If set comparison is right, it doesn't matter that sequence comparison is faster, you still can't use it, and vice-versa.
data:image/s3,"s3://crabby-images/f3aca/f3aca73bf3f35ba204b73202269569bd49cd2b1e" alt=""
On Thu, Dec 17, 2015 at 8:57 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
I think the first issue is that, if the comparison behavior of dict views isn't documented anywhere, it probably should be. (Even if the intended rule is just "comparisons may not be meaningful or do different things in different implementations, don't use them for anything", that could be documented.)
The views are documented as "set-like" and tied to the Set ABC. [1] So comparision operations should match the same semantics as for sets. [snip]
Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons?
Yep. If you want to work with a sequence then you have to convert the OrderedDict to a list or other sequence. The same goes for the views, which *do* preserve order during iteration (e.g. "list(od.keys())").
Maybe what you really want is new methods to get sequence-like (but with O(1) __contains__ and friends) rather than set-like views, including implementing the Sequence ABC, which only exist on OrderedDict, compare like sequences, don't provide set operations or implement Set, etc. Then you can be explicit about which one you want. The question is, are you going to actually want the sequence-like views often enough for it to be worth adding all of that code?
I've spoken with folks that have use cases for OrderedDict-as-a-sequence, but they were always able to use something else that met their needs more directly anyway. -eric [1] https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Thu, Dec 17, 2015 at 11:57 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
* since all mapping keys and items act like sets (and are Sets), they probably compare like sets
.items() aren't like sets. Or something is very wrong.
If OrderedDict views raised NotImplemented, I believe the other view will then have the chance to try its own comparison.
Under the current rules, I'm pretty sure equality is always symmetric and transitive, ordering is consistent with the normal partial order rules, etc. New rules that seem more intuitive at first glance but break down as soon as you try to think them through don't seem like an improvement.
Finally, why should these comparisons be sequence-like? Yes, OrderedDict and its views do have a defined order, but they still don't act like sequences in other ways. You can't subscript or slice them, they follow dict rather than sequence rules for modifying during iteration (although I believe those rules aren't enforced in the code so you get arbitrary exceptions or wrong values instead of the RuntimeError from dict?), they fail isinstance(x, Sequence), etc. What other non-sequence types implement sequence comparisons?
OrderedDict itself does everything that you might not want its views to do. OrderedDict implements order-sensitive comparison. It breaks all the rules you can think of that order-sensitive comparisons can break. Transitivity is already a problem. "Who's comparing?" is already a problem, and it's made its choice ("Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries."). The question now is, should views be made consistent with the OrderedDict itself? There are three options: 1. Deprecate the OrderedDict ordered comparison, to make it consistent with the views. 2. Make the views consistent with the OrderedDict. 3. Do nothing.
I actually want return len(self.keys()) == len(other.keys()) and all(x == y for x, y in zip(self.keys(), other.keys())) This should be more efficient than set comparisons. It's a bonus, not a reason.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
So I think that not using the __eq__ method of the keys or values is wrong (dicts do use it), but there's a philosophical question: if two OrderedDicts have the same key/value pairs in a different order, should they be considered equal or not? (Only when we've answered this can we answer the question about keys/values/items). I'm not a frequent user of OrderedDict, so I don't have a good intuition, unfortunately. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
(Note: This is already the case. https://docs.python.org/3/library/collections.html#collections.OrderedDict """ Equality tests between OrderedDict objects are order-sensitive and are implemented as list(od1.items())==list(od2.items()). Equality tests between OrderedDict objects and other Mapping objects are order-insensitive like regular dictionaries. This allows OrderedDict objects to be substituted anywhere a regular dictionary is used. """ So you're asking whether to deprecate this behavior?) On Thu, Dec 17, 2015 at 2:55 PM, Guido van Rossum <guido@python.org> wrote:
data:image/s3,"s3://crabby-images/a2433/a2433046d4b9f61fe876ee9fd7ae3c7add11c90b" alt=""
I may or may not have gotten this right, but I think he's asking "should that be the case?", not "is that the case?" - It is the case, but should it? I believe so. For one of my projects, I have a more complex mapping type, which under the hood is just an ODict that I do various operations on, and sometimes delegate to - for example, for equality tests against another instance of the same class, I just compare the underlying ODs. In that last case, I most certainly don't want two non-equal classes to compare equal! I'll ask the obvious question instead: "Should two lists compare unequal if they have the same items but not the same order?" However, the answer you want is "If I care about insertion and iteration order, there's a good chance I also care about that order when comparing, too" If you explicitly don't care about the order, it's quite easy to do so:
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Thursday, December 17, 2015 11:50 AM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Yes they are. Look at the library documentation for dict and dict views in stdtypes, and in collections.abc. An items view is supposed to be set-like, and to be actually usable as a set if the values are hashable. (If the values aren't hashable, obviously most set operations are just going to raise.) And, as far as I can tell, nothing is very wrong.
Yes, but the proposal here is for OrderedDict and its views to implement something sequence-like, not to raise NotImplemented, so why is that relevant?
Which is something equivalent--the only way that can return different results is if self.values() contains elements that declare themselves to be equal to everything, at which point they already break too many invariants to safely put in a container.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Thu, Dec 17, 2015 at 4:34 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Hm. >>> x = {0: []} >>> y = {0: []} >>> x == y True >>> x.items() == y.items() True That last one doesn't seem set-like to me. But it seems I misunderstood what you were saying. Looking at the source code for ItemsView, "containment" is defined as "other[0] in self and self[other[0]] == other[1]". So yes, it's set-like, in that it checks for containment. I've just never thought about "containment of a key => value mapping". (It's funny, 'cause I've tried to develop the exact same idea in a different subject.)
I mean for them to raise NotImplemented in the case of "the other dict is not an instance of OrderedDict". Anyway, this is all a moot point. *If* they were to do something different from dict's views, then they should follow OrderedDict.__eq__. PS: To extend "is an OrderedDict a sequence?" in the wrong direction, I'll point out that OrderedDict.sort(key=None) has a clear and natural meaning, and the implementation should be obvious (relative to a given ODict implementation), for either a linkedlist or array ordering. And to go even further: OrderedDict.items().sort(key=None) also makes some sense. (Though why you would want to compare with respect to values...)
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Thursday, December 17, 2015 2:19 PM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Sure it's set-like
Yeah, that's the point: items views are effectively sets of items, which are key-value pairs. What else would they be sets of? Because values aren't necessarily hashable, it's unavoidable that some set operations may raise a TypeError (if the values aren't hashable) instead of doing what you'd want. But that doesn't mean all set operations that could possibly raise a TypeError always do so; that only happens when it's unavoidable. And here, it's avoidable. I suppose you could argue that this is a bit too clever to just assume without documenting, and another Python implementation might well have a dict items view that just directly tried hash(self) and raised here, so you really can't write any code that depends on this behavior. Maybe that's true. But that still doesn't seem like a problem with CPython's implementation; the question would just be how to change the docs (whether to require other implementations to do the same thing or to explicitly allow them to raise).
1. OrderedDict().values() does not implement __eq__. It uses
Sure; any other option is terrible. In fact, both of these options are terrible--one breaks consistency with other mappings, and with the basic rules of comparison; the other breaks consistency with the other related types. Ick. Then again, they both have compelling arguments for them, and they're both pretty simple. I doubt anyone has any critical code that relies on the current behavior, but then I doubt anyone would write any critical code that relied on the other behavior if it were changed. To me, it looks like the best deciding factor is inertia. If it's worked the same way for years, it might as well keep working that way. Maybe add a note in the docs saying you shouldn't compare the things, especially to other mapping types' views, and what will happen if you do?
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 02:38:53AM +0000, Andrew Barnert via Python-ideas wrote: [...]
This thread has wandered over a fair bit of ground, so I've lost track of *precisely* what these options are. I think we're still debating the fact that OrderedDict *values* compare by ID (like arbitrary objects), rather than by value like items and keys. For example: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = a.copy() py> a.keys() == b.keys() True py> a.items() == b.items() True So KeysView and ItemsView compare by the value of the view. But ValuesView compare by ID, not value: py> a.values() == b.values() False This makes no sense to me. The same counter-intuitive behaviour occurs for dict ValuesView as well: py> dict(a).values() == dict(b).values() False I think that if a and b are the same Mapping type (say, both dicts, or both OrderedDicts), then there is an obvious and logical invariant(s). The following three expressions should be equivalent: (1) a == b (2) a.items() == b.items() (3) (a.keys() == b.keys()) and (a.values() == b.values()) ignoring the usual suspects like NANs and other "funny stuff". Am I missing something? Is there a rationale for ValuesView to compare by identity instead of value? It's not just equality that behaves strangely with ValuesView. Even when the values are unique and hash-like, they don't behave very "set-like": # KeysView and ItemsView are set-like py> a.keys() & b.keys() {'b', 'a'} py> a.items() | b.items() {('b', 2), ('a', 1)} # values are hashable, but ValuesViews are not set-like py> set(a.values()) | set(b.values()) {1, 2} py> a.values() | b.values() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unsupported operand type(s) for |: 'ValuesView' and 'ValuesView' This is *especially* weird when one realises that ItemsView actually manages to be set-like even when the values are not hashable: py> c = odict(x=[]) py> c.items() & {} set() yet ValuesView isn't set-like when they are! -- Steve
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
ValuesView is not a set because there may be duplicates. But the identity thing feels odd. (Even though I designed this myself.) Maybe because values may not be comparable? --Guido (mobile)
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 08:37:09AM -0800, Guido van Rossum wrote:
Right, that makes sense now, and it's even documented that value views are not treated as sets: https://docs.python.org/2/library/stdtypes.html#dictionary-view-objects I'm not sure what you mean by "values may not be comparable"? Since we're only talking about equality, aren't all values comparable? -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 9:34 PM, Steven D'Aprano <steve@pearwood.info> wrote:
See my example in the other email (https://mail.python.org/pipermail/python-ideas/2015-December/037498.html). That's a case where the order of comparison matters, so you can't do a conceptual "unordered comparison" without, in the worst case, comparing everything to everything else. This is due to custom __eq__ (by OrderedDict, for irony's sake): a == b and b == c does not mean a == c.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 10:28:52PM -0500, Franklin? Lee wrote:
I don't know what Guido means by "values might not be comparable", but your example is lack of transitivity. Mathematical equality is transitive: if a == b, and b == c, then a == c. But that doesn't extend to non-numeric concepts of equality, e.g. preference ordering, or other forms of ranking. Since Python __eq__ can be overridden, we cannot assume that equality of arbitrary objects is necessarily transitive. And indeed, even with Mappings they are not: py> from collections import OrderedDict as odict py> a = odict([('a', 1), ('b', 2)]) py> b = dict(a) py> c = odict([('b', 2), ('a', 1)]) py> a == b == c True py> a == c False -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 10:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Well, that's my point AND my example. If this were lists, then a lack of transitivity of elements would mean a lack of transitivity for lists of those elements. You get what you put in. But since dict views are unordered collections, a lack of transitivity for elements would mean INCONSISTENCY: the comparison of two views as multisets would depend on the exact order of comparison. Unless you are willing to compare everything to everything else in the worst case.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 8:07 PM, Franklin? Lee < leewangzhong+python@gmail.com> wrote:
Honestly, I think it's fine if an operation like == for a collection uses an algorithm that just assumes the items' comparison is transitive, and if it isn't, you still get what you put it (i.e. it's still like a sewer :-). It's the same with sort() -- if the comparison is unreasonable the sort still terminates, just not with the items in sorted order. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 7:57 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I don't know what Guido means by "values might not be comparable", but your example is lack of transitivity.
I mean cases where == actually raises an exception. In Python 2 we did this for comparing certain 8-bit strings to unicode strings. While we've fixed that particular issue (in 3, bytes(...) == str(...) always returns False), it's still possible, and in fact even some stdlib modules do this -- I know certain comparisons of naive and tz-aware datetimes do this (which is not the same as returning NotImplemented). However for full disclosure I should add that until just now I had misunderstood the complaint about values() -- it doesn't compare the values by identity, values views themselves are only compared by identity. But (how's this for a nice recovery :-) that's how comparing dict.values() works, and it's reasonable given how expensive it would be to make it work otherwise (the corresponding ABCs behave the same way). The real oddity is that an OrderedDict's keys and items views don't take order into account, even though comparing the OrderedDict objects themselves does use the order. This seems to be laziness of implementation (just inheriting most of the view implementation from dict) rather than based on some careful design consideration. Unless I missed the consideration (I wasn't involved in the design of OrderedDict at all TBH). (And FWIW if we do fix this I might be amenable to changing the way values views are compared, but *not* by taking the keys into account. It's either by identity of the view object, or by comparing the values in order.) -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Steven D'Aprano writes:
I don't know what Guido means by "values might not be comparable",
Guido evidently meant something more pragmatic, but as I see it, values of different types are in principle incomparable, and the odict == dict == odict example shows why: satisfying the equality definitions of different types simultaneously is often impossible. But if you transform them to a common type or unify them in a union type, I think it's reasonable to expect that the common type will implement equality as an equivalence relation. (As you indirectly mentioned yourself, that's why we accept objects like NaN only as a last resort in preserving compatibility with a truly important standard we can't change.) In Python, the TOOWTDI common type for fallback is object, and that works well enough as a default implementation of __eq__ (ie, "is") that allows for equality comparison across types, which can be useful.
but your example is lack of transitivity.
Indifference (the notion of "equality" that applies to preference, at least in economics) in practice is always an equivalence; all use cases I know of require this (many strengthen it to actual equality, ie, requiring antisymmetry). I can't think of a case where I would consider any equality relation deliberately defined as not an equivalence to be anything but perverse. So I'm not sure what you're trying to say here. I guess you mean that as a pragmatic matter, a programming language may allow perverse definitions, and of course there may be artifacts of particular definitions such that objects of types that should always be considered unequal might compare equal. I suppose there are "consenting adults" cases where it's convenient for a particular use case to use a perverse definition. And yes, I consider "fall back to the less restrictive definition of equality", as done by OrderedDict and dict in the example, to be perverse.
data:image/s3,"s3://crabby-images/f3aca/f3aca73bf3f35ba204b73202269569bd49cd2b1e" alt=""
On Fri, Dec 18, 2015 at 3:07 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This makes no sense to me. The same counter-intuitive behaviour occurs for dict ValuesView as well:
OrderedDict re-uses dict's view types (but overrides __iter__). So it had better be the same behavior! :) -eric
data:image/s3,"s3://crabby-images/d224a/d224ab3da731972caafa44e7a54f4f72b0b77e81" alt=""
On Dec 18, 2015, at 03:07, Steven D'Aprano <steve@pearwood.info> wrote:
It's not just equality that behaves strangely with ValuesView. Even when the values are unique and hash-like, they don't behave very "set-like":
But values views are inherently multisets, not sets. Do you really want to say that the multisets {2, 3, 3} and {2, 2, 3} are equal because they have the same elements, even though they have different counts? And is {1, 1, 2} & {1, 1, 3} really just {1} rather than {1, 1}? For some uses of multisets, those rules make sense, but in general, they don't. (Check out how Counter handles the same questions.) If we had a general notion of multisets in Python, it might make sense to define values views and their behavior in terms of multisets. But defining them in terms of sets because that's sort of close and we have them doesn't make any sense. If you're thinking we could define what multisets should do, despite not having a standard multiset type or an ABC for them, and apply that to values views, the next question is how to do that in better than quadratic time for non-hashable values. (And you can't assume ordering here, either.) Would having a values view hang for 30 seconds and then come back with the answer you intuitively wanted instead of giving the wrong answer in 20 millis be an improvement? (Either way, you're going to learn the same lesson: don't compare values views. I'd rather learn that in 20 millis.)
Try this: c = {1: []} d = {1: [], 2: []} c.items() < d.items() It can tell that one "virtual set" of key-value pairs is a subset even though they can't actually be represented as sets. The items views take "as set-like as possible" very seriously.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 5:03 PM, Sven R. Kunze <srkunze@mail.de> wrote:
Counter would require hashable values. Any efficient multibag concept, in fact, would. Quadratic multibag comparisons would run into trouble with custom equality. # Pretending that kwargs is ordered. a0 = dict(x=0, y=1) a1 = a0 b0 = OrderedDict(x=0, y=1) b1 = OrderedDict(y=1, x=0) d0 = {'foo': a0, 'bar': b0} d1 = {'foo': b1, 'bar': a1} If we compare a0 == a1 and b0 == b1, then it fails. If we compare a0 == b1 and b0 == a1, then it passes. The order of comparisons matter. I see two options: - comparison is explicitly NotImplemented. Any code that used it should've used `is`. - comparison respects keys. OrderedDict values() comparison makes some sense, but its options would be - comparison is sequential. - comparison respects keys.
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 05:59:39PM -0500, Franklin? Lee wrote:
We're still talking about equality between Mapping.values() views, correct? I strongly dislike that option. I don't know of any std lib or built-in object which tries to prohibit equality comparisons. Of course your own custom classes can do anything they like, including rather silly things: class Silly: def __eq__(self, other): x = random.choice([0, 1, 2]) if x == 2: raise ValueError return bool(x) but I think that people expect that equality tests should always succeed, even if it falls back on the default object behaviour (namely identity comparison).
- comparison respects keys.
I'm not sure I understand what this means. If we have two dicts: a = {1: None, 2: None} b = {1: None, 3: None} are you suggesting that `a.values() == b.values()` should return False because the KEYS {1, 2} and {1, 3} are different? "ValuesViews implement equality by comparing identity, because everything else is too hard" might not be useful, but at least it makes sense as an explanation. Whereas "ValuesViews implement equality by comparing keys" sounds like something I might have read in "PHP, A Fractal Of Bad Design" :-) -- Steve
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Fri, Dec 18, 2015 at 10:43:24PM -0500, Franklin? Lee wrote:
Franklin, could you please try to be a little bit more explicit about what "it" and "this" is when you describe something? I find it very hard to understand what *specific* thing you are referring to when you refer to it using short-hand. You say that "this" is a silent error waiting to happen. Does "this" refer to your suggestion that "comparison is explicitly NotImplemented", or something else? I can't tell.
Second, "NotImplemented" allows the other side to try its __eq__.
Okay, that makes better sense. I tought you meant that Mapping.ValuesView *did not implement* an __eq__ method, so that it raised an exception if you tried to compare them: # What I thought you meant a.values() == b.values() => raises an exception Now I understand that you mean they should return NotImplemented instead. But in practice, that doesn't actually change the behaviour that much: if both sides return NotImplemented, Python will fall back to the default behaviour, which is identity. py> class A: ... def __eq__(self, other): ... return NotImplemented ... py> a = A() py> b = A() py> a == a True py> a == b False Returning NotImplemented just allows the other argument a chance to be called, but that already happens! py> class Spam: ... def __eq__(self, other): ... print("calling Spam.__eq__") ... return True ... py> a = odict() py> b = Spam() py> a == b calling Spam.__eq__ True py> b == a calling Spam.__eq__ True So no change there. We already have that behaviour. -- Steve
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 11:10 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Sorry, bad habit. I see dict().values().__eq__ as an error.
I didn't realize this. Since explicitly raising NotImplemented would cause it NOT to try the reflected method, I withdraw the suggestion of making the comparison fail in any way that it doesn't already.
data:image/s3,"s3://crabby-images/43e44/43e443bff8896e014342cd59dad62b7ff5592a3d" alt=""
On Fri, Dec 18, 2015 at 11:58 PM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
Sorry, bad habit. I see dict().values().__eq__ as an error.
I mean that I see comparison of dict values views as an error. That it shouldn't have been done. But I can see where it would be used. So I withdraw that, too.
data:image/s3,"s3://crabby-images/b96f7/b96f788b988da8930539f76bf56bada135c1ba88" alt=""
Andrew Barnert via Python-ideas writes:
I don't think this is an appropriate argument. I don't check every computation my Python programs do. There's a good chance it would take years to learn not to compare values views because it's sometimes wrong (eg, if in my use case most of the time view_a is view_b or not equal according to my desired definition). OTOH, I do check my watch many times a day, and am very protective of my time; I would notice a 30-second hang in that case. Speaking of 30-second hangs, I really wish you'd trim (the alternative is to stop reading your posts, and I really don't want to do that either). If you won't trim, please top-post, at least for those posts which have S/N ratios on the order of 10^-2 (which have been frequent recently).
data:image/s3,"s3://crabby-images/980d1/980d1e4a110b86a06fe535e4d8377768d2e2398b" alt=""
Andrew Barnert writes:
Why aren't all values hashable? When I think about it, it's a bit strange that "hashable" is so wrapped up in immutability for no other reason than the idea that it's not safe to use a mutable object as a dictionary key if it is changed while it is in the dictionary. Java doesn't have this restriction. And while the reasoning is certainly defensible in isolation, it sort of goes against the "consenting adults" principle that is used to justify all the _other_ dangerous/questionable things that Python doesn't bother putting technical obstacles in front of. Why not demand that all objects (except NaN?) can be hashed, and that their hash shall match the equality relationship they define, and that all objects can safely be used as set members and dictionary keys so long as they are not in fact changed while in such a position? Certainly we can't put technical restrictions in the way of defining a __hash__ method that raises an exception, but we can say that if they try to use such an object in a dictionary and compare their values views they're on their own. For a somewhat more conservative path forward, define a new method __vhash__ which will always return a value (by default based on the object identity), and __hash__ shall return either the same number as __vhash__, or raise an exception if the object is not guaranteed as "immutable" for the purpose of equality comparison.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Fri, Dec 18, 2015 at 10:22 PM, Random832 <random832@fastmail.com> wrote:
The link between hashing and immutability is because objects whose hash would change are common, e.g. lists, and using them as dict keys would be very hard to debug for users most likely to make this mistake. The issue is that the dict implementation makes it impossible to find back keys whose hash has changed, other than by linear search, which is unacceptable -- but that's exactly what users will try to debug such issues, i.e., print the dict and notice that the missing key is indeed present. The consenting adults rule typically applies to things that are well hidden or marked (e.g. using __dunder__ names). There are plenty of things that Python could allow but doesn't, not because they are hard to implement or would violate an invariant of the interpreter, but because they could trip over naive users. Note that you are turning things upside down: the question "why aren't all things hashable" came about because Andrew was considering making a hash table of the values of a dict. But the real question here isn't "why aren't all things hashable" but "why can't you put mutable values into a set". The answer to the latter is because when we put a value into a container, and later the value changes, we can't tell the container, so if the container has any sort of lookup scheme other than linear search, it would lose track of the value. Hashing comes into play because all of Python's common data structures use hashing to optimize lookup -- but if we used a different data structure, e.g. something based on sorting the keys, we'd still have the mutability problem. And we'd have worse problems, because values would have to be sortable, which is a stricter condition than being immutable. In any case, you can't solve this problem by making all values hashable. -- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/f576b/f576b43f4d61067f7f8aeb439fbe2fadf3a357c6" alt=""
Guido van Rossum <guido@python.org> writes:
That was a great explanation; you answered several points on which I was vague, and you addressed some things I didn't even know were problems. I'd love to see that edited to a blog post we can reference in a single article, if you have the time. -- \ “I went to the museum where they had all the heads and arms | `\ from the statues that are in all the other museums.” —Steven | _o__) Wright | Ben Finney
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
Hah, I never have time to blog any more. :-( However you can just link to the mailman archives if you want to reference it. On Sat, Dec 19, 2015 at 12:16 AM, Ben Finney <ben+python@benfinney.id.au> wrote:
-- --Guido van Rossum (python.org/~guido)
data:image/s3,"s3://crabby-images/980d1/980d1e4a110b86a06fe535e4d8377768d2e2398b" alt=""
Guido van Rossum writes:
Java doesn't seem to have this problem. Python uses dicts more heavily as part of its core architecture, sure, but those dicts use strings as their keys.
The consenting adults rule typically applies to things that are well hidden or marked (e.g. using __dunder__ names).
The ability to e.g. replace a class or module's functions, or values intended as constants, is not especially well-hidden.
Well, sure, but that's a reasonable way (if the ability to do so were present) to implement the operation being discussed under the performance constraints he specified.
Yes, but you're fine as long as the value doesn't change. What do you think about my __vhash__ idea? Someone would only make sets/dicts that use __vhash__ rather than __hash__ if they can guarantee the object won't change in the lifetime of its presence in the container (something that's no problem for the short-lived container that would be used for this operation)
Sure I can. Normal dict values: def __eq__(self, b): return Counter(self) == Counter(b) #or e.g. Counter(map(self, make_vhash_key)) ... OrderedDict values: def __eq__(self, b): if isinstance(b, OrderedDict) return List(self) == List(b) else: return super().__eq__(b) # Yes, this isn't transitive, i.e. maybe: # a == c and b == c where a != b # but the same is true today for the dicts.
data:image/s3,"s3://crabby-images/3c3b2/3c3b2a6eec514cc32680936fa4e74059574d2631" alt=""
On Sat, Dec 19, 2015 at 5:46 AM, Random832 <random832@fastmail.com> wrote:
You can solve this without adding warts to the language by using a wrapper object.
I don't see what that bit of code proves (or most other things you wrote above). -- --Guido van Rossum (python.org/~guido)
participants (12)
-
Alexandre Figura
-
Andrew Barnert
-
Ben Finney
-
Emanuel Barry
-
Eric Snow
-
Franklin? Lee
-
Guido van Rossum
-
Random832
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Sven R. Kunze
-
Wes Turner