Re: Are PyObject_RichCompareBool shortcuts part of Python or just CPython quirks?
PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are the same object, then equality comparison returns True and inequality False. No attempt is made to execute __eq__ or __ne__ methods in those cases.
This has visible consequences all over the place, but they don't appear to be documented. For example,
... despite that math.nan == math.nan is False.
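A minimal sketch of the kind of consequence meant here (an illustration assuming CPython behaviour, not necessarily the example originally posted). The variable names are made up for the illustration:

```python
import math

nan = math.nan            # one specific NaN object
other = float("nan")      # a different NaN object

equal_to_itself = (nan == nan)         # __eq__ says a NaN is not equal to itself
found_same = nan in [nan]              # identity shortcut: found anyway
count_same = [nan, nan].count(nan)     # both entries are the same object
index_same = [0.0, nan].index(nan)     # located by identity
found_other = other in [nan]           # distinct object, so __eq__ decides

print(equal_to_itself, found_same, count_same, index_same, found_other)
```

The last line is the telling one: whether a NaN is "in" the list depends on which NaN object is asked about.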
It's usually clear which methods will be called, and when, but not really here. Any _context_ that calls PyObject_RichCompareBool() under the covers, for an equality or inequality test, may or may not invoke __eq__ or __ne__, depending on whether the comparands are the same object. Also any context that inlines these special cases to avoid the overhead of calling PyObject_RichCompareBool() at all.
If it's intended that Python-the-language requires this, that needs to be documented.
This has been slowly, but perhaps incompletely, documented over the years and has become baked into some of the collections ABCs as well. For example, Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:    # note the identity test
                return True
        return False

Various collections need to assume reflexivity, not just for speed, but so that we can reason about them and so that they can maintain internal consistency. For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value.  Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

That pop() logic implicitly assumes an invariant between membership and iteration:

    assert(x in collection for x in collection)

We really don't want to pop() a value *x* and then find that *x* is still in the container. This would happen if iter() found the *x*, but discard() couldn't find the object because the object can't or won't recognize itself:

    s = {float('NaN')}
    s.pop()
    assert not s    # Do we want the language to guarantee that s is now empty?  I think we must.

The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would be unfortunate if clear() could not guarantee a post-condition that the container is empty:

    s = {float('NaN')}
    s.clear()
    assert not s    # Can this be allowed to fail?

The case of count() is less clear-cut, but even there identity-implies-equality improves our ability to reason about code. Given some list, *s*, possibly already populated, would you want the following code to always work:

    c = s.count(x)
    s.append(x)
    assert s.count(x) == c + 1    # To me, this is fundamental to what the word "count" means.

I can't find it now, but I remember a possibly related discussion where we collectively rejected a proposal for an __is__() method. IIRC, the reasoning was that our ability to think about code correctly depended on this being true:

    a = b
    assert a is b

Back to the discussion at hand, I had thought our position was roughly:

* __eq__ can return anything it wants.
* Containers are allowed but not required to assume that identity-implies-equality.
* Python's core containers make that assumption so that we can keep the containers internally consistent and so that we can reason about the results of operations.

Also, I believe that even very early dict code (at least as far back as Py 1.5.2) had logic for "v is value or v == value".

As far as NaNs go, the only question is how far to propagate their notion of irreflexivity. Should "x == x" return False for them? We've decided yes. When it comes to containers, who makes the rules, the containers or their elements? Mostly, we let the elements rule, but containers are allowed to make useful assumptions about the elements when necessary. This isn't much different from the rules for the "==" operator, where __eq__() can return whatever it wants, but functions are still allowed to write "if x == y: ..." and assume that a meaningful boolean value has been returned (even if it wasn't). Likewise, the rule for "<" is that it can return whatever it wants, but sorted() and min() are allowed to assume a meaningful total ordering (which might or might not be true). In other words, containers and functions are allowed, when necessary or useful, to override the decisions made by their data. This seems like a reasonable state of affairs.

The current docs make an effort to describe what we have now: https://docs.python.org/3/reference/expressions.html#value-comparisons

Sorry for the lack of concision. I'm posting on borrowed time,

Raymond
We could introduce parallel kinds of collections: ValueList/IdentityList, ValueDict/IdentityDict, etc. The former would use comparison by value and would not preserve identity (so we could use more efficient storage for homogeneous collections; for example, a list of small ints could spend 1 byte/item). The latter would use comparison by identity. IdentityDict was already discussed before. There is demand for this feature, but it is not large if we keep backward compatibility. There is a workaround (a dict of id(key) to a tuple of (key, value)), which is not compatible with IdentityDict, so the latter can be a replacement in a public API.
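The workaround mentioned above could look roughly like the following minimal sketch. The class name `IdentityKeyedStore` and its methods are hypothetical, chosen only for this illustration; no such class exists in the standard library.

```python
class IdentityKeyedStore:
    """Hypothetical stand-in for the id(key) -> (key, value) workaround."""

    def __init__(self):
        self._entries = {}

    def __setitem__(self, key, value):
        # Storing the key itself keeps it alive, so its id() cannot be
        # reused by another object while the entry exists.
        self._entries[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._entries[id(key)][1]

    def __contains__(self, key):
        return id(key) in self._entries

    def __len__(self):
        return len(self._entries)


a = [1, 2]
b = [1, 2]          # equal to a, but a different object
store = IdentityKeyedStore()
store[a] = "first"
store[b] = "second"
print(len(store), store[a], store[b], [1, 2] in store)
```

Because lookups go through id(), unhashable keys such as lists work, and two equal but distinct keys get separate entries.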
+1 on everything Raymond says here (and in his second message).
I don't see a need for more classes or ABCs.
On Mon, Feb 3, 2020 at 00:36 Raymond Hettinger
<snip>
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-leave@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/UIZPD7OJ... Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido (mobile)
[Tim]
PyObject_RichCompareBool(x, y, op) has a (valuable!) shortcut: if x and y are the same object, then equality comparison returns True and inequality False. No attempt is made to execute __eq__ or __ne__ methods in those cases. ... If it's intended that Python-the-language requires this, that needs to be documented.
[Raymond]
This has been slowly, but perhaps incompletely, documented over the years and has become baked into some of the collections ABCs as well. For example, Sequence.__contains__() is defined as:

    def __contains__(self, value):
        for v in self:
            if v is value or v == value:    # note the identity test
                return True
        return False
But it's unclear to me whether that's intended to constrain all implementations, or is just mimicking CPython's list.__contains__. That's always a problem with operational definitions. For example, does it also constrain all implementations to check in iteration order? The order can be visible, e.g, in the number of times v.__eq__ is called.
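One way to see how the scan order becomes observable is to count __eq__ calls. The sketch below assumes CPython's list, which scans left to right and applies the identity shortcut. `Probe` and `eq_calls_for_membership` are hypothetical names used only for this illustration.

```python
class Probe:
    """Hypothetical element type that counts how often __eq__ runs."""

    eq_calls = 0

    def __eq__(self, other):
        Probe.eq_calls += 1
        return False    # never equal, so only identity can produce a match


def eq_calls_for_membership(items, needle):
    """Return (found, number of __eq__ calls made while testing membership)."""
    Probe.eq_calls = 0
    found = needle in items
    return found, Probe.eq_calls


items = [Probe(), Probe(), Probe()]
first = eq_calls_for_membership(items, items[0])
last = eq_calls_for_membership(items, items[2])
print(first, last)
```

Under those assumptions, looking for the first element costs no __eq__ calls, while looking for the last costs one call per earlier element. An implementation that scanned in a different order would report different counts.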
Various collections need to assume reflexivity, not just for speed, but so that we can reason about them and so that they can maintain internal consistency. For example, MutableSet defines pop() as:

    def pop(self):
        """Return the popped value.  Raise KeyError if empty."""
        it = iter(self)
        try:
            value = next(it)
        except StopIteration:
            raise KeyError from None
        self.discard(value)
        return value

As above, except CPython's own set implementation doesn't faithfully conform to that:

    >>> x = set(range(0, 10, 2))
    >>> next(iter(x))
    0
    >>> x.pop()  # returns first in iteration order
    0
    >>> x.add(1)
    >>> next(iter(x))
    1
    >>> x.pop()  # ditto
    1
    >>> x.add(1)  # but try it again!
    >>> next(iter(x))
    1
    >>> x.pop()  # oops! didn't pop the first in iteration order
    2

Not that I care ;-) Just emphasizing that it's tricky to say no more (or less) than what's intended.
That pop() logic implicitly assumes an invariant between membership and iteration:
assert(x in collection for x in collection)
Missing an "all".
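A short sketch of why the "all" matters: a generator object is truthy no matter what it would yield, so asserting on it checks nothing. The collection below is an arbitrary example chosen for the illustration.

```python
import math

collection = [1, "two", math.nan]

# A generator object is always truthy, so asserting on it can never fail,
# whatever values the generator would yield.
never_fails = bool(False for _ in collection)

# With all(), the invariant is actually evaluated element by element.
# In CPython it holds even for the NaN, thanks to the identity shortcut.
invariant_holds = all(x in collection for x in collection)

print(never_fails, invariant_holds)
```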
We really don't want to pop() a value *x* and then find that *x* is still in the container. This would happen if iter() found the *x*, but discard() couldn't find the object because the object can't or won't recognize itself:
Speaking of which, why is "discard()" called instead of "remove()"? It's sending a mixed message: discard() is appropriate when you're _not_ sure the object being removed is present.

    s = {float('NaN')}
    s.pop()
    assert not s    # Do we want the language to guarantee that
                    # s is now empty?  I think we must.

I can't imagine an actual container implementation that wouldn't. But no actual container implements pop() in the odd way MutableSet.pop() is written. CPython's set.pop does nothing of the sort - doesn't even have a pointer equality test (except against C's NULL and `dummy`, used merely to find "the first (starting at the search finger)" slot actually in use).
In a world where we decided that the identity shortcut is _not_ guaranteed by the language, the real consequence would be that the MutableSet.pop() implementation would need to be changed (or made NotImplemented, or documented as being specific to CPython).
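To make that dependence concrete, here is a minimal sketch of a hypothetical MutableSet subclass (`EqOnlySet`, not a real class) whose __contains__ and discard() rely on == alone, with no identity test. It is only meant to show what the inherited pop() does when the shortcut is absent.

```python
import math
from collections.abc import MutableSet


class EqOnlySet(MutableSet):
    """Hypothetical set-like container that compares with == only."""

    def __init__(self, iterable=()):
        self._items = []
        for value in iterable:
            self.add(value)

    def __contains__(self, value):
        return any(item == value for item in self._items)    # no identity test

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self:
            self._items.append(value)

    def discard(self, value):
        self._items = [item for item in self._items if not (item == value)]


s = EqOnlySet([math.nan])
popped = s.pop()         # inherited MutableSet.pop(): next(iter(s)), then discard()
still_there = len(s)     # the NaN was returned but never removed
print(popped, still_there)
# Calling s.clear() here would never terminate, because every pop()
# "succeeds" without shrinking the container.
```

With ordinary values the same class behaves as expected; only elements that are not equal to themselves expose the problem.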
The code for clear() depends on pop() working:

    def clear(self):
        """This is slow (creates N new iterators!) but effective."""
        try:
            while True:
                self.pop()
        except KeyError:
            pass

It would be unfortunate if clear() could not guarantee a post-condition that the container is empty:
That's again a consequence of how MutableSet.pop was written. No actual container has any problem implementing clear() without needing any kind of object comparison.

    s = {float('NaN')}
    s.clear()
    assert not s    # Can this be allowed to fail?

No, but as above it's a very far stretch to say that clear() emptying a container _relies_ on the object identity shortcut. That's just a consequence of an odd specific clear() implementation, relying in turn on an odd specific pop() implementation that assumes the shortcut is in place.
The case of count() is less clear-cut, but even there identity-implies-equality improves our ability to reason about code:
Absolutely! That "x is x implies equality" is very useful. But that's not the question ;-)
Given some list, *s*, possibly already populated, would you want the following code to always work:

    c = s.count(x)
    s.append(x)
    assert s.count(x) == c + 1    # To me, this is fundamental to what the word "count" means.

I would, yes. But it's also possible to define s.count(x) as sum(x == y for y in s) and live with the consequences of __eq__.
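A small sketch of the two definitions side by side, assuming CPython's list.count(), which applies the identity shortcut. The helper name `count_by_eq` is made up for this illustration.

```python
import math


def count_by_eq(s, x):
    """Count using __eq__ alone, with no identity shortcut."""
    return sum(x == y for y in s)


s = [1.0, math.nan]
x = math.nan

builtin_before = s.count(x)
eq_only_before = count_by_eq(s, x)
s.append(x)
builtin_after = s.count(x)
eq_only_after = count_by_eq(s, x)

print(builtin_before, builtin_after, eq_only_before, eq_only_after)
```

With the built-in, appending x raises the count by one; with the __eq__-only definition, the count for a NaN stays at zero regardless of how many times it is appended.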
...
Back to the discussion at hand, I had thought our position was roughly:
* __eq__ can return anything it wants.
* Containers are allowed but not required to assume that identity-implies-equality.
* Python's core containers make that assumption so that we can keep the containers internally consistent and so that we can reason about the results of operations.
All reasonable! Python just needs something now like a benevolent dictator ;-)
Also, I believe that even very early dict code (at least as far back as Py 1.5.2) had logic for "v is value or v == value".
Memory fades, but it seems to me that very early Pythons may even have exploited the shortcut for `==` too.
... The current docs make an effort to describe what we have now: https://docs.python.org/3/reference/expressions.html#value-comparisons
Yes, that's been pointed out, and it's at worst "a good start". The people on the original PR that kicked this off weren't aware that it existed. Terry Reedy said he's thinking about how to (at least) make it more discoverable, although at that time Guido appeared to be leaning "implementation defined" instead.
[in another msg]
forget to mention that list.index() also uses PyObject_RichCompareBool()
A quick scan found about 100 calls to PyObject_RichCompareBool passing Py_EQ. So it screams for a way to spell out what's required that doesn't degenerate into an exhaustive list of specific functions/methods/contexts.
Now, probably this has been rejected a hundred times before, and there are some very good reasons why it is a horrible thought...
But if `PyObject_RichCompareBool(..., Py_EQ)` is such a fundamental operation (and in a sense it seems to me that it is), is there a point in explicitly defining it?
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
That operator would obviously be allowed to use the shortcut.
At that point container `==` and `in` (and equivalence) are defined based on element equivalence. NAs (missing value handling) may be an actual use-case where it is more than a theoretical thought. However, I do not seriously work with NAs myself.
- Sebastian
On Mon, 2020-02-03 at 16:00 -0600, Tim Peters wrote:
<snip>
On 2/3/20 3:07 PM, Sebastian Berg wrote:
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
You misunderstand what's going on here. Python deliberately makes float('NaN') != float('NaN'), and in fact there's special code to ensure that behavior. Why? Because it's mandated by the IEEE 754 floating-point standard.
https://en.wikipedia.org/wiki/NaN#Comparison_with_NaN
This bizarre behavior is often exploited by people exploring the murkier corners of Python's behavior. Changing it is (sadly) not viable.
//arry/
On Mon, 2020-02-03 at 16:43 -0800, Larry Hastings wrote:
On 2/3/20 3:07 PM, Sebastian Berg wrote:
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
You misunderstand what's going on here. Python deliberately makes float('NaN') != float('NaN'), and in fact there's special code to ensure that behavior. Why? Because it's mandated by the IEEE 754 floating-point standard.
This bizarre behavior is often exploited by people exploring the murkier corners of Python's behavior. Changing it is (sadly) not viable.
Of course it is not; I am not saying that it should be changed. What I mainly meant is that in this discussion there was always talk about two distinct, slightly different operations:
1. `==` has of course the logic `NaN == NaN -> False`
2. `PyObject_RichCompareBool(a, b, Py_EQ)` was argued to have a useful logic of `a is b or a == b`.
And I argued that you could define:

    def identical(a, b):    # proposed to live at operator.identical
        res = a is b or a == b
        assert type(res) is bool    # arrays have unclear logic
        return res

to "bless" it as its own desired logic when dealing with containers (mainly). And that making that distinction on the language level would be a (possibly ugly) resolution of the problem. Only `identical` is actually always allowed to use the `is` shortcut.
Now, for all practical purposes "identical" is maybe already correctly defined by `a is b or bool(a == b)` (NaN being the largest inconsistency, since NaN is not a singleton). Along that line, I could argue that `PyObject_RichCompareBool` is actually incorrectly implemented and it should be replaced with a new `PyObject_Identical` in most places where it is used.
Once you get to the point where you accept the existence of `identical` as a distinct operation, allowing `identical(NaN, NaN)` to be always true *can* make sense, and resolves current inconsistencies w.r.t. containers and NaNs.
- Sebastian
/arry
On Mon, Feb 03, 2020 at 05:26:38PM -0800, Sebastian Berg wrote:
1. `==` has of course the logic `NaN == NaN -> False`
2. `PyObject_RichCompareBool(a, b, Py_EQ)` was argued to have a useful logic of `a is b or a == b`.
And I argued that you could define:

    def identical(a, b):    # proposed to live at operator.identical
        res = a is b or a == b
        assert type(res) is bool    # arrays have unclear logic
        return res

to "bless" it as its own desired logic when dealing with containers (mainly).
Note that Python arrays define equality similarly to other containers:

    py> from array import array
    py> array('i', [1, 2, 3]) == array('i', [2, 3, 1])
    False

It is numpy arrays which do something unusual with equality. (And I would argue that they are wrong to do so. But that ship has long sailed over the horizon.)
Only `identical` is actually always allowed to use the `is` shortcut.
You can't enforce that (and why would you want to?). If I want to use an `is` shortcut in my `__eq__` methods, or write out the condition in full, who are you to say that's forbidden unless I call `identical`?
Now, for all practical purposes "identical" is maybe already correctly defined by `a is b or bool(a == b)` (NaN being the largest inconsistency, since NaN is not a singleton). Along that line, I could argue that `PyObject_RichCompareBool` is actually incorrectly implemented and it should be replaced with a new `PyObject_Identical` in most places where it is used.
In what way is PyObject_RichCompareBool incorrect? Can you point to a bug caused by this incorrect implementation?
Once you get to the point where you accept the existence of `identical` as a distinct operation, allowing `identical(NaN, NaN)` to be always true *can* make sense
We already have `identical` in the language, it is the `is` operator. Your "identical" function is misnamed, it should be "identical_or_equal".
If you want to argue that "identical or equal" is such a fundamental and important operation in Python code that we ought to offer it ready-made in the operator module, I'm listening. But my gut feeling here is to say "not every one line expression needs to be in the stdlib".
PyObject_RichCompareBool is a different story. "Identical or equal" is not so simple to implement correctly in C code, and it is a common operation used in lists, tuples, dicts and possibly others, so it makes sense for there to be a C API for it.
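For reference, the one-line expression under discussion can be sketched at the Python level as follows. The function name `identical_or_equal` is the one suggested in this message; it is not an existing function in the operator module or anywhere else in the stdlib.

```python
import math


def identical_or_equal(a, b):
    """Sketch of the check containers are described as performing."""
    return a is b or bool(a == b)


nan = math.nan
same_nan = identical_or_equal(nan, nan)               # identity wins
other_nan = identical_or_equal(nan, float("nan"))     # distinct objects, not equal
mixed_types = identical_or_equal(1, 1.0)              # not identical, but equal

print(same_nan, other_nan, mixed_types)
```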
and resolves current inconsistencies w.r.t. containers and NaNs.
How does it resolve these (alleged) inconsistencies?
The current status quo is that containers perform operations such as equality by testing for identity or equality, which they are permitted to do and is documented. Changing them to use your "identical or equal" API will (as far as I can see) change nothing about the semantics, behaviour or even implementation (since the C-level containers like list will surely still call PyObject_RichCompareBool rather than a Python-level wrapper).
So whatever inconsistencies exist, they will still exist.
If I have missed something, please tell me.
-- Steven
On Tue, 2020-02-04 at 13:44 +1100, Steven D'Aprano wrote:
On Mon, Feb 03, 2020 at 05:26:38PM -0800, Sebastian Berg wrote:
<snip>
If you want to argue that "identical or equal" is such a fundamental and important operation in Python code that we ought to offer it ready-made in the operator module, I'm listening. But my gut feeling here is to say "not every one line expression needs to be in the stdlib".
Probably, yes. I am only semi seriously suggesting it. I am happy to get to the conclusion: NumPy is weird and NaNs are a corner case that you just have to understand at some point. Anyway, yes, I hinted at a dunder, I am not sure that is remotely reasonable. And yes, I thought that if this is an important enough of a "concept" it may make sense to bless it with a python side function.
PyObject_RichCompareBool is a different story. "Identical or equal" is not so simple to implement correctly in C code, and it is a common
Of course, it is just as simple in C, if PyObject_RichCompareBool would simply not include the identity check, in which case it is identical to `bool(a == b)` in Python. (Which of course would be annoying to have to type out.)
operation used in lists, tuples, dicts and possibly others, so it makes sense for there to be a C API for it.
and resolves current inconsistencies w.r.t. containers and NaNs.
How does it resolve these (alleged) inconsistencies?
The alleged inconsistencies (which may be just me) are along these lines (plus those with NumPy):

    import math
    print({math.inf - math.inf for i in range(100)})
    print({math.nan for i in range(10)})

Maybe I am alone in perceiving that as an inconsistency. I _was_ saying that if you had a dunder for this, you could enforce that:
* `a is b` implies `congruent(a, b)`
* `a == b` implies `congruent(a, b)`
* `hash(a) == hash(b)` implies `congruent(a, b)`.
So the "inconsistencies" are that of course `hash(NaN)` and `NaN is NaN` fail to imply `NaN == NaN`, while congruent could be enforced to do it "right".
Chris said it much better anyway, and is probably right to disregard the dunder part:
1. Name the current operation (congruent?) to reason about it?
2. Bless it with its own function? (helps maybe documenting it)
3. Consider if it's worth resolving the above inconsistencies by making it an operator with a dunder.
I am happy to stop at 0 :). I am sure similar discussions about the hash of NaN come up once a year.
- Sebastian
The current status quo is that containers perform operations such as equality by testing for identity or equality, which they are permitted to do and is documented. Changing them to use your "identical or equal" API will (as far as I can see) change nothing about the semantics, behaviour or even implementation (since the C-level containers like list will surely still call PyObject_RichCompareBool rather than a Python-level wrapper).
So whatever inconsistencies exist, they will still exist.
If I have missed something, please tell me.
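As a quick illustration of the status quo described here, the following sketch shows a few built-in container operations treating a NaN as equal to itself when it is the same object. This is observed CPython behaviour; the variable names are only for illustration.

```python
import math

nan = math.nan
other_nan = float("nan")  # a distinct NaN object

print(nan == nan)                  # False: plain equality
print(nan in [nan])                # True: identity is checked first
print([nan] == [nan])              # True: same object in both lists
print([nan].count(nan))            # 1
print({"a": nan} == {"a": nan})    # True
print(other_nan in [nan])          # False: different object, and not equal
```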
On Tue, Feb 4, 2020 at 10:12 AM Sebastian Berg
Now, probably this has been rejected a hundred times before, and there are some very good reasons why it is a horrible thought...
But if `PyObject_RichCompareBool(..., Py_EQ)` is such a fundamental operation (and in a sense it seems to me that it is), is there a point in explicitly defining it?
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
That operator would obviously be allowed to use the shortcut.
At that point container `==` and `in` (and equivalence) are defined based on element equivalence. NAs (missing value handling) may be an actual use-case where it is more than a theoretical thought. However, I do not seriously work with NAs myself.
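A rough sketch of what such a function might look like is below. Note that no `operator.equivalent` exists in the standard library. The names `equivalent` and `contains` are hypothetical, and hard-coding the float NaN case merely stands in for the per-type override that the proposal imagines.

```python
import math


def equivalent(a, b):
    """Hypothetical stand-in for the proposed operator.equivalent()."""
    if a is b:
        # The identity shortcut stays allowed.
        return True
    if isinstance(a, float) and isinstance(b, float):
        # Assumed override for float: any two NaNs count as equivalent.
        if math.isnan(a) and math.isnan(b):
            return True
    return bool(a == b)


def contains(container, value):
    """Hypothetical membership test defined via element equivalence."""
    return any(equivalent(item, value) for item in container)


print(equivalent(float("nan"), float("nan")))  # True under this sketch
print(float("nan") in [float("nan")])          # False with today's rules
print(contains([float("nan")], float("nan")))  # True under this sketch
```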
The implication here is that there would be a corresponding dunder method, yes? If it's possible for a type to override it, that would need a dunder. I think that's not necessary; but if there were some useful name that could be given to this "identical or equal" comparison, then I think it'd be useful to (a) put that function into the operator table, and (b) use that name in the description of container operations. Can the word "equivalent" be used for this, perhaps? ChrisA
On Tue, Feb 04, 2020 at 12:33:44PM +1100, Chris Angelico wrote: [Sebastian Berg]
But if `PyObject_RichCompareBool(..., Py_EQ)` is such a fundamental operation (and in a sense it seems to me that it is), is there a point in explicitly defining it?
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
The implication here is that there would be a corresponding dunder method, yes? If it's possible for a type to override it, that would need a dunder.
I think the whole point of this is that it *cannot* be overridden. That's the gist of Raymond's comments about being able to reason about behaviour. Individual values can override the equality test, but they cannot override the identity test, and that's a good thing.
Can we summarise this issue like this?
[quote]
Containers or other compound objects are permitted to use identity testing to shortcut what would otherwise be an equality test (e.g. in list equality tests, and containment tests), even if that would change the behaviour of unusual values, such as floating point NANs which compare unequal to themselves, or objects where `__eq__` has side effects.
Such containers are permitted to assume that their contents all obey the reflexivity of equality (each value is equal to itself) and so avoid calling `__eq__` or `__ne__`.
This is an implementation-specific detail which may differ across different container types and interpreters.
[end quote]
I don't think we need to make any promises about which specific containers use this rule. If you need to know, you can test it for yourself:
    if (t := {'a': float('NAN')}) == t:
        print('dict equality obeys reflexivity')
but otherwise, most people shouldn't need to care.
[...]
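In the same "test it for yourself" spirit, here is a sketch that makes the skipped `__eq__` calls visible by counting them. The `Recorder` class is hypothetical, and the zero count is what CPython shows for these particular operations; the summary above explicitly allows other interpreters or container types to differ.

```python
class Recorder:
    """Hypothetical class whose __eq__ has a visible side effect."""

    calls = 0

    def __eq__(self, other):
        Recorder.calls += 1
        return False  # deliberately non-reflexive, like a NaN

    # Defining __eq__ disables the default hash, so restore it
    # to allow use in sets and as a dict value alongside set tests.
    __hash__ = object.__hash__


x = Recorder()
results = {
    "list containment": x in [x],
    "list equality": [x] == [x],
    "tuple equality": (x,) == (x,),
    "set containment": x in {x},
    "dict equality": {"k": x} == {"k": x},
}
for name, outcome in results.items():
    print(name, outcome)

print("__eq__ calls made by containers:", Recorder.calls)
print("direct comparison:", x == x, "calls now:", Recorder.calls)
```

On CPython every container operation reports True without a single `__eq__` call, while the direct `x == x` does call the method and returns False.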
Can the word "equivalent" be used for this, perhaps?
We don't need and shouldn't have a dunder for this, but the word "equivalent" would be wrong in any case. Two objects may be equivalent but not equal, for example, when it comes to iteration, the string "abc" is equivalent to the list ['a', 'b', 'c']. I don't think there is any accurate term shorter than "identical or equal". -- Steven
On Tue, Feb 4, 2020 at 1:08 PM Steven D'Aprano
On Tue, Feb 04, 2020 at 12:33:44PM +1100, Chris Angelico wrote:
[Sebastian Berg]
But if `PyObject_RichCompareBool(..., Py_EQ)` is such a fundamental operation (and in a sense it seems to me that it is), is there a point in explicitly defining it?
That would mean adding `operator.equivalent(a, b) -> bool` which would allow float to override the result and let `operator.equivalent_value(float("NaN"), float("NaN"))` return True; luckily very few types would actually override the operation.
The implication here is that there would be a corresponding dunder method, yes? If it's possible for a type to override it, that would need a dunder.
I think the whole point of this is that it *cannot* be overridden.
Yes, I agree.
Can we summarise this issue like this?
[quote] Containers or other compound objects are permitted to use identity testing to shortcut what would otherwise be an equality test (e.g. in list equality tests, and containment tests), even if that would change the behaviour of unusual values, such as floating point NANs which compare unequal to themselves, or objects where `__eq__` has side effects.
Such containers are permitted to assume that their contents all obey the reflexivity of equality (each value is equal to itself) and so avoid calling `__eq__` or `__ne__`.
This is an implementation-specific detail which may differ across different container types and interpreters. [end quote]
I'd actually rather see it codified as a specific form of comparison and made a guarantee, upon which other guarantees and invariants can be based. It's not an optimization (although it can have the effect of improving performance), it's a codification of the expectations of containers. As such, this comparison would be defined by language rules as the way that built-in containers behave, and would also be the recommended and normal obvious way to build other container types.
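To illustrate what "the normal obvious way to build other container types" could mean in practice, here is a minimal sketch of a user-defined container that matches elements with `v is value or v == value`, the same test quoted earlier in the thread for `Sequence.__contains__`. The `Bag` class and its methods are hypothetical and exist only for this example.

```python
import math


class Bag:
    """Hypothetical minimal container using 'identical or equal' matching."""

    def __init__(self, items=()):
        self._items = list(items)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __contains__(self, value):
        # Identity first, then equality.
        return any(v is value or v == value for v in self._items)

    def discard(self, value):
        # Remove the first element that is identical or equal to value.
        for i, v in enumerate(self._items):
            if v is value or v == value:
                del self._items[i]
                return


bag = Bag([1, "two", math.nan])

# The invariant between iteration and membership holds even for NaN.
print(all(x in bag for x in bag))  # True

# Everything that iteration yields can also be discarded.
for x in list(bag):
    bag.discard(x)
print(len(bag))  # 0
```

With a plain `v == value` test instead, the NaN would be reported as missing from the bag and could never be discarded.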
Can the word "equivalent" be used for this, perhaps?
We don't need and shouldn't have a dunder for this, but the word "equivalent" would be wrong in any case. Two objects may be equivalent but not equal, for example, when it comes to iteration, the string "abc" is equivalent to the list ['a', 'b', 'c'].
Hmm, true, although that's equivalent only in one specific situation. In mathematics, "congruent" means that two things are functionally equivalent (eg triangles with the same length sides; in programming terms we'd probably say that two such triangles would be "equal" but not identical), even if there's a specific context for such equivalence, such as stating that 12,345 is congruent to 11 modulo 7, because the remainders 12345%7 and 11%7 are both 4. So maybe "congruent" could be used for this concept? ChrisA
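The modular arithmetic in that example is easy to check. The helper name `congruent_mod` below is made up for illustration.

```python
def congruent_mod(a, b, n):
    """Return True if a and b leave the same remainder modulo n."""
    return (a - b) % n == 0


print(12345 % 7, 11 % 7)            # 4 4
print(congruent_mod(12345, 11, 7))  # True: congruent modulo 7
print(12345 == 11)                  # False: congruent, yet not equal
```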
On 2/3/2020 6:21 PM, Chris Angelico wrote:
Hmm, true, although that's equivalent only in one specific situation. In mathematics, "congruent" means that two things are functionally equivalent (eg triangles with the same length sides; in programming terms we'd probably say that two such triangles would be "equal" but not identical), even if there's a specific context for such equivalence, such as stating that 12,345 is congruent to 11 modulo 7, because the remainders 12345%7 and 11%7 are both 4. So maybe "congruent" could be used for this concept?
Congruent means different objects with the same characteristics, whereas identical is far stronger: the same object. But the reason `<=` and `>=` were invented was to avoid saying `a < b or a == b` and `a > b or a == b`. It is just a shorthand. So just invent `is==` as shorthand for `a is b or a == b`.
participants (9)
- Chris Angelico
- Glenn Linderman
- Guido van Rossum
- Larry Hastings
- Raymond Hettinger
- Sebastian Berg
- Serhiy Storchaka
- Steven D'Aprano
- Tim Peters