Binary arithmetic does not always call subclasses first
Note: If the right operand's type is a subclass of the left operand’s type and that subclass provides the reflected method for the operation,
I recently filed this as a bug, and was asked to repost to python-dev or python-ideas for greater visibility: http://bugs.python.org/issue30140 Without further ado, here is my original report: --------------------------- We are writing a system for overloading NumPy operations (see PR [1] and design doc [2]) that is designed to copy and extend Python's system for overloading binary operators. The reference documentation on binary arithmetic [3] states: this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations. However, this isn't actually done if the right operand merely inherits from the left operand's type. In practice, CPython requires that the right operand defines a different method before it defers to it. Note that the behavior is different for comparisons, which defer to subclasses regardless of whether they implement a new method [4]. I think this behavior is a mistake and should be corrected. It is just as useful to write generic binary arithmetic methods that are well defined on subclasses as generic comparison operations. In fact, this is exactly the design pattern we propose for objects implementing special operators like NumPy arrays (see NDArrayOperatorsMixin in [1] and [2]). Here is a simple example, of a well-behaved type that implements addition by wrapping its value and that returns NotImplemented when the other operand has the wrong type: class A: def __init__(self, value): self.value = value def __add__(self, other): if not isinstance(other, A): return NotImplemented return type(self)(self.value + other.value) __radd__ = __add__ def __repr__(self): return f'{type(self).__name__}({self.value!r})' class B(A): pass class C(A): def __add__(self, other): if not isinstance(other, A): return NotImplemented return type(self)(self.value + other.value) __radd__ = __add__ A does not defer to subclass B:
A(1) + B(1) A(2)
But it does defer to subclass C, which defines new methods (literally copied/pasted) for __add__/__radd__:
A(1) + C(1) C(2)
With the current behavior, special operator implementations need to explicitly account for the possibility that they are being called from a subclass by returning NotImplemented. My guess is that this is rarely done, which means that most of these methods are broken when used with subclasses, or subclasses needlessly reimplement these methods. Can we fix this logic for Python 3.7? [1] https://github.com/numpy/numpy/pull/8247 [2] https://github.com/charris/numpy/blob/406bbc652424fff332f49b0d2f2e5aedd8191d... [3] https://docs.python.org/3/reference/datamodel.html#object.__ror__ [4] http://bugs.python.org/issue22052 ----------------------------- Mark Dickinson's response: This is probably worth bringing up on the python-dev or python-ideas mailing lists for greater visibility. I can't think of any plausible non-historical reason why it makes sense for comparisons to behave one way and arithmetic operators another. Altering this might be a PEP-level change, though. The "Coercion rules" section[1] of the Python 2.7 docs is a bit more explicit about the intent: """ Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base’s __rop__() method, the right operand’s __rop__() method is tried before the left operand’s __op__() method. """ so the check for an override was clearly intentional, rather than an implementation convenience or accident. (It's also clearly intentional in the source and comments.) The 3.x docs don't have the "and overrides" language; I haven't figured out why and when that language changed. [1] https://docs.python.org/release/2.7.6/reference/datamodel.html#coercion-rule...
On Sun, Apr 23, 2017 at 06:23:12PM -0700, Stephan Hoyer wrote:
I recently filed this as a bug, and was asked to repost to python-dev or python-ideas for greater visibility: http://bugs.python.org/issue30140
Without further ado, here is my original report: [...]
The reference documentation on binary arithmetic [3] states:
Note: If the right operand's type is a subclass of the left operand’s type and that subclass provides the reflected method for the operation, this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations.
However, this isn't actually done if the right operand merely inherits from the left operand's type. In practice, CPython requires that the right operand defines a different method before it defers to it. Note that the behavior is different for comparisons, which defer to subclasses regardless of whether they implement a new method [4].
I think that's a bug. At the very least, comparisons and operators should behave the same. I think that the comparison behaviour is correct: the subclass reflected method should be called, even if the method isn't explicitly over-ridden.
I think this behavior is a mistake and should be corrected. It is just as useful to write generic binary arithmetic methods that are well defined on subclasses as generic comparison operations.
Agreed. [...]
Here is a simple example, of a well-behaved type that implements addition by wrapping its value and that returns NotImplemented when the other operand has the wrong type:
I often write classes like your example, and it is humbling to realise that they are buggy! I must admit I misread the docs and didn't notice the "and overrides" language, and clearly I didn't test my classes sufficiently or else I would have discovered this myself :-(
A does not defer to subclass B:
A(1) + B(1) A(2)
But it does defer to subclass C, which defines new methods (literally copied/pasted) for __add__/__radd__:
A(1) + C(1) C(2)
To me, that's the deciding point. We have two ways to go, and one of them encourages the copy-and-paste anti-pattern, the other doesn't. I think that's a Bad Thing and we should treat the comparison behaviour as correct, and the other operators are buggy. An explicit over-ride shouldn't be necessary.
Mark Dickinson's response:
The "Coercion rules" section[1] of the Python 2.7 docs is a bit more explicit about the intent:
""" Exception to the previous item: if the left operand is an instance of a built-in type or a new-style class, and the right operand is an instance of a proper subclass of that type or class and overrides the base’s __rop__() method, the right operand’s __rop__() method is tried before the left operand’s __op__() method. """
so the check for an override was clearly intentional, rather than an implementation convenience or accident. (It's also clearly intentional in the source and comments.)
The next paragraph tells us: This is done so that a subclass can completely override binary operators. Otherwise, the left operand’s __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable. but as far as I can tell, the comparison behaviour equally accomplishes that, without an explicit over-ride.
The 3.x docs don't have the "and overrides" language; I haven't figured out why and when that language changed.
[1] https://docs.python.org/release/2.7.6/reference/datamodel.html#coercion-rule...
Stephan Hoyer wrote:
In practice, CPython requires that the right operand defines a different method before it defers to it.
I'm not sure exactly what the rationale for this behaviour is, but it's probably something along the lines that the left method should already know how to deal with that combination of types, and right methods are only supposed to be called as a fallback if the left method can't handle the operands, so calling it in that situation would be wrong. Following that logic, the wrapper's __add__ method in your example needs to allow for the subclassing case, e.g. def __add__(self, other): t1 = type(self) t2 = type(other) t = t2 if issubclass(t2, t1) else t1 return t(self.value + other.value)
the behavior is different for comparisons, which defer to subclasses regardless of whether they implement a new method
Comparisons are a bit different, because they don't have separate left and right methods, although it's hard to see exactly how that affects the logic.
I think this behavior is a mistake and should be corrected.
Possibly, but it's not clear how much breakage might result from changing it now. Although...
The 3.x docs don't have the "and overrides" language;
so arguably we would be changing the implementation to match the docs. -- Greg
Stephan Hoyer wrote:
In practice, CPython requires that the right operand defines a different method before it defers to it.
I'm not sure exactly what the rationale for this behaviour is, but it's probably something along the lines that the left method should already know how to deal with that combination of types, and right methods are only supposed to be called as a fallback if the left method can't handle the operands, so calling it in that situation would be wrong.
Yes, this could makes sense. But in that case, why check for explicitly overridden methods on subclasses at all? I can rationalize either not
On Sun, Apr 23, 2017 at 10:57 PM, Greg Ewing
The 3.x docs don't have the "and overrides" language;
so arguably we would be changing the implementation to match the docs.
Based on the change in the documentation between 2.x and 3.x, I wonder if this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here. Nonetheless, at this point the behavior has been around for quite some time. Almost assuredly, there is *someone* relying on this feature/bug, though probably unintentionally. So I would also love to hear from anyone who knows of code that this change would actually break (as opposed to fix and/or allow for removing redundant methods). More broadly: is this change significant enough that it needs a PEP?
On Mon, Apr 24, 2017 at 05:57:17PM +1200, Greg Ewing wrote:
Stephan Hoyer wrote:
In practice, CPython requires that the right operand defines a different method before it defers to it.
I'm not sure exactly what the rationale for this behaviour is, but it's probably something along the lines that the left method should already know how to deal with that combination of types, and right methods are only supposed to be called as a fallback if the left method can't handle the operands, so calling it in that situation would be wrong.
I've never seen that rationale before, and I don't think I would agree with it. And it goes against the rationale in the docs: [...] the right operand’s __rop__() method is tried before the left operand’s __op__() method. This is done so that a subclass can completely override binary operators. Otherwise, the left operand’s __op__() method would always accept the right operand: when an instance of a given class is expected, an instance of a subclass of that class is always acceptable. I think your rationale goes against the intention as documented. There's no expectation that __rop__ methods are only to be called when the __op__ method can't handle the operands. In general, which operand "wins" should depend on the classes, not on whether they happen to be on the left or right of the operator. (Except in the case where we cannot decide between the operands, in which case we break ties by preferring the __op__.) The reason is that subclasses are usually intended to be more specialised than their parent, and so they ought to be given priority in mixed operations. Given classes X, Y(X), with instances x and y, we should expect that the more specialised class (namely Y) gets called first whether we write: x ⊕ y or y ⊕ x for any operator ⊕. As documented in the 3 docs, we get that for free: the interpreter correctly calls the __op__ or __rop__ method, as needed, and the class author doesn't have to think about it. That's how it's documented, but not how it's implemented. The alternative is that every class has to include boilerplate testing for subclasses, as you say:
Following that logic, the wrapper's __add__ method in your example needs to allow for the subclassing case, e.g.
def __add__(self, other): t1 = type(self) t2 = type(other) t = t2 if issubclass(t2, t1) else t1 return t(self.value + other.value)
but that's bad. That makes each and every class (that might ever be subclassed) responsible for checking for subclasses, instead of putting the check in one place (whichever part of the interpreter handles calling __op__/__rop__ methods). Remember that a specialised subclass might not overload the __op__ and __rop__ methods themselves. It might overload a data attribute, or another method that __op__ / __rop__ call. class A: def __add__(self, other): self.log() ... __radd__ = __add__ class B(A): def log(self): ... A() + B() As the more specialised instance (a subclass of A), the right hand operand should get the priority.
the behavior is different for comparisons, which defer to subclasses regardless of whether they implement a new method
Comparisons are a bit different, because they don't have separate left and right methods, although it's hard to see exactly how that affects the logic.
It doesn't affect the logic, and comparisons implement exactly the documented (in 3) behaviour. The only difference is that the reversed methods aren't spelled __rop__: __eq__ and __ne__ are their own reflection; __lt__ and __gt__ __le__ and __ge__ For example: py> class A(object): ... def __lt__(self, other): ... print("lt", self) ... return True ... def __gt__(self, other): ... print("gt", self) ... return False ... py> class B(A): ... pass ... py> A() < B() gt <__main__.B object at 0xb7a9e8ec> False The more specialised class (B) has its method called, even though it isn't over-ridden. -- Steve
On 4/24/2017 12:14 PM, Stephan Hoyer wrote:
Based on the change in the documentation between 2.x and 3.x, I wonder if this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here.
We should ask the intention of the person who made the change, which is in the repository. If you email me the doc (the last part of the url) and location within, I will look it up. -- Terry Jan Reedy
GitHub shows that that note hasn't changed in 10 years:
https://github.com/python/cpython/blame/master/Doc/reference/datamodel.rst#L...
On Mon, Apr 24, 2017 at 3:15 PM, Terry Reedy
On 4/24/2017 12:14 PM, Stephan Hoyer wrote:
Based on the change in the documentation between 2.x and 3.x, I wonder if
this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here.
We should ask the intention of the person who made the change, which is in the repository. If you email me the doc (the last part of the url) and location within, I will look it up.
-- Terry Jan Reedy
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
+Georg Brandl, in case he remembers where "Move the 3k reST doc tree in
place." moved things from:
https://github.com/python/cpython/commit/116aa62bf54a39697e25f21d6cf6799f7fa...
On Mon, Apr 24, 2017 at 4:29 PM, Nick Timkovich
GitHub shows that that note hasn't changed in 10 years: https://github.com/python/cpython/blame/master/ Doc/reference/datamodel.rst#L2210
On Mon, Apr 24, 2017 at 3:15 PM, Terry Reedy
wrote: On 4/24/2017 12:14 PM, Stephan Hoyer wrote:
Based on the change in the documentation between 2.x and 3.x, I wonder if
this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here.
We should ask the intention of the person who made the change, which is in the repository. If you email me the doc (the last part of the url) and location within, I will look it up.
-- Terry Jan Reedy
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
If this is portant I should probably ponder it.
On Apr 24, 2017 4:47 PM, "Stephan Hoyer"
+Georg Brandl, in case he remembers where "Move the 3k reST doc tree in place." moved things from: https://github.com/python/cpython/commit/116aa62bf54a39697e25f21d6cf679 9f7faa1349
On Mon, Apr 24, 2017 at 4:29 PM, Nick Timkovich
wrote: GitHub shows that that note hasn't changed in 10 years: https://github.com/python/cpython/blame/master/Doc/ reference/datamodel.rst#L2210
On Mon, Apr 24, 2017 at 3:15 PM, Terry Reedy
wrote: On 4/24/2017 12:14 PM, Stephan Hoyer wrote:
Based on the change in the documentation between 2.x and 3.x, I wonder
if this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here.
We should ask the intention of the person who made the change, which is in the repository. If you email me the doc (the last part of the url) and location within, I will look it up.
-- Terry Jan Reedy
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Now that I am with a real keyboard and screen and have tried to understand
the OP, I can actually write up my thoughts on the matter.
There are two aspects to the behavior. Giving preference to the class of
the right operand if it is a subclass of the left operand's class is
reasonable and explained in the docs.
Only doing this if the right operand actually overrides __rop__ was perhaps
meant as an optimization, and I expect that when I introduced that rule I
hadn't thought of the kind of classes that use type(self) or self.__class__
to return an instance of the runtime class.
However there's also another thing to consider. Consider a variant of the
OP example, where B doesn't override __add__ or __radd__, but now add a
different __init__ signature, one that requires completely different
arguments. This is entirely legal and done often enough (though perhaps not
in numpy circles?) -- constructors are not subject to the Liskov rule. So
the call to type(self)(<value>) may crash or have an undesirable result.
But given that A does call type(self)(<value>), all its subclasses have to
either have a compatible __init__ or override both __add__ and __radd__.
In the end I agree with the OP that we should fix this. I don't see a
reason to require a PEP or require updating whatever PEP described this
behavior originally -- PEPs generally describe what should be done to a
specific version of Python, they don't prevent future alterations, and they
essentially represent the historical record, not current documentation.
I'm a little worried about breaking existing code, but only a little bit,
and this is clearly a gray area, so I think it's okay to change in 3.7
without deprecations. (But I've been overruled on such matters before, so
if you disagree, speak up now and show us your code!)
--Guido
On Mon, Apr 24, 2017 at 4:54 PM, Guido van Rossum
If this is portant I should probably ponder it.
On Apr 24, 2017 4:47 PM, "Stephan Hoyer"
wrote: +Georg Brandl, in case he remembers where "Move the 3k reST doc tree in place." moved things from: https://github.com/python/cpython/commit/116aa62bf54a39697e2 5f21d6cf6799f7faa1349
On Mon, Apr 24, 2017 at 4:29 PM, Nick Timkovich
wrote: GitHub shows that that note hasn't changed in 10 years: https://github.com/python/cpython/blame/master/Doc/re ference/datamodel.rst#L2210
On Mon, Apr 24, 2017 at 3:15 PM, Terry Reedy
wrote: On 4/24/2017 12:14 PM, Stephan Hoyer wrote:
Based on the change in the documentation between 2.x and 3.x, I wonder
if this is something that someone intended to clean up as part of Python 3000 but never got around to. I would love to hear from anyone familiar with the historical context here.
We should ask the intention of the person who made the change, which is in the repository. If you email me the doc (the last part of the url) and location within, I will look it up.
-- Terry Jan Reedy
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
On 26 April 2017 at 02:56, Guido van Rossum
In the end I agree with the OP that we should fix this. I don't see a reason to require a PEP or require updating whatever PEP described this behavior originally -- PEPs generally describe what should be done to a specific version of Python, they don't prevent future alterations, and they essentially represent the historical record, not current documentation.
I'm a little worried about breaking existing code, but only a little bit, and this is clearly a gray area, so I think it's okay to change in 3.7 without deprecations. (But I've been overruled on such matters before, so if you disagree, speak up now and show us your code!)
This is really obscure behaviour to be relying on, so a porting note + the 3.7 pre-release testing cycles seems like sufficient notice to me. It's potentially also worth checking how PyPy handles these cases - for the only other similar case I'm aware of (the quirks with the relative priority of the nb_* and sq_* slots at the C layer), enough projects relied on the CPython behaviour for them to decide to replicate it, so if they *haven't* replicated the quirk described in the OP, it's a solid data point suggesting there aren't a lot of major projects relying on it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (7)
-
Greg Ewing
-
Guido van Rossum
-
Nick Coghlan
-
Nick Timkovich
-
Stephan Hoyer
-
Steven D'Aprano
-
Terry Reedy