Ah, that's much clearer than all the English words written so far here. :-) Let me go over this function (binary_op1()) for subtraction, the example from your blog.
One piece of magic is that there are no separate `__sub__` and `__rsub__` implementations at this level -- the `tp_as_number` struct just has a slot named `nb_subtract` that takes two objects and either subtracts them or returns NotImplemented.
This means that (**at this level**) there really is no point in calling `__rsub__` if the lhs and rhs have the same type, because it would literally just call the same `nb_subtract` function with the same arguments a second time.
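The same-type case is easy to observe from Python. Here is a minimal sketch, assuming a made-up class `Money` (not from the blog post) whose methods only record that they were called and then return NotImplemented:

```python
calls = []

class Money:
    """Hypothetical class, used only for this illustration."""

    def __sub__(self, other):
        calls.append("__sub__")
        return NotImplemented

    def __rsub__(self, other):
        calls.append("__rsub__")
        return NotImplemented

try:
    Money() - Money()
except TypeError:
    # Both operands have the same type, so the reflected method is never tried.
    calls.append("TypeError")

print(calls)  # ['__sub__', 'TypeError']
```

Only `__sub__` shows up in the list; `__rsub__` is never attempted before the TypeError.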
And if the types are different but the functions in `nb_subtract` are still the same, again we'd be calling the same function with the same arguments twice.
That's some macro!
Now, interestingly, this macro may call *both* `left.__sub__(right)` and `right.__rsub__(left)`. That is surprising, since there's also logic to call left's nb_subtract and right's nb_subtract in binary_op1(). What's up with that? Could we come up with an example where `a-b` makes more than two calls? For that to happen we'd have to trick binary_op1() into calling both. But I think that's impossible, because all Python classes have the same identical function in nb_subtract (the function is synthesized using SLOT1BIN -> SLOT1BINFULL), and in that case binary_op1() skips the second call (the two lines that Brett highlighted!). So we're good here.
But maybe here we have another explanation for why binary_op1() is careful to skip the second call. (The slot function duplicates this logic so it will only call `__sub__` in this case.)
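A quick way to convince yourself that `a-b` tops out at two calls for plain Python classes is to log every dunder call. This is just a sketch; the class names `Left` and `Right` and the `log` list are made up for the illustration:

```python
log = []

class Left:
    """Hypothetical left operand; every method declines the operation."""

    def __sub__(self, other):
        log.append("Left.__sub__")
        return NotImplemented

    def __rsub__(self, other):
        log.append("Left.__rsub__")
        return NotImplemented

class Right:
    """Hypothetical right operand, unrelated to Left."""

    def __sub__(self, other):
        log.append("Right.__sub__")
        return NotImplemented

    def __rsub__(self, other):
        log.append("Right.__rsub__")
        return NotImplemented

try:
    Left() - Right()
except TypeError:
    pass  # neither side handled the operation

print(log)  # ['Left.__sub__', 'Right.__rsub__']
```

Exactly two calls are made, one per operand, even though both the slot function and binary_op1() contain "try the other side" logic.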
Since rich comparison doesn't have this safeguard, can we trick *that* into making more than two calls? No, because the "reverse" logic (`self.__lt__(other)` -> `other.__gt__(self)` etc.) is only implemented once, in do_richcompare() in object.c. The slot function in typeobject.c (slot_tp_richcompare()) is totally tame.
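The missing safeguard is visible from Python too. In this sketch (the class `Version` and the `seen` list are made up for the illustration), both operands have the same type, yet the reflected method is still tried, and the total stays at two calls:

```python
seen = []

class Version:
    """Hypothetical class whose comparisons always decline."""

    def __lt__(self, other):
        seen.append("__lt__")
        return NotImplemented

    def __gt__(self, other):
        seen.append("__gt__")
        return NotImplemented

try:
    Version() < Version()
except TypeError:
    # Unlike subtraction, the reflected __gt__ was tried despite the same type.
    seen.append("TypeError")

print(seen)  # ['__lt__', '__gt__', 'TypeError']
```

Compare this with subtraction, where the same-type case stops after `__sub__`.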
So the difference goes back to the design at the C level -- the number slots don't have separate `__sub__` and `__rsub__` implementations (the C function in nb_subtract has no direct way of knowing if it was called on behalf of its first or second argument), and the complications derive from that. The rich comparison slot has a clear `op` flag that always tells it which operation was requested, and the implementation is much saner because of it.
So yes, in a sense the difference is because rich comparison is much newer than binary operators in Python -- binary operators are still constrained by the original design, which predates operator overloading in user code (i.e. `__sub__` and `__rsub__`). But it was not a matter of forgetting anything -- it was a matter of better design.
(Brett, maybe this warrants an update to your blog post?)