comparison of operator.itemgetter objects

Currently, the __eq__() method is not defined in class operator.itemgetter, hence non-identical itemgetter objects compare as non-equal.

I wanted to propose defining an __eq__() method that would return the result of comparing for equality the lists of arguments submitted at initialization. This would make operator.itemgetter('name') compare as equal to operator.itemgetter('name').

The motivation for this is that a sorted data structure (such as blist.sortedset) might want to verify whether the two arguments (say, lhs and rhs) of a binary operation (such as union) have the same sort key (a callable object passed to the constructor of the sorted data structure). Such a verification is useful because the desirable behavior of such binary operations is to use the common sort key if lhs and rhs have the same sort key, and to raise an exception (or at least use a default value of the sort key) otherwise.

I think that comparing sort keys for equality works well in many useful cases:

(a) Named function. These compare as equal only if they are identical. If lhs and rhs were initialized with distinct named functions, I would argue that the programmer did not intend them to be compatible for the purpose of binary operations, even if they happen to be identical in behavior (e.g., if both functions return the argument passed to them). In a well-designed program, there is no need to duplicate the named function definition if the two are expected to always have the same behavior. Therefore, the two distinct functions are intended to differ in behavior at least in some situations, and the sorted data structure objects that use them as keys should be considered incompatible.

(b) User-defined callable class. The author of such a class should define __eq__() so that callable objects that behave identically compare as equal, assuming it's not prohibitively expensive.

Unfortunately, in two cases comparing keys for equality does not work well.

(c) itemgetter. Suppose a programmer passed `itemgetter('name')` as the sort key argument to the sorted data structure's constructor. Since each such call creates a distinct itemgetter object, the resulting data structures would seem incompatible for the purposes of binary operations. This is likely to be confusing and undesirable.

(d) lambda functions. Similarly, suppose a programmer passed `lambda x : -x` as the sort key argument to the sorted data structure's constructor. Since two lambda functions are not identical, they would compare as unequal.

It seems to be very easy to address the undesirable behavior described in (c): add an __eq__() method to operator.itemgetter, which would compare the list of arguments received at initialization. This would only break code that relies on the undocumented fact that distinct itemgetter instances compare as non-equal.

The alternative is for each sorted data structure to handle this comparison on its own. This is repetitive and error-prone. Furthermore, it is expensive for an outsider to find out what arguments were given to an itemgetter at initialization.

It is far harder to address the undesirable behavior described in (d). If it can be addressed at all, it would have to be done in the sorted data structure implementation, since I don't think anyone would want lambda function comparison behavior to change. So for the purposes of this discussion, I ignore case (d).

Is this a reasonable idea? Is it useful enough to be considered? Are there any downsides I didn't think of? Are there any other callables created by Python's builtin or standard library functions where __eq__ might be useful to define?

Thanks,
Max
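The key-compatibility check described in this message could look roughly like the sketch below. The SortedSet class, its union() method, and the by_negation function are hypothetical stand-ins written for illustration; they are not blist's API, and raising ValueError is only one of the two behaviors the message mentions.

```python
class SortedSet:
    """Hypothetical, minimal keyed sorted set (illustration only)."""

    def __init__(self, iterable=(), key=None):
        self.key = key
        self._items = sorted(set(iterable), key=key)

    def union(self, other):
        # Reuse the common sort key only when both operands agree on it.
        if self.key != other.key:
            raise ValueError("operands have different sort keys")
        return SortedSet(self._items + other._items, key=self.key)

    def __iter__(self):
        return iter(self._items)


def by_negation(x):
    return -x


# Case (a): the same named function is shared, so the keys compare equal.
a = SortedSet([1, 3], key=by_negation)
b = SortedSet([2, 3], key=by_negation)
merged = list(a.union(b))
print(merged)  # [3, 2, 1]

# Case (d): two textually identical lambdas are distinct objects.
c = SortedSet([1], key=lambda x: -x)
d = SortedSet([2], key=lambda x: -x)
try:
    c.union(d)
    rejected = False
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
```

With a key type that defines __eq__() by value, the first branch would also cover keys built by separate but equivalent constructor calls, which is the point of the proposal.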

Max Moroz wrote:
In general, I think that having equality tests fall back on identity test is so rarely what you actually want that sometimes I wonder why we bother.

In this case I was going to say just write your own subclass, but:

py> from operator import itemgetter
py> class MyItemgetter(itemgetter):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type 'operator.itemgetter' is not an acceptable base type

-- Steven
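Where subclassing is refused as in the traceback above, composition is one way to get the proposed behavior without touching the standard library. The sketch below is only an illustration: EqItemGetter is a hypothetical name, and it assumes the item arguments are hashable so that equal getters can also hash equally.

```python
from operator import itemgetter


class EqItemGetter:
    """Hypothetical wrapper: calls like itemgetter, compares by its arguments."""

    __slots__ = ("_items", "_getter")

    def __init__(self, item, *items):
        self._items = (item,) + items
        self._getter = itemgetter(*self._items)

    def __call__(self, obj):
        # Delegate the actual lookup to the real itemgetter.
        return self._getter(obj)

    def __eq__(self, other):
        if not isinstance(other, EqItemGetter):
            return NotImplemented
        return self._items == other._items

    def __hash__(self):
        # Assumption: the item arguments are hashable.
        return hash(self._items)

    def __repr__(self):
        args = ", ".join(repr(item) for item in self._items)
        return f"EqItemGetter({args})"


by_name = EqItemGetter("name")
print(by_name({"name": "Max"}))                    # Max
print(by_name == EqItemGetter("name"))             # True
print(by_name == EqItemGetter("name", "age"))      # False
```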

On 2 April 2012 10:38, Steven D'Aprano <steve@pearwood.info> wrote:
TypeError: type 'operator.itemgetter' is not an acceptable base type
Quite apart from the question of whether you might want to subclass operator.itemgetter, that's a really rubbish error message. Why is it not acceptable? Searching the source, it appears that types can say they can't be subclassed by leaving the Py_TPFLAGS_BASETYPE flag unset, so maybe a better error would be "the designer of type '%s' has disallowed subclassing". Still doesn't say why they did, but at least it gives a hint as to what's going on...

Paul.

On Mon, Apr 2, 2012 at 5:38 AM, Steven D'Aprano <steve@pearwood.info> wrote:
In general, I think that having equality tests fall back on identity test is so rarely what you actually want that sometimes I wonder why we bother.
Because identity ==> equality. (There are exceptions, like NaN, but that behavior is buggy.) And for objects without a comparison function, the most commonly made comparison (e.g., as a dict key) is one where identity is desired. -jJ
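A small illustration of the points above, using only builtins (the Token class is a made-up placeholder): NaN is the exception where identity does not imply equality, containers check identity before equality, and an object without its own comparison works as a dict key by identity.

```python
nan = float("nan")

print(nan == nan)      # False: NaN is not equal to itself
print(nan in [nan])    # True: containment tests identity before equality
print([nan] == [nan])  # True for the same reason


class Token:
    """Hypothetical class with no __eq__ or __hash__ of its own."""


t = Token()
registry = {t: "first"}
print(registry[t])          # first: the same object is found again
print(Token() in registry)  # False: a different instance is a different key
```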

On Mon, Apr 2, 2012 at 4:30 AM, Max Moroz <maxmoroz@gmail.com> wrote:
I think that comparing sort keys for equality works well in many useful cases:
It may be that they were created as inner functions, and the reason to duplicate was either to avoid creating the function at all unless it was needed, or to keep the smaller function's logic near where it was needed. In a sense, you are already recognizing this by asking that different but equivalent functions produced by the itemgetter factory compare equal.
operator.attrgetter seems similar.
Agreed. I think this may just be a case of someone assuming YAGNI, but if you do need it, and submit a patch, it should be OK.
Why not? If you really care about identity for a lambda function, then you should be using "is", and if you don't, then equivalent behavior should be enough. I would support a change to function.__eq__ (which would fall through to lambda) such that they were equal if they had the same bytecode, signature, and execution context (defaults, globals, etc). I would also support making functions and methods orderable, for more easily replicated reprs. I'm not volunteering to write the patch, at least today.
Is this a reasonable idea? Is it useful enough to be considered? Are there any downsides I didn't think of?
Caring that two functions are identical is probably even less common than sticking a function in a dict, and the "nope, these are not equal" case would get a bit slower. -jJ
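The comparison suggested above (same bytecode, signature, and execution context) can be approximated from pure Python as a helper function. This is a rough sketch, not a proposed implementation of function.__eq__: functions_equivalent and _closure_values are hypothetical names, only a few code-object attributes are compared, and constants are compared with ordinary ==, so for example 1 and 1.0 would be treated as the same constant.

```python
def _closure_values(func):
    # Values captured from enclosing scopes, or an empty tuple.
    cells = func.__closure__ or ()
    return tuple(cell.cell_contents for cell in cells)


def functions_equivalent(f, g):
    """Return True if f and g look interchangeable (approximation)."""
    if f is g:
        return True
    cf, cg = f.__code__, g.__code__
    same_code = (
        cf.co_code == cg.co_code
        and cf.co_consts == cg.co_consts
        and cf.co_names == cg.co_names
        and cf.co_varnames == cg.co_varnames
        and cf.co_argcount == cg.co_argcount
        and cf.co_kwonlyargcount == cg.co_kwonlyargcount
        and cf.co_flags == cg.co_flags
    )
    same_context = (
        f.__defaults__ == g.__defaults__
        and f.__kwdefaults__ == g.__kwdefaults__
        and f.__globals__ is g.__globals__
        and _closure_values(f) == _closure_values(g)
    )
    return same_code and same_context


neg_a = lambda x: -x
neg_b = lambda x: -x
inc = lambda x: x + 1


def make_adder(n):
    return lambda x: x + n


print(neg_a == neg_b)                                      # False: identity
print(functions_equivalent(neg_a, neg_b))                  # True
print(functions_equivalent(neg_a, inc))                    # False
print(functions_equivalent(make_adder(1), make_adder(2)))  # False
```

The extra attribute comparisons on the unequal path give a sense of the slowdown mentioned above for the "nope, these are not equal" case.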

On 4/5/2012 12:33 PM, Jim Jewett wrote:
Why not? If you really care about identity for a lambda function,
A 'lambda function' is simply a function whose .__name__ attribute is "<lambda>". There is no difference otherwise. Hence cases '(a) function' and '(d) lambda function' (in snipped portion) are the same class and
I would support a change to function.__eq__ (which would fall through to lambda)
'falling through' cannot happen as there is nothing other to fall through to. -- Terry Jan Reedy
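The point that a lambda expression produces an ordinary function object can be checked directly; the names named and anon below are just illustrative.

```python
import types


def named(x):
    return -x


anon = lambda x: -x

# Both objects are instances of the one function type.
print(type(named) is type(anon) is types.FunctionType)  # True
print(named.__name__)  # named
print(anon.__name__)   # <lambda>

# Equality is identity-based for both kinds of definition.
print(anon == anon)               # True
print(anon == (lambda x: -x))     # False
```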

participants (5): Jim Jewett, Max Moroz, Paul Moore, Steven D'Aprano, Terry Reedy