[Python-Dev] cmp(x,x)

Tue May 25 13:07:31 EDT 2004

Raymond writes:
> The code in question is in Py_RichCompareBool() which *always* just
> returns True or False.  That routine is called by list.__contains__()
> and many other functions that expect a yes or no answer.
>
> The regular rich comparison function, Py_RichCompare() is the same as it
> always was and can still return arrays of bools, complex numbers, or
> anything at all.

All right, I took a look at exactly where PyObject_RichCompareBool is
called. Here is a COMPLETE list of all uses in 2.3.3 (that's what I
had lying around):

  * In listobject.c, to test containment in a list (ie: "x in [x]").

  * In tupleobject.c to perform containment tests on tuples
    (ie: "x in (a,b,c)").

  * In arraymodule.c, in array_contains to test containment in an array.

  * In listobject.c, in listindex(), listremove(), and listcount() to find
    the first occurrence, delete the first occurrence, or count the number
    of occurrences of an object in the list.

  * In arraymodule.c, in array_index(), array_remove(), and array_count() to
    find the first occurrence, delete the first occurrence, or count the
    number of occurrences of an object in the array.

  * In abstract.c, in _PySequence_IterSearch() to test containment of, find
    the first occurrence of, or count the occurrences of some object.

  * In listobject.c, in list_richcompare(), to skip past identical list
    elements when comparing lists.

  * In tupleobject.c, in tuplerichcompare() to skip past identical tuple
    elements when comparing tuples.

  * In arraymodule.c, in array_richcompare, to skip past identical array
    elements when comparing arrays.

  * In listobject.c, used in sorting if the user does not provide a
    user-defined comparison function.

  * In bltinmodule.c, in min_max(), to compare two objects within the min
    and max functions.

  * In iterobject.c, to test whether an object is the sentinel object.

  * In dictobject.c, in lookdict(), to test whether a key in the dict
    matches the key being looked up.

  * In dictobject.c, in characterize() ... I'm not quite sure what this
    is doing.

  * In dictobject.c when comparing two dicts for equality.

  * In typeobject.c, in method_is_overloaded, to test whether the methods
    defined in a class and some subclass are actually the same object.

  * In ceval.c, for matching keyword arguments to code object varnames
    in PyEval_EvalCodeEx.

  * In ceval.c in _PyEval_SliceIndex() to compare a bound to the number
    0L.

  * And finally, it is part of the published API so it could appear
    anywhere in extension modules... but I would guess that
    people use PyObject_RichCompare normally and only call
    PyObject_RichCompareBool if they really want a boolean.

What I learned by writing this list is that Raymond is right...
being able to assume that "x is x" implies "x == x" is very useful
for implementors. And if that assumption is made in
PyObject_RichCompareBool() and NOT in PyObject_RichCompare(), then
it hardly ever needs to conflict with the desire for user-defined
comparison functions to return peculiar things (like Numeric arrays)
or to produce values that are artificially not equal to themselves
(like NANs or a poorly-designed user-implemented Max object).
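
To make the distinction concrete, here is a small sketch in Python. It
assumes a recent CPython 3, where containment tests check identity before
equality; the 2.3.3 sources surveyed above may not behave the same way,
and the variable names are only illustrative.

```python
# Assumes a recent CPython 3; "nan" is just an illustrative name.
nan = float("nan")

# The == operator reports whatever the float's own comparison says.
direct = (nan == nan)

# Containment wants a yes/no answer and treats "x is x" as a match.
contained = (nan in [nan])

# A distinct NaN object is neither identical nor equal, so no match.
other_contained = (float("nan") in [nan])

print(direct, contained, other_contained)
```

The same object gives different answers depending on whether the caller
wanted the raw comparison result or a plain boolean.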

Okay, I know this is too much analysis, but I've started now, so
I'm going to go through with it and send off this email. Here's my
thought: Raymond has convinced me that we want to make this assumption
in the interpreter. I also want simple rules for when a user-defined
comparison function (__cmp__, __eq__, or __ne__) is invoked and when it
isn't.
Maybe we can achieve both.

Here's that list above, grouped by function:

  1. Test Containment in a sequence, also index(), count(), and remove() on
     sequences.

  2. Comparing sequences or dicts to each other.

  3. Sorting lists when no user-defined comparison function is given.

  4. In min() and max().

  5. Checking for the sentinel in iteration.

  6. Looking up dict keys.

  7. characterize() in dictobject (what's this for?)

  8. Checking if a method is the same to tell if it's overloaded.

  9. Matching keyword arguments to code object varnames in EvalCode

 10. Compare a slice bound to 0L.

 11. Extension modules that call PyObject_RichCompareBool instead of
     PyObject_RichCompare.

This is the list of all places where RichCompareBool is called instead
of RichCompare, and thus of all places where a user-defined comparison
function might (surprisingly) not be called. Some are not relevant
(eg: #9 compares only strings, a built-in type). For some, I can
concoct artificial examples where someone would care (eg: #5: creating
a sentinel for iter() that tries to be so VERY clever that it allows
the sentinel object itself to occur in the sequence sometimes
without stopping the iteration; or #3: trying to understand the
behavior of timsort by creating objects that log comparisons rather
than by reading the code). But these feel artificial to me. The big
ones seem to be #1 and #2.
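
The "clever sentinel" case from #5 can be sketched as follows. The class
name CleverSentinel is hypothetical, and the behavior described assumes a
recent CPython 3, where the sentinel check goes through the boolean
comparison with its identity shortcut.

```python
class CleverSentinel:
    """Hypothetical sentinel that claims never to equal anything."""

    def __eq__(self, other):
        return False


sentinel = CleverSentinel()
source = iter([1, 2, sentinel, 3])

# iter(callable, sentinel) stops when the callable returns a value
# that matches the sentinel. With the identity shortcut, the custom
# __eq__ above is never consulted for the sentinel object itself.
collected = list(iter(lambda: next(source), sentinel))
print(collected)
```

Under that assumption the iteration stops at the sentinel even though its
__eq__ insists it equals nothing, so the "clever" design does not work.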

I'd even say that it makes "intuitive sense" to me somehow that
containment, index(), count() and remove() all act as if identity
implied equality (although I can think of only *slightly* absurd
examples where a user might try to alter this behavior). But when
comparing sequences or dicts to each other (something people do
a LOT), I would intuitively expect that any contained objects
with customized comparison methods would have those methods
invoked.
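
One way to see which operations actually invoke a customized comparison
is to count the calls. This is only a sketch: Logged and count_eq_calls
are hypothetical names, and the counts in the note below are what a
recent CPython 3 should produce, not necessarily what 2.3.3 does.

```python
class Logged:
    """Hypothetical element type that counts every __eq__ call."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Logged.calls += 1
        return isinstance(other, Logged) and self.value == other.value


def count_eq_calls(operation):
    """Run operation() and return how many times __eq__ was invoked."""
    Logged.calls = 0
    operation()
    return Logged.calls


a = Logged(1)
b = Logged(1)  # equal to a, but a different object

same_object = count_eq_calls(lambda: [a] == [a])
equal_objects = count_eq_calls(lambda: [a] == [b])
containment = count_eq_calls(lambda: a in [a])

print(same_object, equal_objects, containment)
```

When the two lists hold the very same object, the element's __eq__ is
skipped; it is only consulted when the elements are distinct objects.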

Okay... I've talked myself into a corner now. Raymond has convinced
me that my original idea was misguided, and I've looked closely at
the problem, but I don't see an "obvious" solution. I'm tending
to think it's best to put the test for identity in
PyObject_RichCompareBool, but then how do we explain (in simple terms)
when user-defined comparison methods are invoked and when they
might not be?
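
For reference, the proposed split can be modeled in a few lines of
Python. This is a rough model of the idea, not the C implementation;
rich_compare, rich_compare_bool, and _OPS are hypothetical stand-ins for
PyObject_RichCompare and PyObject_RichCompareBool.

```python
import operator

# Map operator symbols to the functions that run the user's methods.
_OPS = {
    "<": operator.lt, "<=": operator.le,
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
}


def rich_compare(v, w, op):
    """Model of RichCompare: returns whatever the objects return."""
    return _OPS[op](v, w)


def rich_compare_bool(v, w, op):
    """Model of RichCompareBool: identity decides == and != up front."""
    if v is w:
        if op == "==":
            return True
        if op == "!=":
            return False
    # Otherwise defer to the objects and coerce the result to a bool.
    return bool(rich_compare(v, w, op))


nan = float("nan")
print(rich_compare(nan, nan, "=="), rich_compare_bool(nan, nan, "=="))
```

In this model the shortcut only applies to == and !=, so ordering
comparisons such as < still go to the objects' own methods.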

Well, even without an answer, I'd better send this email off and
get back to my own work. I'm not sure if I've gotten anywhere with
this or not.

-- Michael Chermside