[Python-Dev] Re: Re: lists v. tuples

Andrew Koenig ark@research.att.com
16 Mar 2003 13:59:30 -0500


Tim> [Guido]
>> Yes.  And I'm still hoping to remove __cmp__; there should be only one
>> way to overload comparisons.

Tim> As long as we're going to break everyone's code, list.sort(f)
Tim> should also be redefined then so that f returns a Boolean, f(a,
Tim> b) meaning a is less than b.

I don't think it's necessary to break code in order to accommodate
that change, as long as you're willing to tolerate one extra
comparison per call to sort, plus a small amount of additional
overhead.

As I understand it, the problem is to distinguish between a function
that returns negative, zero, or positive, depending on the result of
the comparison, and a function that returns true or false.  So if we
had a way to determine efficiently which kind of function the user
supplied, we could maintain compatibility.

Imagine, then, that we have a function f, and we want to figure out
which kind of function it is.  Assume, furthermore, that the only kind
of commparisons we want to perform is to determine whether a < b for
various values of a and b.

Note first that whenever f(a, b) returns 0, we don't care which kind
of function f is, because a < b will be false in either case.  So we
allow our sort algorithm to run until the first time a call to f(a, b)
returns a nonzero value.

Now we can determine what kind of function f is by calling f(b, a).
If f(b, a) is zero, then f is a boolean predicate.  If f(b, a) is
nonzero, then f returns negative/zero/positive -- and, incidentally,
f(b, a) had better have the opposite sign from f(a, b).

I understand that there is some overhead involved in storing the
information about which kind of comparison it is, and testing it on
each comparison.  I suspect, however, that that overhead can be made
small compared to the overhead involved in calling the comparison
function itself.


-- 
Andrew Koenig, ark@research.att.com, http://www.research.att.com/info/ark