[Python-3000] Please re-add __cmp__ to python 3000

David A. Wheeler dwheeler at dwheeler.com
Wed Oct 17 18:40:08 CEST 2007


I said:
> I agree with Collin Winter: losing __cmp__ is a loss  (see http://oakwinter.com/code/).

Steven Bethard said:
>Why can't this just be supplied with a mixin?  Here's a recipe
>providing the appropriate mixins if you want to define a __key__
>function:
>    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

That _works_ from a functional perspective, and if Python3 fails to include direct support for __cmp__, then I think providing a built-in mixin is necessary.

But mixins for comparison are a BIG LOSER for sort performance if your fundamental operator is a cmp-like function. Sorting is completely dominated by comparison time, and the mixin is a big lose for performance.  Basically, sorts always involve an inner loop that does comparisons, so any time comparison is slow, you're potentially dooming the whole program to a slow inner loop.  A mixin-calling-cmp model doubles the function call work; it has to find the mixin, call it, which eventually has to find and call the final cmp operator.

I did a test (see below), and the mixin using a simulated cmp took 50% MORE time to sort a list using Python 2.5 (see code below) than when __cmp__ is used directly (as you CAN do in Python 2.5).  A few tests with varying list size lead me to believe that this isn't linear - as the lists get longer the % performance hit increases.  In other words, it's a LOT slower, and as the size increases it gets far worse.   That kind of nasty performance hit will probably lead people to write lots of code that duplicates comparison functionality in each __lt__, __gt__, etc.  When the comparisons THEMSELVES are nontrivial, that will result in lots of duplicate, potentially-buggy code.  All of which is avoidable if __cmp__ is directly supported, as it ALREADY is in Python 1 and 2.

In addition, even IF the performance wasn't a big deal (and I think it is), I believe __cmp__ is the better basic operator in most cases.  As a style issue, I strongly prefer __cmp__ unless I have a specific need for comparisons which are atypical, e.g., where sometimes both __lt__ and __ge__ will return false given the same data (IEEE floats do this if you need exactly-IEEE-specified behavior of NaNs, etc.).  By preferring __cmp__ I eliminate lots of duplicate code, and once it's right, it's always right for ALL comparisons.  Sometimes __lt__ and friends are absolutely needed, e.g., when __lt__(x,y)==__gt__(x,y) for some values of x,y, but most of the time I find that they're an indicator of bad code and that __cmp__ should have been used instead.   Direct support of __cmp__ is a GOOD thing, not a wart or obsolete feature.

Adding a standard comparison mixin in a library is probably a good idea as well, but restoring __cmp__ is in my mind more important.  I can write my own mixin, but working around a failure to call __cmp__ gives a big performance hit.

--- David A. Wheeler


========================================
Here's my quick test code, in two files cmptest and cmptest2.
The whitespace may be munged by my mailer or the list, sorry if it is.

==== cmptest2 ====

#!/usr/bin/env python2

# cmp-test2.py

import timeit


time1 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberWithCmp(x) for x in randomlist]
""")
time2 = timeit.Timer("x = sorted(list)", """
import cmptest
import random
randomlist = [random.randrange(1,100000) for x in range(100000)]
list = [cmptest.NumberMixinCmp(x) for x in randomlist]
""")

finaltime1 = time1.timeit(3)
finaltime2 = time2.timeit(3)

print finaltime1
print finaltime2



====== cmptest ======

#!/usr/bin/env python2

# cmp-test.py

import random
import timeit

class NumberWithCmp(object):
    "Store 'x' for comparison"
    def __init__(self, data):
       self.x = data
    def __str__(self):
        return str(self.x)
    def __cmp__(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)

# Mixin, similar to http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/510403

class ComparisonMixin(object):
    "Implement <, >, etc. by invoking a 'cmp' function."
    def __lt__(self, other):
        return self.cmp(other) < 0
    def __le__(self, other):
        return self.cmp(other) <= 0
    def __gt__(self, other):
        return self.cmp(other) > 0
    def __ge__(self, other):
        return self.cmp(other) >= 0

class NumberMixinCmp(ComparisonMixin):
    def __init__(self, data):
       self.x = data
    def __str__(self):
        return str(self.x)
    def cmp(self, other):
        if self.x == other.x: return 0
        return (-1 if self.x < other.x else 1)





More information about the Python-3000 mailing list