Guido rethinking removal of cmp from sort method
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Mar 30 22:13:53 EDT 2011
On Wed, 30 Mar 2011 11:06:20 +0200, Antoon Pardon wrote:
> As far as I can see, key will only produce significant speedups, if
> comparing items can then be completly done internally in the python
> engine without referencing user python code.
Incorrect. You don't even need megabytes of data to see significant
differences. How about a mere 1000 short strings?
[steve at wow-wow ~]$ python2.6
Python 2.6.6 (r266:84292, Dec 21 2010, 18:12:50)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import shuffle
>>> data = ['a'*n for n in range(1000)]
>>> shuffle(data)
>>> from timeit import Timer
>>>
>>> t_key = Timer('sorted(data, key=lambda a: len(a))',
... 'from __main__ import data')
>>> t_cmp = Timer('sorted(data, cmp=lambda a,b: cmp(len(a), len(b)))',
... 'from __main__ import data')
>>>
>>> min(t_key.repeat(number=1000, repeat=5))
0.89357517051696777
>>> min(t_cmp.repeat(number=1000, repeat=5))
7.6032949066162109
That's almost ten times slower.
Of course, the right way to do that specific sort is:
>>> t_len = Timer('sorted(data, key=len)', 'from __main__ import data')
>>> min(t_len.repeat(number=1000, repeat=5))
0.64559602737426758
which is even easier and faster. But even comparing a pure Python key
function to the cmp function, it's obvious that cmp is nearly always
slower.
Frankly, trying to argue that cmp is faster, or nearly as fast, is a
losing proposition. In my opinion, the only strategy that has even a
faint glimmer of hope is to find a convincing use-case where speed does
not matter.
Or, an alternative approach would be for one of the cmp-supporters to
take the code for Python's sort routine, and implement your own sort-with-
cmp (in C, of course, a pure Python solution will likely be unusable) and
offer it as a download. For anyone who knows how to do C extensions, this
shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
alone function that can be imported.
If you get lots of community interest in this, that is a good sign that
the solution is useful and practical, and then you can push to have it
included in the standard library or even as a built-in.
And if not, well, at least you will be able to continue using cmp in your
own code.
--
Steven
More information about the Python-list
mailing list