sort order for strings of digits
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed Oct 31 19:44:14 EDT 2012
On Wed, 31 Oct 2012 19:05:17 -0400, Dennis Lee Bieber wrote:
>> The cmp builtin is also gone. If you need it, the suggested
>> replacement for "cmp(a, b)" is "(b < a) - (a < b)".
>>
> OUCH... Just another reason for my to hang onto the 2.x series as
> long as possible
On the contrary. If you are using cmp with sort, your sorts are slow, and
you should upgrade to using a key function as soon as possible.
For small lists, you may not notice, but for large lists using a
comparison function is a BAD IDEA.
Here's an example: sorting a list of numbers by absolute value.
py> L = [5, -6, 1, -2, 9, -8, 4, 3, -7, 2, -3]
py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
But the amount of work done is radically different. Let's temporarily
shadow the built-ins with patched versions:
py> _abs = abs
py> _abs, _cmp = abs, cmp
py> c1 = c2 = 0
py> def abs(x):
... global c1
... c1 += 1
... return _abs(x)
...
py> def cmp(a, b):
... global c2
... c2 += 1
... return _cmp(a, b)
...
Now we can see just how much work is done under the hood using a key
function vs a comparison function:
py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1
11
So the key function is called once for each item in the list. But:
py> c1 = 0 # reset the count
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1, c2
(54, 27)
The comparison function is called 27 times for a list of nine items (a
average of 2.5 calls to cmp per item), and abs is called twice for each
call to cmp. (Well, duh.)
If the list is bigger, it gets worse:
py> c2 = 0
py> x = sorted(L*10, lambda a, b: cmp(abs(a), abs(b)))
py> c2
592
That's an average of 5.4 calls to cmp per item. And it gets even worse as
the list gets bigger.
As your lists get bigger, the amount of work done calling the comparison
function gets ever bigger still. Sorting large lists with a comparison
function is SLOOOW.
py> del abs, cmp # remove the monkey-patched versions
py> L = L*1000000
py> with Timer():
... x = sorted(L, key=abs)
...
time taken: 9.165448 seconds
py> with Timer():
... x = sorted(L, lambda a, b: cmp(abs(a), abs(b)))
...
time taken: 63.579679 seconds
The Timer() context manager used can be found here:
http://code.activestate.com/recipes/577896
--
Steven
More information about the Python-list
mailing list