sort order for strings of digits

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Nov 1 00:44:14 CET 2012


On Wed, 31 Oct 2012 19:05:17 -0400, Dennis Lee Bieber wrote:

>> The cmp builtin is also gone.  If you need it, the suggested
>> replacement for "cmp(a, b)" is "(b < a) - (a < b)".
>>
> 	OUCH... Just another reason for my to hang onto the 2.x series as
> long as possible 

On the contrary. If you are using cmp with sort, your sorts are slow, and 
you should upgrade to using a key function as soon as possible.

For small lists, you may not notice, but for large lists using a 
comparison function is a BAD IDEA.

Here's an example: sorting a list of numbers by absolute value.

py> L = [5, -6, 1, -2, 9, -8, 4, 3, -7, 2, -3]
py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]

But the amount of work done is radically different. Let's temporarily 
shadow the built-ins with patched versions:

py> _abs = abs
py> _abs, _cmp = abs, cmp
py> c1 = c2 = 0
py> def abs(x):
...     global c1
...     c1 += 1
...     return _abs(x)
...
py> def cmp(a, b):
...     global c2
...     c2 += 1
...     return _cmp(a, b)
...

Now we can see just how much work is done under the hood using a key 
function vs a comparison function:

py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1
11

So the key function is called once for each item in the list. But:


py> c1 = 0  # reset the count
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1, c2
(54, 27)

The comparison function is called 27 times for a list of nine items (a 
average of 2.5 calls to cmp per item), and abs is called twice for each 
call to cmp. (Well, duh.)

If the list is bigger, it gets worse:

py> c2 = 0
py> x = sorted(L*10, lambda a, b: cmp(abs(a), abs(b)))
py> c2
592

That's an average of 5.4 calls to cmp per item. And it gets even worse as 
the list gets bigger.

As your lists get bigger, the amount of work done calling the comparison 
function gets ever bigger still. Sorting large lists with a comparison 
function is SLOOOW.

py> del abs, cmp  # remove the monkey-patched versions
py> L = L*1000000
py> with Timer():
...     x = sorted(L, key=abs)
...
time taken: 9.165448 seconds
py> with Timer():
...     x = sorted(L, lambda a, b: cmp(abs(a), abs(b)))
...
time taken: 63.579679 seconds


The Timer() context manager used can be found here:

http://code.activestate.com/recipes/577896



-- 
Steven


More information about the Python-list mailing list