Sorting lists

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue Nov 18 01:14:58 CET 2008


On Mon, 17 Nov 2008 14:36:03 -0500, Terry Reedy wrote:

> bearophileHUGS at lycos.com wrote:
>> Chris Rebert:
>>> You use the `key` argument to .sort(): L2.sort(key=lambda item:
>>> item[1])
>> 
>> I like the lambda because it's a very readable solution that doesn't
>> require the std lib and it doesn't force the programmer (and the person
>> that reads the code) to learn yet another thing/function.
>> 
>> But I can suggest to also look for operator.itemgetter.
> 
> Since itemgetter is builtin, it will probably run faster, though the
> O(nlogn) time for sorting will override the O(n) time for key calls.

Well, eventually, for large enough lists, sure. But if the constants are 
sufficiently different, the key calls may be the bottleneck. O(foo) 
analysis is valuable, but it isn't the full story. A fast enough O(n**2) 
algorithm might be preferable to a slow O(log n) algorithm for any data 
you're interested in.

To give an extreme example, suppose your itemgetter function (not the 
Python built-in!) had to query some remote database over the internet. It 
might be O(n) but the multiplicative constant would be so large that the 
time taken to get items far dominates the time to sort the list, unless 
the list is *seriously* huge:

(Say) sorting takes O(n*log n) with multiplicative constant of 1e-5.
Item getting takes O(n) with multiplicative constant of 1.

Then the time taken to sort doesn't become larger than the time taken to 
get the items until approximately:

1e-5*n*log(n) > n
1e-5*log(n) > 1
log(n) > 1e5
n > 2**100000

which is pretty big... for any reasonable-sized list, the O(n) part 
dominates despite the asymptotic behaviour being O(n*log n).



-- 
Steven



More information about the Python-list mailing list