Balanced trees

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Mar 14 06:02:21 CET 2014


On Thu, 13 Mar 2014 20:12:41 -0700, Dan Stromberg wrote:

> Sorting the same (slightly tweaked) data inside of a tight loop is
> rarely a good idea - despite the fact that the "sort" itself tends to be
> O(n) with Python's rather awesome builtin sorting algorithm.  This is
> because sorting inside a loop tends to yield O(n^2) algorithms, or even
> O((n^2)*logn).

I agree with you except for the word "rarely". It depends on how much 
data you have, and in my experience most people think that N = 100 is a 
lot :-)

Remember that Big Oh doesn't say anything about the constant of 
proportionality. If you have one function that behaves like O(N) but with 
a constant factor of 1000, and another function that behaves like 
O(N**2) with a constant factor of 0.001, the second, theoretically 
slower, function will actually be faster for N < one million.
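The crossover point above can be checked directly. This is a small illustrative sketch using the hypothetical cost functions from the paragraph (the constants 1000 and 0.001 are the ones quoted, not measurements of any real code):

```python
# Two hypothetical cost functions with very different constant factors,
# taken straight from the example in the text.
def linear_cost(n):
    return 1000 * n          # O(N), but with a constant factor of 1000

def quadratic_cost(n):
    return 0.001 * n ** 2    # O(N**2), but with a constant factor of 0.001

# The "slower" quadratic function is cheaper for every N below one million,
# and only loses once N passes the crossover at N = 1_000_000.
assert quadratic_cost(999_999) < linear_cost(999_999)
assert quadratic_cost(1_000_001) > linear_cost(1_000_001)
```

The crossover falls where 1000*N == 0.001*N**2, i.e. at N = 1,000,000.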

So *in practice*, a solution that involves a theoretically worse Big Oh 
function but a significantly lower constant factor may turn out to be 
better for all the values you care about. Given the difference in speed 
between Python's built-ins and the code you can write yourself in pure 
Python, it is normally better to push as much work as you can onto the 
built-ins.
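To make that concrete, here is a rough sketch comparing a pure-Python insertion sort against the built-in sorted(). The insertion sort here is my own illustrative implementation, not anything from the stdlib; on any CPython build the C-level built-in should win by a wide margin even though both produce the same result:

```python
from timeit import timeit
import random

def insertion_sort(items):
    # A deliberately simple pure-Python O(N**2) sort, for comparison
    # with the C-level built-in.
    result = []
    for item in items:
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result

data = [random.random() for _ in range(1000)]

pure = timeit(lambda: insertion_sort(data), number=10)
built_in = timeit(lambda: sorted(data), number=10)

# Both sorts agree on the answer; only the constant factors differ.
assert insertion_sort(data) == sorted(data)
```

Typically the built-in is faster by several orders of magnitude here, which is exactly why pushing work onto it pays off.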

And of course, we should not make assumptions as to what is faster and 
what is slower. After 15+ years, I'm still surprised about what ends up 
being faster in Python.


> But if you're sorting once at the end of whatever other processing,
> sorting is great.  It's (still) O(nlogn), but it's got a terrific
> constant.

Exactly!
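The difference between the two patterns can be sketched like this. The workload is hypothetical (accumulate N values, need them in order at the end); re-sorting after every append is cheap per pass on nearly-sorted data, thanks to Timsort's adaptivity, but still adds up to quadratic total work, while sorting once at the end is a single O(N log N) pass:

```python
import random

def sort_each_time(values):
    # Re-sort after every append. Each pass over nearly-sorted data is
    # fast (Timsort handles it in roughly O(N)), but doing it N times
    # still yields O(N**2) total work.
    data = []
    for v in values:
        data.append(v)
        data.sort()
    return data

def sort_once(values):
    # Accumulate first, then sort once at the end: a single O(N log N) pass.
    data = list(values)
    data.sort()
    return data

values = [random.random() for _ in range(2000)]
assert sort_each_time(values) == sort_once(values)
```

Both give identical results; only the total work differs, and the gap widens quickly as N grows.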



-- 
Steven
