Big-O notation (was: Re: Python and Schools)
Chad Netzer
cnetzer at mail.arc.nasa.gov
Wed Apr 16 02:16:09 CEST 2003
On Tue, 2003-04-15 at 15:58, Roy Smith wrote:
> We've got something which scans a directory (an O(N) process). It's
> supposed to happen once every time the program is run. Somebody goofed
> in coding, and moved it into the loop that happens once every input
> file, not once per run. Now it's an O(N^2) process.
>
> Nobody noticed it in testing because we only tested it with maybe 100
> files. Then one of our customers ran it with 25,000 files (a perfectly
> reasonable thing to do). Whoops :-)
I have unit tests that actually try to detect these things using
regressions and t-tests. For example, I have an implementation of a
numerical algorithm that I know is supposed to be O(N). In my tests, I
time the result of the algorithm with varying input sizes (ie. different
N), and do a quadratic regression (like a linear regression, but with a
term for x**2).
My assumption is that the x**2 term should have a coefficient near zero
(ie. zero "curve" to the regression). I then use a t-test to verify
this assumption, using a few runs on random data. Set the confidence
level fairly low, and it seems to do a good job of detecting if I
accidentally make a O(N**2) algorithm, rather than the expected O(N)
(with very few false positives).
Fairly quick and easy to implement (given existing t-test and regression
toolks available in stats.py and Numeric)
--
Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)
More information about the Python-list
mailing list