Big-O notation (was: Re: Python and Schools)
Mike C. Fletcher
mcfletch at rogers.com
Tue Apr 15 16:32:34 EDT 2003
Greg Brondo wrote:
...
>
> I understand why the code is bad (copying the string each time in
> memory as it grows). However, I've never understood the notations
> O(N) and O(N**2)?
...
Here's a crack at explaining it:
Big-O notation characterises an algorithm's (as distinct from an
implementation's) performance in relation to the size of its input
set. It is normally of interest when judging the
scalability/suitability of a particular algorithm. The constant (e.g.
setup) and proportional (e.g. implementation) factors in the
algorithm are factored out, under the assumption that they are
swamped by the effect of the input set's size as that goes to
infinity.
Common examples of big-O performance types (each illustrated in the
Python sketch just after this list):
* O(1) -- time is constant, that is, it takes the same amount of
  time to do something regardless of how large the input set is.
* O(log N) -- time grows as the log of N (for those who don't
  remember high-school math, this means it grows very slowly
  compared to N).
* O(N) -- time grows linearly as the size of the set grows (e.g.
  process every item in the set in turn).
* O(N log N) -- see O(N) and O(log N); basically, you do an amount
  of work tied to the size of the whole set for every stage of an
  O(log N) operation (this is comparatively rarely discussed in my
  experience).
* O(N**2), O(N**3), et cetera -- time grows as a power (e.g. 2, 3,
  3.4, 10) of the size of the set, so you get a polynomial explosion
  of running time as set size increases. These algorithms tend to
  become impractical at surprisingly modest values of N.
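To make those classes concrete, here's a small Python sketch of my
own (not from Greg's code; the names are mine) showing a typical
standard-library operation for each:

    import bisect

    items = list(range(100))    # a sorted list
    members = set(items)        # a hash-based set

    # O(1): a hash lookup takes (on average) the same time
    # no matter how large the set is
    assert 50 in members

    # O(log N): binary search halves the remaining range each step
    assert bisect.bisect_left(items, 50) == 50

    # O(N): a linear scan may touch every item in turn
    assert 50 in items

    # O(N log N): Python's sort is a comparison sort
    ranked = sorted(items, reverse=True)

    # O(N**2): comparing every item against every other item
    pairs = [(a, b) for a in items for b in items]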
You can have all sorts of fun learning how to combine and refactor the
equations to get a final big-O characterisation, but for the most part,
practical coders can get away with recognising and avoiding O(N**X)
situations where X > 1. Recognising the underlying patterns behind
O(log N) (binary searches and the like) and O(1)-ish situations is
useful as well, as those provide very good general scalability. The
calculus of big-O notations is just a convenient shorthand for people
doing rigorous analysis, AFAICS.
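Greg's quoted example above is the classic O(N**2) trap: building a
string with repeated concatenation copies everything accumulated so
far on each step, so the total work is roughly 1 + 2 + ... + N
character copies. A minimal sketch of the pattern and the usual fix
(the actual code under discussion isn't shown here, so these names
are mine):

    pieces = ['spam'] * 10000

    # O(N**2) in the total length: each concatenation may copy the
    # whole result built so far
    result = ''
    for piece in pieces:
        result = result + piece

    # O(N): collect the pieces first, then copy each character once
    result = ''.join(pieces)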
Keep in mind that the setup and constant factors of the algorithm are
factored out of the equations. An algorithm which is O(1) may be slower
than one which is O(N**3) for all practical values of N if the setup
time of the O(1) algorithm approaches the age of the universe.
Similarly, two algorithms which are both O(N) but have vastly different
constant factors (implementations) may have dramatically different
running times.
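For instance (my own illustration, not anything from the thread): a
hand-written Python loop and the built-in sum() are both O(N), but
sum() runs its loop in C, so its constant factor is far smaller and
it finishes much sooner in practice:

    def slow_sum(numbers):
        # O(N), but pays interpreter overhead on every element
        total = 0
        for n in numbers:
            total += n
        return total

    data = list(range(10 ** 6))

    # Both calls are O(N); sum() merely has a much smaller constant
    # factor, so it is typically several times faster.
    assert slow_sum(data) == sum(data)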
HTH,
Mike
_______________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://members.rogers.com/mcfletch/