O(n^2) is bad - can it be fixed?

Alex Martelli aleaxit at yahoo.com
Tue May 22 04:45:53 EDT 2001


"E. Mark Ping" <emarkp at CSUA.Berkeley.EDU> wrote in message
news:9ed20b$jfj$1 at agate.berkeley.edu...
> In article <3B09FBF9.8691EBFE at accessone.com>,
> Helen Dawson  <helend at accessone.com> wrote:
> >List does much better. Why? It probably does the same thing that the
> >STL vector typically does - the allocabe space doubles whenever it
> >runs out of room, and then the newly allocated space is filled in
> >as needed.
>
> Doubtful.  It most likely does what std::list<> does--it allocates

No need to wonder in a void about such issues... listobject.c is
open for inspection in the Objects directory of any Python source
distribution!  A python list is nothing like a C++ std::list --
the latter is a doubly-linked-list (fast insertion/removal at
any point, but O(N) to get the Nth item), the former is actually
an array or vector.

> space each time a new entry is added.  That's linear, not n^2 time.
> std::vector<> uses geometric allocation to get amortized constant
> allocation time.
>
> If you need to add on character at a time, choose the data structure
> that does what you want.  Perhaps a list of char?

Testing is better than wondering:

import time, array

def AppendTest(listorarray):
    head = repr(listorarray)
    start = time.clock()
    for i in range(0, 100):
        for x in range(0, 2000):
            listorarray.append(' ')
        # print i
    stend = time.clock()
    print "%s: %.2f" % (head, stend-start)

AppendTest([])
AppendTest(array.array('c'))

D:\py21>python oaat.py
[]: 1.97
array('c'): 1.55

D:\py21>python oaat.py
[]: 1.97
array('c'): 1.56

So, an array.array of characters turns out to be
systematically and repeatably 20% faster than a
list for this task on my box (an old clunky P3-266
NT4SP5 at this time:-).  Unsurprising:-) -- it
has less overhead since it keeps 1 byte per char
rather than 4 bytes per item (a reference) plus
space for the char itself.

By the way, once your built-one-char-at-a-time
array.array('c') object is ready, just call its
method .tostring() to get the string equivalent
(if you choose to use a list instead, then the
best way to get the string is ''.join(thelist)).

The dominance of array.array for this shines
out clearly from Guido's essay/anecdote on
optimization, btw...


Alex






More information about the Python-list mailing list