O(n^2) is bad - can it be fixed?

Tim Peters tim.one at home.com
Tue May 22 06:11:36 EDT 2001


[Alex Martelli]
> ...
> import time, array
>
> def AppendTest(listorarray):
>     head = repr(listorarray)
>     start = time.clock()
>     for i in range(0, 100):
>         for x in range(0, 2000):
>             listorarray.append(' ')
>         # print i
>     stend = time.clock()
>     print "%s: %.2f" % (head, stend-start)
>
> AppendTest([])
> AppendTest(array.array('c'))
>
> D:\py21>python oaat.py
> []: 1.97
> array('c'): 1.55
>
> D:\py21>python oaat.py
> []: 1.97
> array('c'): 1.56
>
> So, an array.array of characters turns out to be
> systematically and repeatably 20% faster than a
> list for this task on my box (an old clunky P3-266
> NT4SP5 at this time:-).  Unsurprising:-) -- it
> has less overhead since it keeps 1 byte per char
> rather than 4 bytes per item (a reference) plus
> space for the char itself.

Curiously, an array.array is usually slower than a list, for most things.  It
has overhead of a different kind:  storing the raw bytes saves storage, but
it takes more layers of internal function calls to extract the raw bytes from
the Python object when storing them, and similarly to stuff the raw bytes
into a Python object again when retrieving them.  A list doesn't do any of
that, it just stores and retrieves "the object" (a pointer).

You catch a break in this particular test case because all one-character
literal strings are interned, and so objects are neither created nor
destroyed each time you pass ' '; in the general case, they are, as objects
are created just to pass to array.append(), then destroyed soon after the
array module extracts their guts.  list.append() avoids the "then destroyed"
half of that.

array.append() also needs to rededuce the *type* of the array each time it's
called, so it knows which routine to call to extract the raw bytes (there
aren't umpteen distinct array implementations under the covers, just one that
indirects thru a typecode member); and that's another overhead list.append()
avoids.

But an array.array('c') does win big when it comes time for .tostring(),
since that's just a straight memcpy of the internal array buffer; a
string.join(list, "") has mounds more work to do.

But on the fifth hand, note that array.append doesn't do any "over
allocation" on an append, while list.append does.  It's at least curious that
the often decried but rarely measured "quadratic time behavior" for
list.append doesn't show up more often for array.append, as arrays don't do
*anything* to mitigate it -- this really all depends on your system realloc()
behavior.

One reasonable conclusion to draw is that there are no generally applicable
conclusions <wink>.





More information about the Python-list mailing list