[Tutor] Question about O(N**2)
Steven D'Aprano
steve at pearwood.info
Sun May 4 06:13:12 CEST 2014
On Sat, May 03, 2014 at 03:59:40PM -0700, Danny Yoo wrote:
> Following up on this. Let's make sure that we're talking about the same thing.
>
>
> The assertion is that the following:
>
> fullPath += [...]
>
> where fullPath is a list of strings, exhibits O(n^2) time. I don't
> think this is true. Semantically, the statement above should be
> equivalent to:
>
> fullPath.append(...)
>
> and so I agree with David: this is not O(n^2), but rather O(n).
Danny is correct. For built-in lists, += augmented assignment is
implemented using the extend method, not the + operator. Consequently,
repeatedly calling += modifies the list in place, growing it as needed,
which is amortized O(N) rather than O(N**2).
Had += been implemented as + instead, as I initially mis-remembered, the
behaviour would have been O(N**2). The process would have gone:
fullPath = []
fullPath = [] + [a] # add a new path component
fullPath = [a] + [b]
fullPath = [a, b] + [c]
fullPath = [a, b, c] + [d]
fullPath = [a, b, c, d] + [e]
...
Each individual + is an O(N+M) operation (if there are ten items in the
first list, and five items in the second list, adding them needs to copy
fifteen items). With M a constant 1, we can take each addition as O(N).
And there are N additions, so we get O(N*N) or O(N**2).
We can test this by seeing how long it takes to perform a bunch of
additions. For timing, I'm going to use the recipe shown here:
http://code.activestate.com/recipes/577896-benchmark-code-with-the-with-statement/
First, using augmented assignment for lists:
py> size = 1000000
py> with Timer():
... data = []
... for i in range(size):
... data += [i]
...
time taken: 0.548546 seconds
py> with Timer():
... data = []
... for i in range(3*size):
... data += [i]
...
time taken: 1.672449 seconds
By increasing the amount of data by a factor of three, the time taken
also increases by about a factor of three. That is O(N).
Now let's try list addition. It's so much slower than += that I had to
reduce the size by a lot just to have it complete in any reasonable
amount of time:
py> size = 10000
py> with Timer():
... data = []
... for i in range(size):
... data = data + [i]
...
time taken: 0.331893 seconds
py> with Timer():
... data = []
... for i in range(3*size):
... data = data + [i]
...
time taken: 3.203508 seconds
Here, increasing N by a factor of 3 takes about 10 times longer, which
is consistent with O(N**2).
--
Steven
More information about the Tutor
mailing list