O(n^2) is bad - can it be fixed?
Tim Peters
tim.one at home.com
Fri May 25 03:52:16 EDT 2001
[Bruce Dawson]
> After discoverting that lists actually are O(n^2),
"Actually" is an odd word for something that took two days to provoke after
giving up the first time then being assured it was possible to provoke it.
Is it "actually" the case on Linux too? As has been said, the behavior is
platform dependent.
> and after hearing a rumour that they behaved badly on Win9x, I
> decided to investigate.
>
> It turns out that my previous tests had stopped just before the
> quadratic nature of lists starts showing up. In my new test script
> (pasted in at the bottom of this message) I append two million
> items to a list.
Is this typical of a useful program you're going to run? Is so, you know
better now, at least on NT and Win9x. But if not, well, "who cares?" comes
to mind. It's not the only extreme thing you can do to drive Windows insane.
> On NT you can see the performance steadily degrade - each set of
> 10,000 items takes noticeably longer to add, however it does
> finish.
That was my experience two years ago too -- glad to hear NT has remained
stable <wink>.
> On Win9x the performance is constant, until a wall is hit and
> the performance plummets.
Ditto.
> A bit later, on Win9x, with about 1.5 million items in the list,
> the program halts with an out of memory error. This happens
> despite the fact that Python only has a few tens of Mbytes
> allocated on my 128 Mbyte machine. It turns out that Python
> actually runs out of address space! 2 GBytes of address space
> swallowed up in a few minutes.
Yup. It's like trying to avert your eyes from a train wreck!
> ...
> Worse yet, the old heaps don't get freed. There memory mostly
> does, but the address space doesn't. I don't know if that's a Python
> glitch or a C run-time glitch,
It's hard to say whether it's a C library problem or (more likely, I think) a
deeper OS VM mgmt problem. Python dutifully returns the memory via free(),
but Win9x's VM defragmenter is apparently overwhelmed.
> but pretty soon the address space is filled with about 500 4 Mbyte
> heaps, and the script halts. The line of increasing large heaps
> that are all mostly empty is quite amusing in my little memory
> viewer.
>
> NT handles it better, but the performance still degrades.
> ...
> Instead of this:
>
> static int
> roundupsize(int n)
> {
> if (n < 500)
> return ROUNDUP(n, 10);
> else
> return ROUNDUP(n, 100);
> }
>
> why not this:
>
> static int
> roundupsize(int n)
> {
> if (n < 500)
> return ROUNDUP(n, 10);
> else
> return n + n / 10;
> }
>
> This method will avoid the quadratic behaviour, it will avoid
> running out of address space on systems with imperfect
> allocators,
How do you know that? You didn't actually try it -- if you had, you would
have seen it didn't work as you hoped. Hint: as lists grow larger, you're
now asking the system for strictly more space on *every* append -- the cure
is worse than the disease. You can repair this with enough effort, but
please *try* it before making claims; also make sure it doesn't hurt
performance on Linux and umpteen minority OSes. As an example of something
"that works even on Windows" but hurts too much elsewhere, you *could* try
this:
static int
roundupsize(int n)
{
int i;
if (n < 500)
return ROUNDUP(n, 10);
i = 0;
while (n) {
++i;
n >>= 1;
}
return 3 << (i - 1);
}
> and it will waste, at most, 10% of memory. If 10% is too much to
> waste, change the /10 to /20.
>
> The current system can run a hundred times slower than it
> needs to, and can fail when it needn't. Is there any good
> reason not to make the fix?
You have to find a fix that actually does more good than harm first, it's not
a problem in practice, and a better long-term solution lies in adopting
Vladimir Marangozov's PyMalloc (Win9x malloc is too poor in general to be
worth the effort of worming around in one endcase that doesn't happen in
practice anyway -- but it's worth replacing it for other reasons).
> Any program that adds more than 1000 items to a list in small batches
> will benefit from this fix, and there is virtually no downside.
mercifully y'rs - tim
More information about the Python-list
mailing list