O(n^2) is bad - can it be fixed?

Tim Peters tim.one at home.com
Fri May 25 05:15:55 EDT 2001


[Isaac To Kar Keung]
> ...
> I don't actually see a good solution, actually, short of storing the
> capacity somewhere.

We can do that if we use our own malloc, but won't otherwise (we're not going
to add more members to the Python listobject struct, because millions of
lists with a few things are about a million times more common than a few
lists with millions of things).

> The problem is that whatever the mapping, there is a possible case
> in which copying and reallocation is done basically everytime.

We could, e.g., always round the list size up to the next higher power of 2.
That way the capacity is implicit.  We're unlikely to, though.

> ...
> There are two possible attutides for this problem.  We can say that
> this is a flaw in the C library, and realloc should really deal with
> the problem because it happens so often.

Except that it rarely happens at all -- don't overblow this.  About once each
16 months *somebody* gets excited about this, then loses interest as they
discover it never hurts much except in the artificial cases they've
constructed to provoke it.

> Or we can say that this is a Python problem, that it should always
> make sure that it runs right in whatever C library.

The third possiblity is the one you're not ready for yet:  Python is neither
a C library nor an OS, neither is it obligated to provide theoretically
optimal performance in every conceivable case.  Python lists are wonderfully
zippy and compact as-is, and doing anything that hurts either of those would
be a net loss for the vast majority of Python users.  If building
mega-element lists via one append at a time is too slow for your app, fine,
don't do that.

> ...
> There are two places where we can add the capacity field.
> PyObject_VAR_HEAD is my choice.

As above, not a chance.

> ...
> But there is another solution that does not need the 4-bytes space
> overhead.  The C library should know the size of allocation
> anyway, so we can actually reuse that information.

Also not a chance, but it becomes possible if we enable Vladimir's PyMalloc
and fiddle it accordingly.





More information about the Python-list mailing list