[Python-Dev] Optimization of the Year

Tim Peters tim.one at comcast.net
Tue Feb 10 11:04:02 EST 2004


[Raymond]
> Hey, hey!  After looking at this on and off for months, I finally
> found a safe way to dramatically improve the speed of building lists:
>
> 	http://users.rcn.com/python/download/list.diff

Cool!

> ...
> The safe solution turned out to be saving a copy of the ob_item
> pointer whenever a block of known size was allocated.  That copy is
> then used a guard for the size check.  That way, if the pointer is
> swapped out by another routine (not a myth, it happens), we assume
> our saved block size is invalid and just do a normal realloc.

Share Guido's confusion/outrage about this:  who does this?  Would rather
shoot them than humor them.

> The results are wonderful:

They'll vary massively across platforms (because realloc() performance
varies massively across platforms -- we already saw that in the flurry of
timing reports following my proof-of-concept patch).

> ...
> AFAICT, this is a straight-win with no trade-offs along the way.
> The only offsetting cost is adding two fields to the list
> structure (Tim said that was bad but I don't remember why).

Guido explained it:  we can add one 4-byte field to the list object header
on a 32-bit box for free now, but adding two grows every list object by 8
bytes.

> Please take a look and see if there are any holes in the theory.

I can't look closely now, but I'll (re)note one thing:  the roundupsize()
algorithm is so convoluted now because it has to arrange to return the same
result over long stretches of contiguous inputs, in order that *most*
realloc() calls end up being trivial now.  But if we're only calling
realloc() when realloc() is actually needed, a much simpler approach is
desirable; e.g.,

    extra_len = (requested_len >> 4) + 2;
    allocated_len = requested_len + extra_len;

The "+2" is to ensure that extra_len > 0 even when requested_len < 16.  This
is still very mild over-allocation, on the order of 6%.  I'd be comfortable
knocking the shift of 4 down to 3, so long as PyList_New(size) continues not
to overallocate at all.




More information about the Python-Dev mailing list