[Python-Dev] Optimization of the Year
tim.one at comcast.net
Tue Feb 10 11:04:02 EST 2004
> Hey, hey! After looking at this on and off for months, I finally
> found a safe way to dramatically improve the speed of building lists:
> The safe solution turned out to be saving a copy of the ob_item
> pointer whenever a block of known size was allocated. That copy is
> then used a guard for the size check. That way, if the pointer is
> swapped out by another routine (not a myth, it happens), we assume
> our saved block size is invalid and just do a normal realloc.
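The guard idea quoted above can be sketched roughly like so. This is a hypothetical, simplified illustration, not the actual patch: the struct layout and the names `allocated` and `guard` are mine, and real CPython lists hold `PyObject *` slots with reference counting, which is omitted here.

```c
#include <stdlib.h>
#include <assert.h>

/* Illustrative stand-in for the list object -- field names are
 * invented for this sketch, not taken from the patch. */
typedef struct {
    void **ob_item;     /* block of element slots */
    size_t ob_size;     /* number of slots in use */
    size_t allocated;   /* capacity we believe the block has */
    void **guard;       /* copy of ob_item taken when we last allocated */
} fakelist;

/* Grow the list to hold newsize items.  If guard no longer matches
 * ob_item, some other routine swapped the block out from under us, so
 * the remembered capacity is untrustworthy: fall back to a plain
 * realloc. */
static int fakelist_resize(fakelist *l, size_t newsize)
{
    if (l->guard == l->ob_item && newsize <= l->allocated) {
        l->ob_size = newsize;      /* fits in the current block */
        return 0;
    }
    size_t extra = (newsize >> 4) + 2;   /* mild over-allocation */
    void **p = realloc(l->ob_item, (newsize + extra) * sizeof(void *));
    if (p == NULL)
        return -1;
    l->ob_item = p;
    l->guard = p;                  /* re-arm the guard */
    l->allocated = newsize + extra;
    l->ob_size = newsize;
    return 0;
}
```

With this shape, repeated small appends hit the cheap `guard == ob_item && newsize <= allocated` path and skip `realloc()` entirely until the slack is used up.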
I share Guido's confusion/outrage about this: who does this? I'd rather
shoot them than humor them.
> The results are wonderful:
They'll vary massively across platforms (because realloc() performance
varies massively across platforms -- we already saw that in the flurry of
timing reports following my proof-of-concept patch).
> AFAICT, this is a straight-win with no trade-offs along the way.
> The only offsetting cost is adding two fields to the list
> structure (Tim said that was bad but I don't remember why).
Guido explained it: we can add one 4-byte field to the list object header
on a 32-bit box for free now, but adding two grows every list object by 8
bytes.
> Please take a look and see if there are any holes in the theory.
I can't look closely now, but I'll (re)note one thing: the roundupsize()
algorithm is so convoluted now because it has to arrange to return the same
result over long stretches of contiguous inputs, in order that *most*
realloc() calls end up being trivial. But if we're only calling
realloc() when a resize is actually needed, a much simpler approach is:

    extra_len = (requested_len >> 4) + 2;
    allocated_len = requested_len + extra_len;
The "+2" is to ensure that extra_len > 0 even when requested_len < 16. This
is still very mild over-allocation, on the order of 6%. I'd be comfortable
knocking the shift of 4 down to 3, so long as PyList_New(size) continues not
to overallocate at all.