[Python-Dev] Optimization of the Year
Guido van Rossum
guido at python.org
Tue Feb 10 10:40:28 EST 2004
> Hey, hey! After looking at this on and off for months, I finally
> found a safe way to dramatically improve the speed of building lists:
>
> http://users.rcn.com/python/download/list.diff
Very cool, but...
> We've always known that every list.append() resulted in a realloc()
> call after computing an adaptive, oversized block length.
>
> The obvious solution was to save the block size and avoid calls to
> realloc() when the block was already large enough to hold a new
> element. Unfortunately, the list structure was wide open to
> manipulation by every piece of code written since Python's infancy.
> Almost no assumptions could be made about other code directly
> hacking the list structure elements.
Disagree. The public API documentation has never permitted modifying
ob_item and ob_size directly. It would be breaking the abstraction.
> The safe solution turned out to be saving a copy of the ob_item pointer
> whenever a block of known size was allocated. That copy is then
> used a guard for the size check. That way, if the pointer is swapped
> out by another routine (not a myth, it happens), we assume our saved
> block size is invalid and just do a normal realloc.
"It happens." Where?
If in the core, that code should simply be fixed.
If in 3rd party code, that code is simply wrong.
If indeed such 3rd party code exists, and we expect we can't get it
all fixed before 2.4 is released, the tracked_item hack can be used as
a temporary measure to hunt down all those 3rd party extensions that
break the abstraction. I propose to issue a warning when it is
discovered that ob_item != tracked_item. Then in 2.5 we can remove
the tracked_item feature.
Keeping tracked_item permanently as part of the API would be a
nightmare. How would you document it? "You can make ob_item point to
another malloc()ed block, as long as you update ob_size to reflect the
size of the block, and as long as you free() the old block, and oh, by
the way, you may also use realloc(), but if realloc() just happens to
extend or truncate the block without moving it, your code may die in
unexpected ways.
> The results are wonderful:
>
> * list(it) speeds up by at least 40% when "it" is an iterable
> not defining __len__ (there is no change for iterables that
> do define __len__ because they pre-size in a single step).
>
> . list comprehensions show a similar speed-up
>
> . list.append(x) is about twice as fast
No doubt about it! (I think Tim's earlier attempt had about the same
performance.)
> AFAICT, this is a straight-win with no trade-offs along the way.
> The only offsetting cost is adding two fields to the list
> structure (Tim said that was bad but I don't remember why).
obmalloc rounds small block sizes up to multiples of 8. A list object
needs 3 32-bit words for the GC struct, then 1 for the refcount, 1 for
the type pointer, 1 for the size, and 1 for ob_item. That's 28
bytes. So we can add one word for free -- but a second one bumps the
allocated size from 32 to 40.
> Please take a look and see if there are any holes in the theory.
>
> While the patch passes the test suite, it is possible that some
> extension module reallocs() the list to a smaller size (leaving
> the pointer unchanged) and then changes its mind and starts growing
> the list again. Given the magnitude of the speed-up, I think that
> risk is warranted. The offending code would have to be by-passing
> the C API for lists and tinkering with the internals in atypical
> way (swapping one pointer for another and/or altering ob_size is
> typical; downward resizing the existing pointer is not).
We can't be wishy-washy about this. Either we allow external code to
tinker, or we don't. If, in fact, there is 3rd party code that
tinkers in illegal ways, we should try to get it fixed, not work
around it forever.
> P.S. SF is down for maintenance. Will post the patch there tomorrow.
> While the change is simple, the patch is long because all the NRESIZE
> macros had to be replaced by a function call.
I just got email from SF announcing their "premium" (for-fee) service.
Sigh. How long before we will have to pay for everything except
read-only one-day-out-of-day CVS access?
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-Dev
mailing list