Python 2.0b1 List comprehensions are slow
tim_one at email.msn.com
Wed Sep 13 06:12:12 CEST 2000
[posted & mailed]
[possibly Guido; way behind on email]
> Ways to optimize this (in 2.1; as Tim explained 2.0 is too close to
> release) might include:
> Well, it is your call, but the patch attached to this message speeds up
> list comprehensions by as much as 30% for me.
Let's wait for 2.1. The beta is already out, so it's too late to try new
approaches in core algorithms. I've coincidentally been looking at ways to
improve list.append for months in the background, but for a different reason
(horrid fragmentation under *some* flavors of Windows).
Since no code is using listcomps today, the case for mucking in this area to
speed *them* after the first beta is, frankly, non-existent. The speed of a
new feature simply isn't a bug. And, believe it or not, most Python users
don't care about 30% (if they did, they wouldn't be using Python <wink>).
> The patch works by reducing the number of calls to realloc() during
> repeated append() operations. The current code does preallocation
> of memory as the list grows (up to 10 items on small lists, and up
> to 100 on big lists), but an append operation always calls realloc
> even when the size of the list isn't going to be increased.
> (In essence, it is just an expensive no-op.)
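To make the quoted point concrete, here is a minimal sketch (not CPython's actual source) of the 2.0-era rounding policy described above: allocation sizes are padded to the nearest 10 items for small lists and 100 for big ones. Note how consecutive list sizes usually round up to the *same* allocation, which is why an unconditional realloc() on every append is mostly an expensive no-op.

```c
#include <assert.h>

/* Round n up to the next multiple of block. */
static int roundup(int n, int block) {
    return ((n + block - 1) / block) * block;
}

/* Allocation size for a list of n items under the policy described
 * in the quoted text: pad to 10 below 500 items, to 100 above. */
static int roundupsize(int n) {
    return (n < 500) ? roundup(n, 10) : roundup(n, 100);
}
```

Growing a list from 1 to 10 items never changes the rounded size, so nine of those ten realloc() calls ask for exactly the allocation the list already has.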
It's a good argument but I know from experience that it simply doesn't
matter on all platforms (I've made this change to Python too in the past).
I've long been a fan of moving to Vladimir's PyMalloc, which is very
carefully tuned for *Python's* use of memory, and when we do that we'll have
a much better x-platform handle on how to exploit reallocs in Python.
> The patch makes the following three changes:
> - on a list size alteration, it will only call realloc() if the memory
> allocation is indeed going to be increased/decreased.
> - on a call to PyList_New() allocations are padded to the
> nearest 10 items (for small lists, 128 items on big lists). This
> is done so the above can work.
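The first change can be sketched roughly as follows, using an illustrative list_resize() helper on a toy struct (names and details are mine, not the patch's): realloc() is touched only when the rounded-up allocation actually changes.

```c
#include <assert.h>
#include <stdlib.h>

/* Padding rule from the patch as described: 10 for small lists,
 * 128 for big ones. */
static int roundupsize(int n) {
    return (n < 500) ? ((n + 9) / 10) * 10 : ((n + 127) / 128) * 128;
}

typedef struct {
    int *items;
    int size;       /* number of items in use */
} fakelist;

/* Grow or shrink to newsize, calling realloc() only when the
 * rounded allocation differs from what we must already have. */
static int list_resize(fakelist *lp, int newsize) {
    int old_alloc = roundupsize(lp->size);
    int new_alloc = roundupsize(newsize);
    if (new_alloc != old_alloc) {
        int *p = realloc(lp->items, new_alloc * sizeof(int));
        if (p == NULL)
            return -1;
        lp->items = p;
    }
    lp->size = newsize;
    return 0;
}
```

This only works if every allocation is already padded to the rounded size, which is why the patch's second change (padding in PyList_New()) is a prerequisite for the first.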
That's exactly the kind of tradeoff that would be evil to attempt between
betas: you speed up listcomps by 30% on your platform for you, while N
others who *had* perfectly happy programs slinging a million 1-element lists
see their memory use skyrocket "for no reason at all". We could risk that
in the first beta (although I doubt Guido would have gone for it even
then!), but not in the second.
> - I upped the padding size on big lists (> 500 items) to 128 from 100 -
> I believe this will help reduce memory fragmentation on big lists.
There's strong reason to believe that Yet Another cutoff point would be very
valuable on some platforms; the difference between 100 and 128 is
meaningless when lists grow (as they sometimes do!) to millions of elements,
and realloc on some platforms can have atrocious behavior then (e.g., on the
platform I'm using right now <wink>).
"The right thing" for extreme cases would be to add a member to list objects
to record the number of elements allocated as well as the number used. Then
mildly exponential growth could be assured regardless of size, and without
all this indirect brittle trickery. Guido doesn't want to boost the size of
the basic list object now, but the "extra memory" for this info is *already*
being consumed one way or another by the platform malloc, and if we move to
PyMalloc we can exploit that *it* has to know this info too.
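A sketch of that "right thing": carry an `allocated` count on the object itself and grow mildly exponentially, so the growth policy no longer has to be reverse-engineered from the current size. The struct, names, and growth factor here are illustrative, not anything CPython has committed to.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *items;
    int size;       /* items in use */
    int allocated;  /* items for which space exists */
} growlist;

static int growlist_append(growlist *lp, int v) {
    if (lp->size >= lp->allocated) {
        /* Grow by ~25% plus a small constant: amortized O(1)
         * appends regardless of list size, with no fixed cutoffs. */
        int new_alloc = lp->allocated + (lp->allocated >> 2) + 4;
        int *p = realloc(lp->items, new_alloc * sizeof(int));
        if (p == NULL)
            return -1;
        lp->items = p;
        lp->allocated = new_alloc;
    }
    lp->items[lp->size++] = v;
    return 0;
}
```

Because `allocated` is stored, an append is a realloc-free store whenever `size < allocated`, and the exponential step keeps the total number of realloc() calls logarithmic in the final length, even for multi-million-element lists.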
> Patch follows, sorry if this isn't the right forum for sending
> patches. It is only 135 lines, though...
Patches and bugs for Python are both done on SourceForge:
> #define ROUNDUP(n, PyTryBlock) \
> +#define roundupsize(n) (((n) < 500) ? ROUNDUP((n),10) : ROUNDUP((n),128))
Note that integer division is very expensive on *some* platforms; I had good
results in the past just changing (almost) this part alone to, e.g.,
#define roundupsize(n) ((n)<500 ? ((n)+7)&~7 : ((n)+127)&~127)
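A quick check of that rewrite: because 8 and 128 are powers of two, `((n)+7)&~7` rounds n up to a multiple of 8 and `((n)+127)&~127` to a multiple of 128, with no division at all. (The pad sizes shift slightly from 10/100, since those aren't powers of two.)

```c
#include <assert.h>

/* The division-free rounding macro quoted above. */
#define roundupsize(n) ((n) < 500 ? ((n) + 7) & ~7 : ((n) + 127) & ~127)
```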
There are plenty of ideas for improvements here, and we'll love your help in
pursuing them -- but after 2.0, please.
y'rs - tim