[Python-Dev] iterzip()
Tim Peters
tim.one@comcast.net
Mon, 29 Apr 2002 15:35:37 -0400
[Guido]
> This suggests that zip() can be sped up considerably by using a hint
> for the input sequence sizes. If they all respond properly to
> PySequence_Length(), zip can preallocate the result list to the
> correct size. It shouldn't believe the numbers fully, but using them
> as a hint for initial allocation makes a lot of sense, and should
> greatly reduce the inefficiency.
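[The idea being described can be sketched in pure Python, for illustration only — this is not Guido's patch, and it leans on operator.length_hint, which is a much later addition to the stdlib:]

```python
from operator import length_hint

def zip_prealloc(*seqs):
    # Preallocate the result from the smallest length hint, but treat
    # it only as a hint: trim or grow if the inputs disagree with it.
    its = [iter(s) for s in seqs]
    if not its:
        return []                     # zip() of nothing is empty
    hint = min(length_hint(s, 0) for s in seqs)
    result = [None] * hint            # one allocation up front
    i = 0
    while True:
        tup = []
        for it in its:
            try:
                tup.append(next(it))
            except StopIteration:
                del result[i:]        # hint was too big: drop unused slots
                return result
        if i < len(result):
            result[i] = tuple(tup)    # fast path: slot already exists
        else:
            result.append(tuple(tup)) # hint was too small: grow as usual
        i += 1
```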
I'm afraid I misanalyzed the cause. Giving zip a good guess about the
result size and using PyList_SET_ITEM up until then leads (as expected) to
no list copying in the test case I posted, but only dropped the runtime from
7.87 to 7.31 seconds (which makes sense looking at the numbers more closely:
it doesn't take *that* long to copy a few dozen megabytes of memory unless
you're doing it by hand <wink>).
Check this out:
"""
from time import clock as now
n = 1000000
def juststore(x):
    for i in x: x[i] = 0
def justtups(x):
    for i in x: 0,
def storetups(x):
    for i in x: x[i] = 0,
def run(func):
    x = range(n)
    start = now()
    func(x)
    finish = now()
    print "%10s %6.2f" % (func.__name__, finish - start)
run(juststore)
run(justtups)
run(storetups)
"""
I get:
juststore 0.93
justtups 0.58
storetups 7.61
list.append is out of the picture here. Creating a million tuples goes fast
so long as they're recycled, and storing a million things goes fast, but
storing a million distinct tuples takes very much longer. It's a Mystery to
me so far. How does it act on Linux?