
Guess which of these two statements is faster and by how much:
list(itertools.repeat(None, 5000)) list(tuple(itertools.repeat(None, 5000)))
I did guess, and was correct to 6 significant decimal digits <wink>.
The answer is important because it points the way to faster list comprehensions and other list building applications.
That Python's builtin list calls realloc() on every append is pretty gross, but it's also a tradeoff, saving the bytes in the list header that would be required to remember how many allocated-but-unused slots exist. If you've got a million tiny lists (and some apps do), burning an additional 8MB for something your app doesn't really need isn't attractive (on a 32-bit box today, the list header is 16 bytes; pymalloc aligns to 8-byte boundaries, so the next possible size is 24 bytes). list's very small overallocation (in the ballpark of 5%) is also aimed at minimizing memory burden. Speed isn't everything, tradeoffs are hard, and something loses when something else is favored. (BTW, I never would have considered implementing a resizable list type *without* keeping a field to record the amount of overallocation, so this isn't the tradeoff I would have made <wink>.)
Just in case nobody had thought of it before, couldn't the realloc() call be avoided when roundupsize(oldsize) == roundupsize(newsize)???
... And my interest is in a collections module which should probably published elsewhere (too much negativity on the subject here).
A collections *package* is a fine idea. I say "package" instead of "module" because there are literally hundreds of container types, and queues and bags are a tiny corner of that space. An example dear to my heart is BTrees (I maintain Zope's BTree implementation), which is a wonderful way to get an associative mapping sorted by key value. BTrees have a large API all by themselves, so it wouldn't make sense to try to squash them into a module along with 50 other container types.
Right. +1 for a package. --Guido van Rossum (home page: http://www.python.org/~guido/)