
[Raymond Hettinger]
No, I meant that it is *much* faster to do an indexed write to a pre-allocated list than to use anything relying on PyList_Append() or ins1(). All of those individual calls to realloc are expensive. The 50 to 1 dilution of that expense is how it beats Knuth's two stack trick.
That's fine. As I said before, I've never had a use for a queue in Python where the two-stack trick (coded in Python, even) wasn't more than fast enough for me. If I did, there are a universe of possibilities, including Marc-Andre's mxQueue. That doesn't use Python lists; doesn't call realloc on every queue put (only when it's out of room, which is "never" when a queue reaches steady state); and, when it does realloc, overallocates by a much larger amount than Python's list type.
Guess which of these two statements is faster and by how much:
list(itertools.repeat(None, 5000)) list(tuple(itertools.repeat(None, 5000)))
I did guess, and was correct to 6 significant decimal digits <wink>.
The answer is important because it points the way to faster list comprehensions and other list building applications.
That Python's builtin list calls realloc() on every append is pretty gross, but it's also a tradeoff, saving the bytes in the list header that would be required to remember how many allocated-but-unused slots exist. If you've got a million tiny lists (and some apps do), burning an additional 8MB for something your app doesn't really need isn't attractive (on a 32-bit box today, the list header is 16 bytes; pymalloc aligns to 8-byte boundaries, so the next possible size is 24 bytes). list's very small overallocation (in the ballpark of 5%) is also aimed at minimizing memory burden. Speed isn't everything, tradeoffs are hard, and something loses when something else is favored. (BTW, I never would have considered implementing a resizable list type *without* keeping a field to record the amount of overallocation, so this isn't the tradeoff I would have made <wink>.)
... And my interest is in a collections module which should probably published elsewhere (too much negativity on the subject here).
A collections *package* is a fine idea. I say "package" instead of "module" because there are literally hundreds of container types, and queues and bags are a tiny corner of that space. An example dear to my heart is BTrees (I maintain Zope's BTree implementation), which is a wonderful way to get an associative mapping sorted by key value. BTrees have a large API all by themselves, so it wouldn't make sense to try to squash them into a module along with 50 other container types.
I am frankly surprised that so many here think that a collections module would be bad for the language.
I was more of the impression that bags didn't have fans <wink>.
Go figure. It's not like these are new, untried ideas -- the goal was supposed be improving productivity by providing higher level tools. A fast underlying implementation was just icing on the cake.
I'm strongly in favor of a collections package.