RE: [Python-Dev] collections module

[Guido:]
My own ideal solution [for queues] would be a subtle hack to the list implementation to make pop(0) and insert(0, x) go a lot faster -- this can be done by adding one or two extra fields to the list head and allowing empty space at the front of the list as well as at the end.
[Tim:]
That Python's builtin list calls realloc() on every append is pretty gross, but it's also a tradeoff, saving the bytes in the list header that would be required to remember how many allocated-but-unused slots exist. If you've got a million tiny lists (and some apps do), burning an additional 8MB for something your app doesn't really need isn't attractive (on a 32-bit box today, the list header is 16 bytes; pymalloc aligns to 8-byte boundaries, so the next possible size is 24 bytes).
[Guido:]
Just in case nobody had thought of it before, couldn't the realloc() call be avoided when roundupsize(oldsize) == roundupsize(newsize)???
[Tim:]
I had thought of it before, but never acted on it <wink>. roundupsize() has gotten more expensive over the years too, and it may vary across boxes whether it's cheaper to call that a second time than it is to make the system realloc deduce that nothing needs to be done. roundupsize() is certainly cheaper than Microsoft's realloc() in the do-nothing case.
Note that there's an endcase glitch here when oldsize==0 and newsize==1. roundupsize() returns 8 for both, but it doesn't follow that ob_item isn't NULL. realloc(p, n) special-cases the snot out of p==NULL to avoid crashing then. [...] Note in passing that roundupsize() is expensive *because* lists don't remember the amount of overallocated space too: roundupsize() has to arrange to deliver the same result for many consecutive inputs
I'm sure this is just pointing out the obvious, but *IF* the list header were increased, we could add TWO fields... on for allocated_size, and another for left_side_slots. According to Tim, this would still "only" bump list objects up to 24 bytes, but would allow: [1] O(1) performance for pop(0) [2] No need to call realloc() on every append(), just on "true" resizes. I guess I'm proposing the following (all ideas previously mentioned): [1] pop(0) would increment ob_item and left_side_slots. [2] insert(0,x) would decrement ob_item and left_side_slots *IF* left_side_slots were > 0. Lists should still grow at the tail for efficiency, but we can re-use space at the head left by a pop(0). [3] append(0,x) (and other functions that insert) would compare ob_size and allocated_size (do we need left_side_slots in here too?) to see whether new space is needed at the end. If so, it would EITHER call realloc() for more space, OR it would memmove so left_side_slots became 0. I'm still not quite sure what rule to use to determine which to do (although for MOST lists, left_side_slots would be 0, so there wouldn't be a question). So, to recap advantages and disadvantages: lists would still grow just as they do now. Lists used as Queues by calling append() and pop(0) would be time efficient so long as we avoid calling memmove for each and every append() (that's why we don't want to ALWAYS memmove when left_side_slots > 0). Lists on which pop(0) was never called (that's most of 'em) would have NO degradation of space used, and IMPROVED speed in cases (like Windows) where a realloc() noop takes noticable time. And (this seems the most significant drawback) the size of a list would increase by 8 bytes, even for very short lists. I wouldn't want to make the change without trying out the performance in some "real applications", but it seems like there are some potentially useful ideas floating around here. -- Michael Chermside

[Michael Chermside]
... I'm sure this is just pointing out the obvious, but *IF* the list header were increased, we could add TWO fields... According to Tim, this would still "only" bump list objects up to 24 bytes, but ...
There's something else to note here, in connection with the realloc-avoiding list patch/hack I posted yesterday: for that logic to work, a non-empty list must always contain enough slots for roundupsize(ob->size) elements. So, for example, a list of length 1 must nevertheless contain room for 8 objects, which on a 32-bit machine is 7*4 = 28 bytes in the list *guts* it doesn't need today (but does need after that patch). So if people like that patch, it would be more *memory*-efficient too to drop the roundupsize() business and just expand the list header to remember how many slots have been allocated. Then no overallocation would be needed. Adding 8 bytes in the list header should be a net win over asking for up to 28 unused bytes in the guts (of small lists; it can be larger than that for large lists).
participants (2)
-
Michael Chermside
-
Tim Peters