It seems pretty clear now that the primary cause is keeping arenas sorted by number of free pools. As deallocation goes on, the number of distinct "# of free pools" values decreases, leaving large numbers of arenas sharing the same value. Then, e.g., if there are 10,000 arenas with 30 free pools remaining, and another pool is freed in one of them, its free pool count jumps to 31, and on average we have to crawl over 5,000 of its former peers to locate its new position in the list. That's also consistent with what the OP saw, and what I saw too in a much smaller case: the longer deallocation goes on, the slower it gets (fewer and fewer Nodes freed per unit time).

Which suggests a different - albeit related - approach: it's not really the order of arenas that matters, but the partitioning of arenas by number of free pools. So instead of a single doubly linked list of arenas, keep a vector of doubly linked lists, one list per possible number of free pools. That's a fixed and relatively small number. For example, even if all arena bytes could be used for pools (they can't be - there's bookkeeping info at the start of an arena), the maximum number of free pools in an arena would be 2**18 / 2**12 == 2**6 == 64.

When a pool is freed, no big deal: unlink the arena from its current list, and stick it in the list for the next higher number of free pools. This, of course, is constant-time.

Of course there are edge cases (e.g., if the arena becomes entirely free, it should be given back to C instead), but they seem pretty obvious, and I haven't thought of a real problem. Tedious, though ;-)