[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

"Martin v. Löwis" martin at v.loewis.de
Mon Dec 22 22:58:24 CET 2008


> Investigating further, from one stop, I used gdb to follow the chain
> of pointers in the nextarena and prevarena directions.  There were
> 5449 and 112765 links, respectively.  maxarenas is 131072.

To reduce the time for keeping sorted lists of arenas, I was first
thinking of a binheap. I had formulated it all, and don't want to
waste that effort, so I attach it below in case my second idea (right
below) is flawed.

It then occurred that there are only 64 different values for nfreepools,
as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than keeping
the list sorted, I now propose to maintain 64 lists, accessible in
an array double-linked lists indexed by nfreepools. Whenever nfreepools
changes, the arena_object is unlinked from its current list,  and linked
into the new list. This should reduce the overhead for keeping the lists
sorted down from O(n) to O(1), with a moderate overhead of 64 pointers
(512 Bytes in your case).

Allocation of a new pool would have to do a linear search in these
pointers (finding the arena with the least number of pools); this
could be sped up with a finger pointing to the last index where
a pool was found (-1, since that pool will have moved).

Regards,
Martin

a) usable_arenas becomes an arena_object**, pointing to an array of
   maxarenas+1 arena*. A second variable max_usable_arenas is added.
   arena_object loses the prevarena pointer, and gains a usable_index
   value of type size_t (which is 0 for unused or completely allocated
   arena_objects).
   usable_arenas should stay heap-sorted, with the arena_object with
   the smallest nfreepools at index 1.

b) sink and swim operations are added, which keep usable_index intact
   whenever arena_object pointers get swapped.

c) whenever a pool is allocated in an arena, nfreepools decreases,
   and swim is called for the arena. whenever a pool becomes free,
   sink is called.

d) when the last pool was allocated in an arena, it is removed from
   the heap. likewise, when all pools are freed in an arena, it is
   removed from the heap and returned to the system.

e) when the first pool gets freed in an arena, it is added to the
   heap.

On each pool allocation/deallocation, this should get the O(n)
complexity of keeping the arena list sorted down to O(log n).


More information about the Python-Dev mailing list