[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
Adam Olsen
rhamph at gmail.com
Mon Dec 22 21:22:56 CET 2008
On Mon, Dec 22, 2008 at 11:01 AM, Mike Coleman <tutufan at gmail.com> wrote:
> Thanks for all of the useful suggestions. Here are some preliminary results.
>
> With gc still disabled (gc.disable()), at the end of the program I
> first did a gc.collect(), which took about five minutes. (So, reason
> enough not to gc.enable(), at least without Antoine's patch.)
>
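[Editor's note: a minimal sketch of the workflow described above; the
10**5-entry dict is a small stand-in for the real 45 GB one, and the
tuple values are purely illustrative.]

```python
import gc

gc.disable()                 # skip cyclic-GC passes while building
huge = {}
for i in range(10 ** 5):     # stand-in for the real multi-gigabyte load
    huge[i] = (i, str(i))

collected = gc.collect()     # one explicit pass at the end; at 45 GB
                             # this alone took about five minutes
gc.enable()
```

gc.collect() returns the number of unreachable objects found, so the
cost of the explicit pass can be weighed against what it actually
reclaims.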
> After that, I did a .clear() on the huge dict. That's where the time
> is being spent. Doing the suggested "poor man's profiling" (repeated
> backtraces via gdb) for 20 or so samples, one is within libc's free,
> but all of the rest are at the same place (same source line) within
> PyObject_Free (see below), sometimes within list_dealloc and sometimes
> within tuple_dealloc. So, apparently a lot of time is being spent in
> this loop:
>
>
>         /* Case 3:  We have to move the arena towards the end
>          * of the list, because it has more free pools than
>          * the arena to its right.
>
> ...
>
>         /* Locate the new insertion point by iterating over
>          * the list, using our nextarena pointer.
>          */
>         while (ao->nextarena != NULL &&
>                nf > ao->nextarena->nfreepools) {
>             ao->prevarena = ao->nextarena;
>             ao->nextarena = ao->nextarena->nextarena;
>         }
>
> Investigating further, at one stop I used gdb to follow the chains
> of pointers in the nextarena and prevarena directions. There were
> 5449 and 112765 links, respectively; maxarenas is 131072.
>
> Sampling nf at different breaks gives values in the range(10,20).
>
> This loop looks like an insertion sort. If only a "few" iterations are
> ever needed for any given free, this might be okay--if not, the total
> work must be quadratic in the number of arenas.
>
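[Editor's note: a toy model makes the quadratic risk concrete. The
sketch below is hypothetical Python, not CPython's actual C: it keeps a
list of per-arena nfreepools counts sorted ascending, as obmalloc keeps
its usable arenas, and counts the linear-scan iterations needed to
re-place the head arena each time one of its pools is freed.]

```python
def pool_free_steps(nfreepools):
    """Free one pool in the head arena, once per arena, and count the
    while-loop iterations spent re-placing it.

    Mirrors the quoted "Case 3" scan: after a free, the arena walks
    right past every neighbour with strictly fewer free pools.
    """
    arenas = list(nfreepools)       # per-arena free-pool counts, ascending
    total = 0
    for _ in range(len(arenas)):
        nf = arenas.pop(0) + 1      # this free gives the head arena one
                                    # more free pool
        i = 0
        # while (ao->nextarena != NULL && nf > ao->nextarena->nfreepools)
        while i < len(arenas) and nf > arenas[i]:
            i += 1
            total += 1
        arenas.insert(i, nf)        # re-link the arena at its new spot
    return total
```

With n arenas that all start at the same count, freeing one pool in
each costs n*(n-1)/2 scan steps in total (e.g. 4950 for n=100): each
individual free is O(n), and the aggregate is quadratic.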
> I attempted to look further by setting a silent break with counter
> within the loop and another break after the loop to inspect the
> counter, but gdb's been buzzing away on that for 40 minutes without
> coming back. That might mean that there are a lot of passes through
> this loop per free (i.e., that gdb is taking a long time to process
> 100,000 silent breaks), or perhaps I've made a mistake, or gdb isn't
> handling this well.
To make sure that's the correct line, please recompile Python without
optimizations. GCC happily reorders and merges different parts of a
function.
Adding a counter in C and recompiling would be a lot faster than using
a gdb hook.
--
Adam Olsen, aka Rhamphoryncus