[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Mike Coleman tutufan at gmail.com
Mon Dec 22 19:01:56 CET 2008


Thanks for all of the useful suggestions.  Here are some preliminary results.

With gc still disabled (gc.disable()), at the end of the program I
first did a gc.collect(), which took about five minutes.  (So, reason
enough not to gc.enable(), at least without Antoine's patch.)

After that, I did a .clear() on the huge dict.  That's where the time
is being spent.  Doing the suggested "poor man's profiling" (repeated
backtraces via gdb; a rough automation is sketched after the loop
below), for 20 or so samples, one is within libc's free, but all of
the rest are at the same place (same source line) within
PyObject_Free (see below), sometimes under list_dealloc and sometimes
under tuple_dealloc.  So, apparently a lot of time is being spent in
this loop:


                        /* Case 3:  We have to move the arena towards the end
                         * of the list, because it has more free pools than
                         * the arena to its right.

                       ...

                        /* Locate the new insertion point by iterating over
                         * the list, using our nextarena pointer.
                         */
                        while (ao->nextarena != NULL &&
                                        nf > ao->nextarena->nfreepools) {
                                ao->prevarena = ao->nextarena;
                                ao->nextarena = ao->nextarena->nextarena;
                        }
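
(For what it's worth, the "poor man's profiling" can be automated
roughly as below; I did mine by hand.  PID is hypothetical here, just
the process id of the running program:)

    import subprocess, time

    PID = 12345   # pid of the running python process (fill in your own)

    for i in range(20):
        # attach gdb, take one backtrace, detach
        gdb = subprocess.Popen(
            ['gdb', '--batch', '-p', str(PID), '-ex', 'bt'],
            stdout=subprocess.PIPE)
        print '--- sample %d ---' % i
        print gdb.communicate()[0]
        time.sleep(30)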

Investigating further, at one such stop I used gdb to follow the
chain of pointers in the nextarena and prevarena directions.  There
were 5449 and 112765 links, respectively.  (maxarenas is 131072.)

Sampling nf at different breaks gives values in the range(10,20).

This loop looks like an insertion sort.  If only a "few" iterations
are ever needed for any given free, this might be okay; if not, the
total work must be quadratic in the number of arenas.
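
A back-of-envelope calculation suggests why that would hurt.  The
free count and average walk length here are guesses, not
measurements; only maxarenas is observed:

    arenas = 131072          # maxarenas, observed above
    frees = 10**9            # rough guess: objects freed by the .clear()
    avg_walk = arenas * 0.01 # if even 1% of the list is walked per free
    print frees * avg_walk   # ~1.3e12 pointer traversals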

I attempted to look further by setting a silent breakpoint with a
counter inside the loop and another breakpoint after the loop to
inspect the counter, but gdb has been buzzing away on that for 40
minutes without coming back.  That might mean there are a lot of
passes through this loop per free (i.e., that gdb is taking a long
time to process 100,000 silent breakpoint hits), or perhaps I've made
a mistake, or gdb isn't handling this well.
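
(If gdb costs on the order of a millisecond per silent breakpoint
hit, which is only a guess, that alone would explain the stall:

    breaks_per_free = 100000    # if the walk really is that long
    cost_per_break = 1e-3       # seconds; assumed gdb overhead per hit
    print breaks_per_free * cost_per_break   # ~100 seconds per free(!)

so even a handful of frees would keep gdb busy for many minutes.)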

In any case, this looks like the problem locus.

It's tempting to say "don't do this arena-ordering optimization if
we're doing final cleanup", but really the program could have done
this .clear() at any point.  Maybe there needs to be a flag to
disable it altogether?  Or perhaps there's a smarter way to manage
the list of arena/free-pool info; one sketch follows.
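
For example, bucketing the arenas by nfreepools would make a +/-1
change an O(1) move between adjacent buckets instead of a list walk.
This is only a toy model in Python to show the shape of the idea, not
a patch; the real thing would live in C inside obmalloc, next to
usable_arenas:

    from collections import defaultdict

    class ArenaBuckets(object):
        """Arenas grouped by free-pool count (toy model of the
        usable_arenas ordering)."""

        def __init__(self):
            self.buckets = defaultdict(set)   # nfreepools -> arenas

        def add(self, arena, nfreepools):
            self.buckets[nfreepools].add(arena)

        def bump(self, arena, old, new):
            # arena's nfreepools changed from old to new: O(1) move
            self.buckets[old].discard(arena)
            self.buckets[new].add(arena)

        def next_to_use(self):
            # allocation wants the most-allocated arena, i.e. one from
            # the smallest nonempty bucket
            for nf in sorted(self.buckets):
                if self.buckets[nf]:
                    return iter(self.buckets[nf]).next()
            return None

(next_to_use still scans the bucket keys, but that scan could be
replaced by a cursor; the point is just that the per-free cost stops
depending on the length of the arena list.)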

Mike



Program received signal SIGINT, Interrupt.
0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at Objects/obmalloc.c:1064
1064				while (ao->nextarena != NULL &&
(gdb) bt
#0  0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at Objects/obmalloc.c:1064
#1  0x0000000000433478 in list_dealloc (op=0x5ec043dd0) at Objects/listobject.c:281
#2  0x000000000044075b in PyDict_Clear (op=0x74c7cd0) at Objects/dictobject.c:757
#3  0x00000000004407b9 in dict_clear (mp=0x5ec043db0) at Objects/dictobject.c:1776
#4  0x0000000000485905 in PyEval_EvalFrameEx (f=0x746ca50, throwflag=<value optimized out>) at Python/ceval.c:3557
#5  0x000000000048725f in PyEval_EvalCodeEx (co=0x72643f0, globals=<value optimized out>, locals=<value optimized out>, args=0x1, argcount=0, kws=0x72a5770, kwcount=0, defs=0x743eba8, defcount=1, closure=0x0) at Python/ceval.c:2836
#6  0x00000000004855bc in PyEval_EvalFrameEx (f=0x72a55f0, throwflag=<value optimized out>) at Python/ceval.c:3669
#7  0x000000000048725f in PyEval_EvalCodeEx (co=0x72644e0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2836
#8  0x00000000004872a2 in PyEval_EvalCode (co=0x5ec043db0, globals=0x543e41f10, locals=0x543b969c0) at Python/ceval.c:494
#9  0x00000000004a844e in PyRun_FileExFlags (fp=0x7171010, filename=0x7ffffaf6b419 "/home/mkc/greylag/main/greylag_reannotate.py", start=<value optimized out>, globals=0x7194510, locals=0x7194510, closeit=1, flags=0x7ffffaf69080) at Python/pythonrun.c:1273
#10 0x00000000004a86e0 in PyRun_SimpleFileExFlags (fp=0x7171010, filename=0x7ffffaf6b419 "/home/mkc/greylag/main/greylag_reannotate.py", closeit=1, flags=0x7ffffaf69080) at Python/pythonrun.c:879
#11 0x0000000000412275 in Py_Main (argc=<value optimized out>, argv=0x7ffffaf691a8) at Modules/main.c:523
#12 0x00000030fea1d8b4 in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000411799 in _start ()





On Sun, Dec 21, 2008 at 12:44 PM, Adam Olsen <rhamph at gmail.com> wrote:
> On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman <tutufan at gmail.com> wrote:
>> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti wrote:
>>> Have you seen any significant difference in the exit time when the
>>> cyclic GC is disabled or enabled?
>>
>> Unfortunately, with GC enabled, the application is too slow to be
>> useful, because of the greatly increased time for dict creation.  I
>> suppose it's theoretically possible that with this increased time, the
>> long time for exit will look less bad by comparison, but I'd be
>> surprised if it makes any difference at all.  I'm confident that there
>> are no loops in this dict, and nothing for cyclic gc to collect.
>
> Try putting an explicit gc.collect() at the end, with the usual
> timestamps before and after.
>
> After that try deleting your dict, then calling gc.collect(), with
> timestamps throughout.
>
>
> --
> Adam Olsen, aka Rhamphoryncus
>

