[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Steven D'Aprano steve at pearwood.info
Mon Dec 22 22:45:42 CET 2008


On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
> On 2008-12-20 23:16, Martin v. Löwis wrote:
> >>> I will try next week to see if I can come up with a smaller,
> >>> submittable example.  Thanks.
> >>
> >> These long exit times are usually caused by the garbage collection
> >> of objects. This can be a very time consuming task.
> >
> > I doubt that. The long exit times are usually caused by a bad
> > malloc implementation.
>
> With "garbage collection" I meant the process of Py_DECREF'ing the
> objects in large containers or deeply nested structures, not the GC
> mechanism for breaking circular references in Python.
>
> This will usually also involve free() calls, so the malloc
> implementation affects this as well. However, I've seen such long
> exit times on Linux and Windows, which both have rather good
> malloc implementations.
>
> I don't think there's anything much we can do about it at the
> interpreter level. Deleting millions of objects takes time and that's
> not really surprising at all. It takes even longer if you have
> instances with .__del__() methods written in Python.


This behaviour appears to be specific to deleting dicts, not to deleting 
objects in general. I haven't yet confirmed that the problem still exists 
in trunk (I hope to have time tonight or tomorrow), but in my previous 
tests deleting millions of items stored in a list of tuples completed 
in a minute or two, while deleting the same items stored as key:item 
pairs in a dict took 30+ minutes. I say "plus" because I never had the 
patience to let it run to completion; it could have been hours for all 
I know.
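
A minimal sketch of the kind of comparison described above (the sizes and
helper name here are illustrative, not the original test code; the original
ran under Python 2.5, this sketch uses current Python):

```python
import gc
import time

def time_teardown(build):
    """Build a large container, then time how long deleting it takes."""
    obj = build()
    gc.collect()              # exclude unrelated garbage from the measurement
    start = time.time()
    del obj                   # drops the last reference...
    gc.collect()              # ...and forces the deallocation to finish now
    return time.time() - start

N = 10**6  # hypothetical size; the tests above used millions of items

# Millions of items stored in a list of tuples...
list_secs = time_teardown(lambda: [(i, str(i)) for i in range(N)])

# ...versus the same items stored as key:item pairs in a dict.
dict_secs = time_teardown(lambda: {i: str(i) for i in range(N)})

print("list of tuples: %.2fs  dict: %.2fs" % (list_secs, dict_secs))
```

Both teardowns are dominated by Py_DECREF/free() work, so any asymmetry
between the two container types shows up directly in the timings.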

> Applications can choose other mechanisms for speeding up the
> exit process in various (less clean) ways, if they have a need for
> this.
>
> BTW: Rather than using a huge in-memory dict, I'd suggest to either
> use an on-disk dictionary such as the ones found in mxBeeBase or
> a database.

The original poster's application uses 45GB of data. In my earlier 
tests, I've experienced the problem with ~ 300 *megabytes* of data: 
hardly what I would call "huge".
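
For reference, the on-disk dictionary approach suggested above can also be
sketched with the standard library's shelve module, which provides a
dict-like object backed by a file (a generic stand-in here, not mxBeeBase;
the path and sizes are illustrative):

```python
import os
import shelve
import tempfile

# Hypothetical location for the on-disk store.
path = os.path.join(tempfile.mkdtemp(), "cache")

# A shelf behaves like a dict, but entries live on disk, so interpreter
# exit does not have to tear down millions of in-memory objects.
db = shelve.open(path)
for i in range(1000):            # small size for illustration
    db[str(i)] = {"value": i}    # keys must be strings; values are pickled
print(db["42"]["value"])
db.close()                       # flushes to disk; closing is cheap
```

The trade-off is per-access pickling overhead, which is why it only pays
off when the working set genuinely doesn't fit comfortably in memory.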



-- 
Steven D'Aprano

