On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote:
> On 2008-12-20 23:16, Martin v. Löwis wrote:
>>> I will try next week to see if I can come up with a smaller,
>>> submittable example. Thanks.
>>
>>> These long exit times are usually caused by the garbage collection
>>> of objects. This can be a very time consuming task.
>>
>> I doubt that. The long exit times are usually caused by a bad
>> malloc implementation.
With "garbage collection" I meant the process of Py_DECREF'ing the objects in large containers or deeply nested structures, not the GC mechanism for breaking circular references in Python.
> This will usually also involve free() calls, so the malloc
> implementation affects this as well. However, I've seen such long
> exit times on Linux and Windows, which both have rather good malloc
> implementations.
>
> I don't think there's anything much we can do about it at the
> interpreter level. Deleting millions of objects takes time and
> that's not really surprising at all. It takes even longer if you
> have instances with .__del__() methods written in Python.
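The .__del__() penalty is easy to see in isolation. A quick sketch
along those lines (the class names are mine, and an empty __del__ is
already enough to show the effect):

    import time

    class Plain(object):
        pass

    class WithDel(object):
        def __del__(self):
            pass   # even an empty __del__ adds per-object teardown work

    def teardown_seconds(factory, n=1000000):
        objs = [factory() for _ in range(n)]
        start = time.time()
        del objs   # drop the only reference; all n instances die here
        return time.time() - start

    print("plain instances: %.2fs" % teardown_seconds(Plain))
    print("with __del__:    %.2fs" % teardown_seconds(WithDel))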
This behaviour appears to be specific to deleting dicts, not deleting
random objects. I haven't yet confirmed that the problem still exists
in trunk (I hope to have time tonight or tomorrow), but in my previous
tests deleting millions of items stored in a list of tuples completed
in a minute or two, while deleting the same items stored as key:value
pairs in a dict took 30+ minutes. I say 30+ because I never had the
patience to let it run to completion; it could have been hours for all
I know.
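For anyone who wants to try this, my test was along these lines (a
reconstruction rather than the exact script; the sizes are
illustrative and should be scaled to your machine):

    import time

    N = 10 * 1000 * 1000   # ten million entries

    def teardown_seconds(build):
        container = build()
        start = time.time()
        del container   # the only reference: every entry is freed now
        return time.time() - start

    # The same items, stored in two different container shapes.
    print("list of tuples: %.1fs"
          % teardown_seconds(lambda: [(i, str(i)) for i in range(N)]))
    print("dict:           %.1fs"
          % teardown_seconds(lambda: dict((i, str(i)) for i in range(N))))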
> Applications can choose other mechanisms for speeding up the exit
> process in various (less clean) ways, if they have a need for this.
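True, and for the record the usual blunt instrument is os._exit(),
which ends the process immediately, skipping all interpreter teardown;
buffers have to be flushed by hand first (fast_exit is just my name
for the helper):

    import os
    import sys

    def fast_exit(status=0):
        # os._exit() terminates the process at once: no object
        # teardown, no atexit handlers, no automatic buffer flushing.
        sys.stdout.flush()
        sys.stderr.flush()
        os._exit(status)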
> BTW: Rather than using a huge in-memory dict, I'd suggest using
> either an on-disk dictionary, such as the ones found in mxBeeBase,
> or a database.
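For illustration, the standard library's shelve module shows the same
idea; I'm using it here only because it needs no third-party code, and
mxBeeBase or a real database would be used in the same spirit (the
file name is arbitrary):

    import shelve

    # An on-disk mapping: keys must be strings, values are pickled.
    db = shelve.open("bigdata.db")
    try:
        for i in range(1000000):
            db[str(i)] = (i, str(i))
        print(db["42"])
    finally:
        db.close()   # flushes the cache to disk

With the data on disk, interpreter exit has almost nothing left to
deallocate.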
The original poster's application uses 45GB of data. In my earlier
tests, I've experienced the problem with ~ 300 *megabytes* of data:
hardly what I would call "huge".

-- 
Steven D'Aprano