[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Nick Coghlan ncoghlan at gmail.com
Tue Dec 23 16:42:07 CET 2008


M.-A. Lemburg wrote:
> On 2008-12-22 22:45, Steven D'Aprano wrote:
>> This behaviour appears to be specific to deleting dicts, not deleting 
>> random objects. I haven't yet confirmed that the problem still exists 
>> in trunk (I hope to have time tonight or tomorrow), but in my previous 
>> tests deleting millions of items stored in a list of tuples completed 
>> in a minute or two, while deleting the same items stored as key:item 
>> pairs in a dict took 30+ minutes. I say plus because I never had the 
>> patience to let it run to completion, it could have been hours for all 
>> I know.
> 
> That's interesting. The dictionary dealloc routine doesn't give
> any hint as to why this should take longer than deallocating
> a list of tuples.

Shuffling the list with random.shuffle before deleting it makes a
*massive* difference to how long the deallocation takes.

Not only that, but after the shuffled list has been deallocated,
deleting an unshuffled list subsequently takes significantly longer.

(I posted numbers and a test script showing these effects elsewhere in
the thread).

The important factor seems to be deallocation order relative to
allocation order.

A simple list deletes objects in the reverse of the order of creation,
while a reversed list deletes them in order of creation. Both of these
seem to scale fairly linearly.

A dict with a hash order that I believe is a fair approximation of
creation order also didn't appear to exhibit particularly poor scaling
(at least not within the 20 million objects I could test).

The shuffled list, on the other hand, was pretty atrocious, taking
nearly twice as long to be destroyed as an unshuffled list of the same size.

I'd like to add another dict to the test which eliminates the current
coupling between hash order and creation order, and see if it exhibits
poor behaviour which is similar to that of the shuffled list, but I'm
not sure when I'll get to that (probably post-Christmas).

Note that I think these results are consistent with the theory that the
problem lies in the way partially allocated memory pools are tracked in
the obmalloc code - it makes sense that deallocating in creation order
or in reverse of creation order would tend to clean up each arena in
order and keep the obmalloc internal state neat and tidy, while
deallocating objects effectively at random would lead to a lot of
additional bookkeeping as the "most used" and "least used" arenas change
over the course of the deallocation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------


More information about the Python-Dev mailing list