[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Nick Coghlan ncoghlan at gmail.com
Tue Dec 23 04:25:47 CET 2008


Mike Coleman wrote:
> If you plot this, it is clearly quadratic (or worse).

Here's another comparison script that tries to probe the vagaries of the
obmalloc implementation. It looks at the proportional increases in
deallocation times for lists and dicts as the number of contained items
increases when using a variety of deallocation orders:
- in hash order (dict)
- in reverse order of allocation (list)
- in order of allocation (list, reversed in place)
- in random order (list, shuffled in place using the random module)

I've included the final output from a run on my own machine below [1],
but here are the main points I get out of it:
- at the sizes I can test (up to 20 million items in the containers),
this version of the script doesn't show any particularly horrible
non-linearity with deallocation of dicts, lists or reversed lists.
- when the items in a list are deallocated in *random* order, however,
the deallocation times are highly non-linear - by the time we get to 20
million items, deallocating in random order takes nearly twice as long
as deallocation in either order of allocation or in reverse order.
- after the list of items had been deallocated in random order,
subsequent deallocation of a dict and the list took significantly longer
than when those operations took place on a comparatively "clean"
obmalloc state.

I'm going to try making a new version of the script that uses random
integers with a consistent number of digits in place of the monotically
increasing values that are currently used and see what effect that has
on the dict scaling (that's where I expect to see the greatest effect,
since the hash ordering is the one which will be most affected by the
change to the item contents).

Cheers,
Nick.

[1] Full final results from local test run:

Dict: (Baseline=0.003135 seconds)
  100000=100.0%
  1000000=1020.9%
  2000000=2030.5%
  5000000=5026.7%
  10000000=10039.7%
  20000000=20086.4%
List: (Baseline=0.005764 seconds)
  100000=100.0%
  1000000=1043.7%
  2000000=2090.1%
  5000000=5227.2%
  10000000=10458.1%
  20000000=20942.7%
ReversedList: (Baseline=0.005879 seconds)
  100000=100.0%
  1000000=1015.0%
  2000000=2023.5%
  5000000=5057.1%
  10000000=10114.0%
  20000000=20592.6%
ShuffledList: (Baseline=0.028241 seconds)
  100000=100.0%
  1000000=1296.0%
  2000000=2877.3%
  5000000=7960.1%
  10000000=17216.9%
  20000000=37599.9%
PostShuffleDict: (Baseline=0.016229 seconds)
  100000=100.0%
  1000000=1007.9%
  2000000=2018.4%
  5000000=5075.3%
  10000000=10217.5%
  20000000=20873.1%
PostShuffleList: (Baseline=0.020551 seconds)
  100000=100.0%
  1000000=1021.9%
  2000000=1978.2%
  5000000=4953.6%
  10000000=10262.3%
  20000000=19854.0%

Baseline changes for Dict and List after deallocation of list in random
order:
  Dict: 517.7%
  List: 356.5%

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dealloc_timing.py
Type: text/x-python
Size: 2003 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20081223/4abc8b9b/attachment-0001.py>


More information about the Python-Dev mailing list