[Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)

Mike Coleman tutufan at gmail.com
Tue Dec 23 01:19:10 CET 2008


On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen <rhamph at gmail.com> wrote:
> To make sure that's the correct line please recompile python without
> optimizations.  GCC happily reorders and merges different parts of a
> function.
>
> Adding a counter in C and recompiling would be a lot faster than using
> a gdb hook.

Okay, I did this.  The results are the same, except that now sampling
selects the different source statements within this loop, instead of
just the top of the loop (which makes sense).

I added a counter (static volatile long) as suggested, and a
breakpoint to sample it.  Not every pass through PyObject_Free takes
case 3, but for those that do, this loop runs around 100-25000 times.
I didn't try to graph it, but based on a quick sample, it looks like
more than 5000 iterations on most occasions.

The total counter is 12.4 billion at the moment, and still growing.
That seems high, but I'm not sure what would be expected or hoped for.

I have a script that demonstrates the problem, but unfortunately the
behavior isn't clearly bad until large amounts of memory are used.  I
don't think it shows at 2G, for example.  (A 32G machine is
sufficient.)  Here is a log of running the program at different sizes
($1):

1 4.04686999321 0.696660041809
2 8.1575551033 1.46393489838
3 12.6426320076 2.30558800697
4 16.471298933 3.80377006531
5 20.1461620331 4.96685886383
6 25.150053978 5.48230814934
7 28.9099609852 7.41244196892
8 32.283219099 6.31711483002
9 36.6974511147 7.40236377716
10 40.3126089573 9.01174497604
20 81.7559120655 20.3317198753
30 123.67071104 31.4815018177
40 161.935647011 61.4484620094
50 210.610441923 88.6161060333
60 248.89805007 118.821491003
70 288.944771051 194.166989088
80 329.93295002 262.14109993
90 396.209988832 454.317914009
100 435.610564947 564.191882133

If you plot this, it is clearly quadratic (or worse).

Here is the script:

#!/usr/bin/env python


"""
Try to trigger quadratic (?) behavior during .clear() of a large but simple
defaultdict.

"""


from collections import defaultdict
import time
import sys

import gc; gc.disable()


print >> sys.stderr, sys.version

h = defaultdict(list)

n = 0

lasttime = time.time()


megs = int(sys.argv[1])

print megs,
sys.stdout.flush()

# 100M iterations -> ~24GB? on my 64-bit host

for i in xrange(megs * 1024 * 1024):
    s = '%0.7d' % i
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
#   if (i % 1000000) == 0:
#       t = time.time()
#       print >> sys.stderr, t-lasttime
#       lasttime = t

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

h.clear()

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

print


More information about the Python-Dev mailing list