[pypy-issue] [issue1710] GC misbehaviour on large heaps

Carl Friedrich Bolz tracker at bugs.pypy.org
Tue Mar 18 10:21:31 CET 2014


New submission from Carl Friedrich Bolz <cfbolz at gmx.de>:

All info provided by Martin Koch <mak at issuu.com> on pypy-dev

-------
https://mail.python.org/pipermail/pypy-dev/2014-March/012305.html

I'm running a server (written in python, executed with pypy) that holds a large
graph (55GB, millions of nodes and edges) in memory and responds to queries by
traversing the graph. The graph is mutated a few times a second, and there are
hundreds of read-only requests a second.

My problem is that I have no control over garbage collection. Thus, a major GC might
kick in while serving a query, and with this amount of data, the GC takes around
2 minutes. I have tried mitigating this by guessing when a GC might be due, and
proactively starting the garbage collector while not serving a request (this is
ok, as duplicate servers will respond to requests while this one is collecting). 
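
A minimal sketch of that mitigation, assuming a simple request loop
(GC_INTERVAL is a guess at how often a major collection is due; gc.collect()
does run a full collection on PyPy):

import gc
import time

GC_INTERVAL = 300.0   # assumed interval between proactive collections, seconds
last_gc = time.time()

def collect_if_idle():
    # Take the full-collection pause between requests, so that it does not
    # land in the middle of a query.
    global last_gc
    if time.time() - last_gc > GC_INTERVAL:
        gc.collect()
        last_gc = time.time()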

What I would really like is to be able to disable garbage collection for the old
generation. This is because the graph is fairly static, and I can live with
leaking memory from the relatively few and small mutations that occur. Any
queries are only likely to generate objects in the new generation, and it is
fine to collect these. Also, by design, the process is periodically restarted in
order to re-synchronize it with an authoritative source (thus rebuilding the
graph from scratch), so slight leakage is not an issue here. 
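
To make the request concrete, the API we are after would look roughly like
this (purely hypothetical; neither function exists in PyPy's gc module):

import gc

gc.disable_major_collections()       # hypothetical: keep nursery collections only
try:
    serve_requests()                 # placeholder for the query-serving loop
finally:
    gc.enable_major_collections()    # hypothetical counterpart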

I have tried experimenting with setting environment variables as well as the
'gc' module, but nothing seems to give me what I want.
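
For reference, the environment variables in question are the minimark GC's
PYPY_GC_* knobs; raising the major-collection threshold postpones full
collections but, as far as we can tell, cannot disable them, e.g.:

PYPY_GC_MAJOR_COLLECT=10.0 PYPY_GC_GROWTH=4.0 pypy mem.py 10000000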

If disabling gc for certain generations is not possible, it would be nice to be
able to get a hint when a major collection is about to occur, so I can stop
serving requests.
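
Such a hint could be as simple as a predicate to poll between requests
(again purely hypothetical, nothing like this exists as far as we know):

if gc.major_collection_imminent():   # hypothetical predicate
    stop_serving()                   # hand traffic to a duplicate server
    gc.collect()                     # take the pause off the request path
    resume_serving()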



-------
https://mail.python.org/pipermail/pypy-dev/2014-March/012310.html:

We have hacked up a small sample that seems to exhibit the same issue. 

We basically generate a linked list of objects. To increase connectedness,
elements in the list hold references (dummy_links) to 10 randomly chosen
previous elements in the list.

We then time a function that traverses 50000 elements from the list from a
random start point. If the traversal reaches the end of the list, we instead
traverse one of the dummy links. Thus, exactly 50K elements are traversed every
time. To generate some garbage, we build a list holding the traversed elements,
each paired with a dummy list of random characters.

Timings for the last 100 runs are stored in a circular buffer. If the elapsed
time for the last run is more than twice the average time, we print out a line
with the elapsed time, the threshold, and the 90th-percentile runtime. (We
would like to see that the mean runtime does not increase with the number of
elements in the list, but that the max time does increase, linearly with the
number of objects, I guess; traversing 50K elements should be independent of
the memory size.)

We have tried monitoring memory consumption by external inspection, but cannot
consistently verify that memory is deallocated at the same time that we see slow
requests. Perhaps the pypy runtime doesn't always return freed pages back to the OS?
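
One way to cross-check this from inside the process is to sample the resident
set size from /proc and log it next to the slow-request lines; a minimal
sketch (Linux only):

def rss_kb():
    # Resident set size in kB, read from /proc/self/status.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return -1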

Using top, we observe that a run with 10M elements occupies around 17GB after
building, 20M elements around 26GB, and 30M elements around 28GB (growing to
35GB shortly after building).

Here is output from a few runs with different numbers of elements:


pypy mem.py 10000000
start build
end build 84.142424
that took a long time elapsed: 13.230586  slow_threshold: 1.495401 90th_quantile_runtime: 0.421558
that took a long time elapsed: 13.016531  slow_threshold: 1.488160 90th_quantile_runtime: 0.423441
that took a long time elapsed: 13.032537  slow_threshold: 1.474563 90th_quantile_runtime: 0.419817

pypy mem.py 20000000
start build
end build 180.823105
that took a long time elapsed: 27.346064  slow_threshold: 2.295146 90th_quantile_runtime: 0.434726
that took a long time elapsed: 26.028852  slow_threshold: 2.283927 90th_quantile_runtime: 0.374190
that took a long time elapsed: 25.432279  slow_threshold: 2.279631 90th_quantile_runtime: 0.371502

pypy mem.py 30000000
start build
end build 276.217811
that took a long time elapsed: 40.993855  slow_threshold: 3.188464 90th_quantile_runtime: 0.459891
that took a long time elapsed: 41.693553  slow_threshold: 3.183003 90th_quantile_runtime: 0.393654
that took a long time elapsed: 39.679769  slow_threshold: 3.190782 90th_quantile_runtime: 0.393677
that took a long time elapsed: 43.573411  slow_threshold: 3.239637 90th_quantile_runtime: 0.393654

Code below
--------------------------------------------------------------
import time
from random import randint, choice
import sys


allElems = {}

class Node:
    def __init__(self, v_):
        self.v = v_
        self.next = None
        self.dummy_data = [randint(0,100)
                           for _ in xrange(randint(50,100))]
        allElems[self.v] = self
        if self.v > 0:
            self.dummy_links = [allElems[randint(0, self.v-1)] for _ in xrange(10)]
        else:
            self.dummy_links = [self]

    def set_next(self, l):
        self.next = l


def follow(node):
    acc = []
    count = 0
    cur = node
    assert node.v is not None
    assert cur is not None
    while count < 50000:
        # return a value; generate some garbage
        acc.append((cur.v,
                    [choice("abcdefghijklmnopqrstuvwxyz") for x in xrange(100)]))

        # if we have reached the end, choose a random dummy link instead
        cur = choice(cur.dummy_links) if cur.next is None else cur.next
        count += 1

    return acc


def build(num_elems):
    start = time.time()
    print "start build"
    root = Node(0)
    cur = root
    for x in xrange(1, num_elems):
        e = Node(x)
        cur.next = e
        cur = e
    print "end build %f" % (time.time() - start)
    return root


num_timings = 100
if __name__ == "__main__":
    num_elems = int(sys.argv[1])
    build(num_elems)
    total = 0
    timings = [0.0] * num_timings # run times for the last num_timings runs
    i = 0
    beginning = time.time()
    while time.time() - beginning < 600:
        start = time.time()
        elem = allElems[randint(0, num_elems - 1)]
        assert elem is not None

        lst = follow(elem)

        total += choice(lst)[0] # use the return value for something

        end = time.time()

        elapsed = end-start
        timings[i % num_timings] = elapsed
        if i > num_timings:
            # slow defined as > 2 * average run time over the last num_timings runs
            slow_time = 2 * sum(timings) / num_timings
            if elapsed > slow_time:
                print "that took a long time elapsed: %f  slow_threshold: %f " \
                      "90th_quantile_runtime: %f" % \
                      (elapsed, slow_time, sorted(timings)[int(num_timings * .9)])
        i += 1
    print total

----------
messages: 6609
nosy: cfbolz, pypy-issue
priority: performance bug
status: unread
title: GC misbehaviour on large heaps

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1710>
________________________________________

