[pypy-dev] Pypy garbage collection

Martin Koch mak at issuu.com
Fri Mar 14 16:19:56 CET 2014

We have hacked up a small sample that seems to exhibit the same issue.

We basically generate a linked list of objects. To increase connectedness,
elements in the list hold references (dummy_links) to 10 randomly chosen
previous elements in the list.

We then time a function that traverses 50000 elements from the list from a
random start point. If the traversal reaches the end of the list, we
instead traverse one of the dummy links. Thus, exactly 50K elements are
traversed every time. To generate some garbage, we build a list holding the
traversed elements and a dummy list of characters.

Timings for the last 100 runs are stored in a circular buffer. If the
elapsed time for the last run is more than twice the average time, we print
out a line with the elapsed time, the threshold, and the 90% runtime (we
would like to see that the mean runtime does not increase with the number
of elements in the list, but that the max time does increase (linearly with
the number of object, i guess); traversing 50K elements should be
independent of the memory size).

We have tried monitoring memory consumption by external inspection, but
cannot consistently verify that memory is deallocated at the same time that
we see slow requests. Perhaps the pypy runtime doesn't always return freed
pages back to the OS?

Using top, we observe that 10M elements allocates around 17GB after
building, 20M elements 26GB, 30M elements 28GB (and grows to 35GB shortly
after building).

Here is output from a few runs with different number of elements:

*pypy mem.py 10000000*
start build
end build 84.142424
that took a long time elapsed: 13.230586  slow_threshold: 1.495401
 90th_quantile_runtime: 0.421558
that took a long time elapsed: 13.016531  slow_threshold: 1.488160
 90th_quantile_runtime: 0.423441
that took a long time elapsed: 13.032537  slow_threshold: 1.474563
 90th_quantile_runtime: 0.419817

*pypy mem.py 20000000*
start build
end build 180.823105
that took a long time elapsed: 27.346064  slow_threshold: 2.295146
 90th_quantile_runtime: 0.434726
that took a long time elapsed: 26.028852  slow_threshold: 2.283927
 90th_quantile_runtime: 0.374190
that took a long time elapsed: 25.432279  slow_threshold: 2.279631
 90th_quantile_runtime: 0.371502

*pypy mem.py 30000000*
start build
end build 276.217811
that took a long time elapsed: 40.993855  slow_threshold: 3.188464
 90th_quantile_runtime: 0.459891
that took a long time elapsed: 41.693553  slow_threshold: 3.183003
 90th_quantile_runtime: 0.393654
that took a long time elapsed: 39.679769  slow_threshold: 3.190782
 90th_quantile_runtime: 0.393677
that took a long time elapsed: 43.573411  slow_threshold: 3.239637
 90th_quantile_runtime: 0.393654

*Code below*
import time
from random import randint, choice
import sys

allElems = {}

class Node:
    def __init__(self, v_):
        self.v = v_
        self.next = None
        self.dummy_data = [randint(0,100)
                           for _ in xrange(randint(50,100))]
        allElems[self.v] = self
        if self.v > 0:
            self.dummy_links = [allElems[randint(0, self.v-1)] for _ in
            self.dummy_links = [self]

    def set_next(self, l):
        self.next = l

def follow(node):
    acc = []
    count = 0
    cur = node
    assert node.v is not None
    assert cur is not None
    while count < 50000:
        # return a value; generate some garbage
        acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") for x in

        # if we have reached the end, chose a random link
        cur = choice(cur.dummy_links) if cur.next is None else cur.next
        count += 1

    return acc

def build(num_elems):
    start = time.time()
    print "start build"
    root = Node(0)
    cur = root
    for x in xrange(1, num_elems):
        e = Node(x)
        cur.next = e
        cur = e
    print "end build %f" % (time.time() - start)
    return root

num_timings = 100
if __name__ == "__main__":
    num_elems = int(sys.argv[1])
    total = 0
    timings = [0.0] * num_timings # run times for the last num_timings runs
    i = 0
    beginning = time.time()
    while time.time() - beginning < 600:
        start = time.time()
        elem = allElems[randint(0, num_elems - 1)]
        assert(elem is not None)

        lst = follow(elem)

        total += choice(lst)[0] # use the return value for something

        end = time.time()

        elapsed = end-start
        timings[i % num_timings] = elapsed
        if (i > num_timings):
            slow_time = 2 * sum(timings)/num_timings # slow defined as >
2*avg run time
            if (elapsed > slow_time):
                print "that took a long time elapsed: %f  slow_threshold:
%f  90th_quantile_runtime: %f" % \
                    (elapsed, slow_time,
        i += 1
    print total

On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski <fijall at gmail.com>wrote:

> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <mak at issuu.com> wrote:
> > Hi Armin, Maciej
> >
> > Thanks for responding.
> >
> > I'm in the process of trying to determine what (if any) of the code I'm
> in a
> > position to share, and I'll get back to you.
> >
> > Allowing hinting to the GC would be good. Even better would be a means to
> > allow me to (transparently) allocate objects in unmanaged memory, but I
> > would expect that to be a tall order :)
> >
> > Thanks,
> > /Martin
> Hi Martin.
> Note that in case you want us to do the work of isolating the problem,
> we do offer paid support to do that (then we can sign NDAs and stuff).
> Otherwise we would be more than happy to fix bugs once you isolate a
> part you can share freely :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20140314/d9ea27f6/attachment.html>

More information about the pypy-dev mailing list