[pypy-dev] Pypy garbage collection

Maciej Fijalkowski fijall at gmail.com
Mon Mar 17 12:09:09 CET 2014


The number of lines is nonsense as a metric. The bracketed value (the "[2b99f1981b527e]") is a timestamp in hex.
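
For what it's worth, here is a minimal sketch of how those values can be
turned into per-section durations. It assumes the bracketed value is a raw
CPU tick counter, so absolute values mean nothing and only differences
within one log are comparable; "out" is the file name from your example:

import re

LINE = re.compile(r'\[([0-9a-f]+)\] (\{?)([\w-]+)(\}?)')

def section_durations(path):
    # yields (section name, duration in ticks) for each {...} pair
    stack = []  # currently open sections: (name, start ticks)
    with open(path) as f:
        for raw in f:
            m = LINE.match(raw.strip())
            if m is None:
                continue
            ticks, opening, name, closing = m.groups()
            ticks = int(ticks, 16)       # the timestamp is hex
            if opening:                  # e.g. "[...] {gc-minor"
                stack.append((name, ticks))
            elif closing and stack:      # e.g. "[...] gc-minor}"
                open_name, start = stack.pop()
                if open_name == name:
                    yield name, ticks - start

for name, dt in section_durations("out"):
    print name, dt                       # ticks, not seconds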

On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <mak at issuu.com> wrote:
> Based on Maciej's suggestion, I tried the following:
>
> PYPYLOG=- pypy mem.py 10000000 > out
>
> This generates a log file which looks something like this:
>
> start-->
> [2b99f1981b527e] {gc-minor
> [2b99f1981ba680] {gc-minor-walkroots
> [2b99f1981c2e02] gc-minor-walkroots}
> [2b99f19890d750] gc-minor}
> [snip]
> ...
> <--stop
>
>
> It turns out that the culprit is a lot of MINOR collections.
>
> (I can't understand the format of the timestamp on each log line, the
> "[2b99f1981b527e]". From what I can see in the code, this should be
> output from time.clock(), but that doesn't return a number like that
> when I run pypy interactively. Instead, I count the number of debug
> lines between start--> and the corresponding <--stop; a sketch of this
> counting follows below.)
>
> I base this on the following observations:
>
> - Most runs have a few hundred lines of output between start/stop.
> - All slow runs have very close to 57800 lines of output between
>   start/stop.
> - One such sample does 9609 gc-collect-step operations, 9647 gc-minor
>   operations, and 9647 gc-minor-walkroots operations.
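>
> The counting sketch (the file "out" is from the command above):
>
> counts = []
> n = None
> for line in open("out"):
>     if "start-->" in line:
>         n = 0
>     elif "<--stop" in line:
>         counts.append(n)
>         n = None
>     elif n is not None:
>         n += 1
> print sorted(counts)[-10:]  # the slow runs show up near 57800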
>
>
> Thanks,
> /Martin
>
>
> On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski <fijall at gmail.com>
> wrote:
>>
>> There is an environment variable, PYPYLOG=gc:- (where - means stdout),
>> which will do that for you, btw.
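>>
>> For example, to write only the gc-* sections to a file (the file name
>> here is arbitrary):
>>
>>     PYPYLOG=gc:gc.log pypy mem.py 10000000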
>>
>> Maybe you can find out what that is using profiling or valgrind?
>>
>> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <mak at issuu.com> wrote:
>> > I have tried getting the pypy source and building my own version of
>> > pypy. I have modified
>> > rpython/memory/gc/incminimark.py:major_collection_step() to print out
>> > when it starts and when it stops. Apparently, the slow queries do NOT
>> > occur during major_collection_step; at least, I have not observed
>> > major-step output during a query execution. So, apparently, something
>> > else is blocking. This could be another aspect of the GC, but it
>> > could also be anything else.
>> >
>> > Just to be sure, I have tried running the same application in python
>> > with garbage collection disabled. I don't see the problem there, so
>> > it is somehow related to either the GC or the runtime.
>> >
>> > Cheers,
>> > /Martin
>> >
>> >
>> > On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <mak at issuu.com> wrote:
>> >>
>> >> We have hacked up a small sample that seems to exhibit the same issue.
>> >>
>> >> We basically generate a linked list of objects. To increase
>> >> connectedness, elements in the list hold references (dummy_links) to
>> >> 10 randomly chosen previous elements in the list.
>> >>
>> >> We then time a function that traverses 50000 elements of the list,
>> >> starting from a random point. If the traversal reaches the end of
>> >> the list, we instead traverse one of the dummy links. Thus, exactly
>> >> 50K elements are traversed every time. To generate some garbage, we
>> >> build a list holding the traversed elements and a dummy list of
>> >> characters.
>> >>
>> >> Timings for the last 100 runs are stored in a circular buffer. If
>> >> the elapsed time for the last run is more than twice the average
>> >> time, we print out a line with the elapsed time, the threshold, and
>> >> the 90th percentile runtime. (We would like to see that the mean
>> >> runtime does not increase with the number of elements in the list,
>> >> but that the max time does increase, linearly with the number of
>> >> objects, I guess; traversing 50K elements should be independent of
>> >> the memory size.)
>> >>
>> >> We have tried monitoring memory consumption by external inspection,
>> >> but cannot consistently verify that memory is deallocated at the
>> >> same time that we see slow requests. Perhaps the pypy runtime
>> >> doesn't always return freed pages back to the OS?
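>> >>
>> >> (The external inspection is essentially polling the resident set
>> >> size; a rough sketch of it, Linux-only, where the pid is a
>> >> placeholder for the pypy process under test:)
>> >>
>> >> import time
>> >>
>> >> def rss_kb(pid):
>> >>     # the VmRSS line of /proc/<pid>/status, e.g. "VmRSS: 17000000 kB"
>> >>     with open("/proc/%d/status" % pid) as f:
>> >>         for line in f:
>> >>             if line.startswith("VmRSS:"):
>> >>                 return int(line.split()[1])
>> >>
>> >> while True:
>> >>     print rss_kb(12345)
>> >>     time.sleep(1.0)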
>> >>
>> >> Using top, we observe that 10M elements occupy around 17GB after
>> >> building, 20M elements 26GB, and 30M elements 28GB (growing to 35GB
>> >> shortly after building).
>> >>
>> >> Here is output from a few runs with different number of elements:
>> >>
>> >>
>> >> pypy mem.py 10000000
>> >> start build
>> >> end build 84.142424
>> >> that took a long time elapsed: 13.230586  slow_threshold: 1.495401  90th_quantile_runtime: 0.421558
>> >> that took a long time elapsed: 13.016531  slow_threshold: 1.488160  90th_quantile_runtime: 0.423441
>> >> that took a long time elapsed: 13.032537  slow_threshold: 1.474563  90th_quantile_runtime: 0.419817
>> >>
>> >> pypy mem.py 20000000
>> >> start build
>> >> end build 180.823105
>> >> that took a long time elapsed: 27.346064  slow_threshold: 2.295146  90th_quantile_runtime: 0.434726
>> >> that took a long time elapsed: 26.028852  slow_threshold: 2.283927  90th_quantile_runtime: 0.374190
>> >> that took a long time elapsed: 25.432279  slow_threshold: 2.279631  90th_quantile_runtime: 0.371502
>> >>
>> >> pypy mem.py 30000000
>> >> start build
>> >> end build 276.217811
>> >> that took a long time elapsed: 40.993855  slow_threshold: 3.188464  90th_quantile_runtime: 0.459891
>> >> that took a long time elapsed: 41.693553  slow_threshold: 3.183003  90th_quantile_runtime: 0.393654
>> >> that took a long time elapsed: 39.679769  slow_threshold: 3.190782  90th_quantile_runtime: 0.393677
>> >> that took a long time elapsed: 43.573411  slow_threshold: 3.239637  90th_quantile_runtime: 0.393654
>> >>
>> >> Code below
>> >> --------------------------------------------------------------
>> >> import time
>> >> from random import randint, choice
>> >> import sys
>> >>
>> >>
>> >> allElems = {}
>> >>
>> >> class Node:
>> >>     def __init__(self, v_):
>> >>         self.v = v_
>> >>         self.next = None
>> >>         self.dummy_data = [randint(0,100)
>> >>                            for _ in xrange(randint(50,100))]
>> >>         allElems[self.v] = self
>> >>         if self.v > 0:
>> >>             self.dummy_links = [allElems[randint(0, self.v - 1)]
>> >>                                 for _ in xrange(10)]
>> >>         else:
>> >>             self.dummy_links = [self]
>> >>
>> >>     def set_next(self, l):
>> >>         self.next = l
>> >>
>> >>
>> >> def follow(node):
>> >>     acc = []
>> >>     count = 0
>> >>     cur = node
>> >>     assert node.v is not None
>> >>     assert cur is not None
>> >>     while count < 50000:
>> >>         # return a value; generate some garbage
>> >>         acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz")
>> >>                             for x in xrange(100)]))
>> >>
>> >>         # if we have reached the end, choose a random link
>> >>         cur = choice(cur.dummy_links) if cur.next is None else cur.next
>> >>         count += 1
>> >>
>> >>     return acc
>> >>
>> >>
>> >> def build(num_elems):
>> >>     start = time.time()
>> >>     print "start build"
>> >>     root = Node(0)
>> >>     cur = root
>> >>     for x in xrange(1, num_elems):
>> >>         e = Node(x)
>> >>         cur.next = e
>> >>         cur = e
>> >>     print "end build %f" % (time.time() - start)
>> >>     return root
>> >>
>> >>
>> >> num_timings = 100
>> >> if __name__ == "__main__":
>> >>     num_elems = int(sys.argv[1])
>> >>     build(num_elems)
>> >>     total = 0
>> >>     timings = [0.0] * num_timings  # run times for the last num_timings runs
>> >>     i = 0
>> >>     beginning = time.time()
>> >>     while time.time() - beginning < 600:
>> >>         start = time.time()
>> >>         elem = allElems[randint(0, num_elems - 1)]
>> >>         assert(elem is not None)
>> >>
>> >>         lst = follow(elem)
>> >>
>> >>         total += choice(lst)[0] # use the return value for something
>> >>
>> >>         end = time.time()
>> >>
>> >>         elapsed = end-start
>> >>         timings[i % num_timings] = elapsed
>> >>         if i > num_timings:
>> >>             # slow defined as > 2 * avg run time
>> >>             slow_time = 2 * sum(timings) / num_timings
>> >>             if elapsed > slow_time:
>> >>                 print "that took a long time elapsed: %f  slow_threshold: %f  90th_quantile_runtime: %f" % \
>> >>                     (elapsed, slow_time, sorted(timings)[int(num_timings * .9)])
>> >>         i += 1
>> >>     print total
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski <fijall at gmail.com>
>> >> wrote:
>> >>>
>> >>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <mak at issuu.com> wrote:
>> >>> > Hi Armin, Maciej
>> >>> >
>> >>> > Thanks for responding.
>> >>> >
>> >>> > I'm in the process of trying to determine what (if any) of the
>> >>> > code I'm in a position to share, and I'll get back to you.
>> >>> >
>> >>> > Allowing hinting to the GC would be good. Even better would be a
>> >>> > means to allow me to (transparently) allocate objects in
>> >>> > unmanaged memory, but I would expect that to be a tall order :)
>> >>> >
>> >>> > Thanks,
>> >>> > /Martin
>> >>>
>> >>> Hi Martin.
>> >>>
>> >>> Note that in case you want us to do the work of isolating the problem,
>> >>> we do offer paid support to do that (then we can sign NDAs and stuff).
>> >>> Otherwise we would be more than happy to fix bugs once you isolate a
>> >>> part you can share freely :)
>> >>
>> >>
>> >
>
>

