Well, then it works out to around 2.5GHz, which seems reasonable. But it doesn't alter the conclusion from the previous email: The slow queries then all have a duration around 34*10^9 units, 'normal' queries 1*10^9 units, or .4 seconds at this conversion. Also, the log shows that a slow query performs many more gc-minor operations than a 'normal' one: 9600 gc-collect-step/gc-minor/gc-minor-walkroots operations vs 58. So the question becomes: Why do we get this large spike in gc-minor-walkroots, and, in particular, is there any way to avoid it :) ? Thanks, /Martin On Mon, Mar 17, 2014 at 1:53 PM, Maciej Fijalkowski <fijall@gmail.com>wrote:
I think it's the cycles of your CPU
What is the unit? Perhaps I'm being thick here, but I can't correlate it with seconds (which the program does print out). Slow runs are around 13 seconds, but are around 34*10^9(dec), 0x800000000 timestamp units (e.g. from 0x2b994c9d31889c to 0x2b9944ab8c4f49).
On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:
The number of lines is nonsense. This is a timestamp in hex.
On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch <mak@issuu.com> wrote:
Based On Maciej's suggestion, I tried the following
PYPYLOG=- pypy mem.py 10000000 > out
This generates a logfile which looks something like this
start--> [2b99f1981b527e] {gc-minor [2b99f1981ba680] {gc-minor-walkroots [2b99f1981c2e02] gc-minor-walkroots} [2b99f19890d750] gc-minor} [snip] ... <--stop
It turns out that the culprit is a lot of MINOR collections.
I base this on the following observations:
I can't understand the format of the timestamp on each logline (the "[2b99f1981b527e]"). From what I can see in the code, this should be output from time.clock(), but that doesn't return a number like that when I
run
pypy interactively Instead, I count the number of debug lines between start--> and the corresponding <--stop. Most runs have a few hundred lines of output between start/stop All slow runs have very close to 57800 lines out output between start/stop One such sample does 9609 gc-collect-step operations, 9647 gc-minor operations, and 9647 gc-minor-walkroots operations.
Thanks, /Martin
On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski <fijall@gmail.com
wrote:
there is an environment variable PYPYLOG=gc:- (where - is stdout) which will do that for you btw.
maybe you can find out what's that using profiling or valgrind?
On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch <mak@issuu.com> wrote:
I have tried getting the pypy source and building my own version of pypy. I have modified rpython/memory/gc/incminimark.py:major_collection_step() to print out when it starts and when it stops. Apparently, the slow queries do NOT occur during major_collection_step; at least, I have not
observed
major step output during a query execution. So, apparently, something else is blocking. This could be another aspect of the GC, but it could also be anything else.
Just to be sure, I have tried running the same application in
with garbage collection disabled. I don't see the problem there, so it is somehow related to either GC or the runtime somehow.
Cheers, /Martin
On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch <mak@issuu.com> wrote: > > We have hacked up a small sample that seems to exhibit the same > issue. > > We basically generate a linked list of objects. To increase > connectedness, > elements in the list hold references (dummy_links) to 10 randomly > chosen > previous elements in the list. > > We then time a function that traverses 50000 elements from the
On Mon, Mar 17, 2014 at 2:48 PM, Martin Koch <mak@issuu.com> wrote: python list
> from a > random start point. If the traversal reaches the end of the list, we > instead > traverse one of the dummy links. Thus, exactly 50K elements are > traversed > every time. To generate some garbage, we build a list holding the > traversed > elements and a dummy list of characters. > > Timings for the last 100 runs are stored in a circular buffer. If > the > elapsed time for the last run is more than twice the average time, > we > print > out a line with the elapsed time, the threshold, and the 90% runtime > (we > would like to see that the mean runtime does not increase with the > number of > elements in the list, but that the max time does increase (linearly > with the > number of object, i guess); traversing 50K elements should be > independent of > the memory size). > > We have tried monitoring memory consumption by external inspection, > but > cannot consistently verify that memory is deallocated at the same > time > that > we see slow requests. Perhaps the pypy runtime doesn't always return > freed > pages back to the OS? > > Using top, we observe that 10M elements allocates around 17GB after > building, 20M elements 26GB, 30M elements 28GB (and grows to 35GB > shortly > after building). > > Here is output from a few runs with different number of elements: > > > pypy mem.py 10000000 > start build > end build 84.142424 > that took a long time elapsed: 13.230586 slow_threshold: 1.495401 > 90th_quantile_runtime: 0.421558 > that took a long time elapsed: 13.016531 slow_threshold: 1.488160 > 90th_quantile_runtime: 0.423441 > that took a long time elapsed: 13.032537 slow_threshold: 1.474563 > 90th_quantile_runtime: 0.419817 > > pypy mem.py 20000000 > start build > end build 180.823105 > that took a long time elapsed: 27.346064 slow_threshold: 2.295146 > 90th_quantile_runtime: 0.434726 > that took a long time elapsed: 26.028852 slow_threshold: 2.283927 > 90th_quantile_runtime: 0.374190 > that took a long time elapsed: 25.432279 slow_threshold: 2.279631 > 90th_quantile_runtime: 0.371502 > > pypy mem.py 30000000 > start build > end build 276.217811 > that took a long time elapsed: 40.993855 slow_threshold: 3.188464 > 90th_quantile_runtime: 0.459891 > that took a long time elapsed: 41.693553 slow_threshold: 3.183003 > 90th_quantile_runtime: 0.393654 > that took a long time elapsed: 39.679769 slow_threshold: 3.190782 > 90th_quantile_runtime: 0.393677 > that took a long time elapsed: 43.573411 slow_threshold: 3.239637 > 90th_quantile_runtime: 0.393654 > > Code below > -------------------------------------------------------------- > import time > from random import randint, choice > import sys > > > allElems = {} > > class Node: > def __init__(self, v_): > self.v = v_ > self.next = None > self.dummy_data = [randint(0,100) > for _ in xrange(randint(50,100))] > allElems[self.v] = self > if self.v > 0: > self.dummy_links = [allElems[randint(0, self.v-1)] for _ > in > xrange(10)] > else: > self.dummy_links = [self] > > def set_next(self, l): > self.next = l > > > def follow(node): > acc = [] > count = 0 > cur = node > assert node.v is not None > assert cur is not None > while count < 50000: > # return a value; generate some garbage > acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") for > x > in > xrange(100)])) > > # if we have reached the end, chose a random link > cur = choice(cur.dummy_links) if cur.next is None else > cur.next > count += 1 > > return acc > > > def build(num_elems): > start = time.time() > print "start build" > root = Node(0) > cur = root > for x in xrange(1, num_elems): > e = Node(x) > cur.next = e > cur = e > print "end build %f" % (time.time() - start) > return root > > > num_timings = 100 > if __name__ == "__main__": > num_elems = int(sys.argv[1]) > build(num_elems) > total = 0 > timings = [0.0] * num_timings # run times for the last > num_timings > runs > i = 0 > beginning = time.time() > while time.time() - beginning < 600: > start = time.time() > elem = allElems[randint(0, num_elems - 1)] > assert(elem is not None) > > lst = follow(elem) > > total += choice(lst)[0] # use the return value for something > > end = time.time() > > elapsed = end-start > timings[i % num_timings] = elapsed > if (i > num_timings): > slow_time = 2 * sum(timings)/num_timings # slow defined > as > > > 2*avg run time > if (elapsed > slow_time): > print "that took a long time elapsed: %f > slow_threshold: > %f 90th_quantile_runtime: %f" % \ > (elapsed, slow_time, > sorted(timings)[int(num_timings*.9)]) > i += 1 > print total > > > > > > On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski > <fijall@gmail.com> > wrote: >> >> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch <mak@issuu.com> wrote: >> > Hi Armin, Maciej >> > >> > Thanks for responding. >> > >> > I'm in the process of trying to determine what (if any) of the >> > code >> > I'm >> > in a >> > position to share, and I'll get back to you. >> > >> > Allowing hinting to the GC would be good. Even better would be a >> > means >> > to >> > allow me to (transparently) allocate objects in unmanaged memory, >> > but I >> > would expect that to be a tall order :) >> > >> > Thanks, >> > /Martin >> >> Hi Martin. >> >> Note that in case you want us to do the work of isolating the >> problem, >> we do offer paid support to do that (then we can sign NDAs and >> stuff). >> Otherwise we would be more than happy to fix bugs once you isolate >> a >> part you can share freely :) > >