[pypy-dev] Pypy garbage collection

Maciej Fijalkowski fijall at gmail.com
Mon Mar 17 20:51:04 CET 2014


no, the problem here is definitely that you're generating a lot of
garbage that survives the young generation. You can always try playing
with the PYPY_GC_NURSERY environment variable (defaults to 4M, I think)

On Mon, Mar 17, 2014 at 9:04 PM, Martin Koch <mak at issuu.com> wrote:
> Well, it would appear that we have the problem because we're generating a
> lot of garbage in the young generation, just like we're doing in the example
> we've been studying here. I'm unsure how we can avoid that in our real
> implementation. Can we force gc of the young generation? Either by
> gc.collect() or implicitly somehow (does the gc e.g. kick in across function
> calls?).
>
> Thanks,
> /Martin
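[For reference, a minimal sketch of the gc.collect() route being asked about, assuming PyPy's standard gc module. Note that in PyPy, gc.collect() forces a *full* (major) collection; there is no public call that runs only a minor/nursery collection, since those are triggered automatically whenever the nursery fills up. The `timed_query` wrapper is a hypothetical helper, not part of any library:]

```python
import gc

def timed_query(run_query):
    # Hypothetical wrapper: pay the GC cost between queries rather than
    # letting pauses land inside the timed region.  gc.collect() is a
    # full (major) collection in PyPy; the young generation cannot be
    # collected on demand from application code.
    result = run_query()
    gc.collect()   # collection happens here, outside the next timing window
    return result
```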
>
>
> On Mon, Mar 17, 2014 at 5:39 PM, Maciej Fijalkowski <fijall at gmail.com>
> wrote:
>>
>> not sure how much more we can help without looking into the code
>>
>> On Mon, Mar 17, 2014 at 6:05 PM, Martin Koch <mak at issuu.com> wrote:
>> > Thanks :)
>> >
>> > /Martin
>> >
>> >
>> >> On 17/03/2014, at 16.41, Maciej Fijalkowski <fijall at gmail.com> wrote:
>> >>
>> >> ok.
>> >>
>> >> so as you can probably see, the max is not that big, which means the
>> >> GC is really incremental. What happens is that you get tons of garbage
>> >> that survives minor collection every now and then. I don't know exactly
>> >> why, but you should look at what objects can potentially survive for too
>> >> long.
>> >>
>> >>> On Mon, Mar 17, 2014 at 5:37 PM, Martin Koch <mak at issuu.com> wrote:
>> >>> Ah - it just occurred to me that the first runs may be slow anyway:
>> >>> Since we
>> >>> take the average of the last 100 runs as the benchmark, then the first
>> >>> 100
>> >>> runs are not classified as slow. Indeed, the first three runs with
>> >>> many
>> >>> collections are in the first 100 runs.
>> >>>
>> >>>
>> >>>> On Mon, Mar 17, 2014 at 4:35 PM, Martin Koch <mak at issuu.com> wrote:
>> >>>>
>> >>>> Here are the total and max times in millions of units; 30000 units is
>> >>>> approximately 13 seconds. I have extracted the runs where there are
>> >>>> many
>> >>>> gc-collect-steps. These are in execution order, so the first runs
>> >>>> with many
>> >>>> gc-collect-steps aren't slow.
>> >>>>
>> >>>> Totals: gc-minor:418 gc-minor-walkroots:0 gc-collect-step:28797 Max: gc-minor:10 gc-collect-step:247
>> >>>> Totals: gc-minor:562 gc-minor-walkroots:0 gc-collect-step:30282 Max: gc-minor:10 gc-collect-step:245
>> >>>> Totals: gc-minor:434 gc-minor-walkroots:0 gc-collect-step:31040 Max: gc-minor:11 gc-collect-step:244
>> >>>> Totals: gc-minor:417 slow:1 gc-minor-walkroots:0 gc-collect-step:31270 Max: gc-minor:17 gc-collect-step:244
>> >>>> Totals: gc-minor:435 slow:1 gc-minor-walkroots:0 gc-collect-step:30365 Max: gc-minor:11 gc-collect-step:248
>> >>>> Totals: gc-minor:389 slow:1 gc-minor-walkroots:0 gc-collect-step:31235 Max: gc-minor:8 gc-collect-step:299
>> >>>> Totals: gc-minor:434 slow:1 gc-minor-walkroots:0 gc-collect-step:31124 Max: gc-minor:11 gc-collect-step:246
>> >>>> Totals: gc-minor:386 slow:1 gc-minor-walkroots:0 gc-collect-step:30541 Max: gc-minor:8 gc-collect-step:244
>> >>>> Totals: gc-minor:410 slow:1 gc-minor-walkroots:0 gc-collect-step:31427 Max: gc-minor:36 gc-collect-step:248
>> >>>> Totals: gc-minor:390 slow:1 gc-minor-walkroots:0 gc-collect-step:30743 Max: gc-minor:8 gc-collect-step:244
>> >>>> Totals: gc-minor:380 slow:1 gc-minor-walkroots:0 gc-collect-step:30207 Max: gc-minor:8 gc-collect-step:245
>> >>>> Totals: gc-minor:387 slow:1 gc-minor-walkroots:0 gc-collect-step:30837 Max: gc-minor:8 gc-collect-step:244
>> >>>> Totals: gc-minor:412 slow:1 gc-minor-walkroots:0 gc-collect-step:30898 Max: gc-minor:38 gc-collect-step:244
>> >>>> Totals: gc-minor:415 slow:1 gc-minor-walkroots:0 gc-collect-step:30407 Max: gc-minor:23 gc-collect-step:245
>> >>>> Totals: gc-minor:380 slow:1 gc-minor-walkroots:0 gc-collect-step:30591 Max: gc-minor:8 gc-collect-step:246
>> >>>> Totals: gc-minor:387 slow:1 gc-minor-walkroots:0 gc-collect-step:31193 Max: gc-minor:9 gc-collect-step:244
>> >>>> Totals: gc-minor:379 slow:1 gc-minor-walkroots:0 gc-collect-step:30026 Max: gc-minor:8 gc-collect-step:246
>> >>>> Totals: gc-minor:388 slow:1 gc-minor-walkroots:0 gc-collect-step:31179 Max: gc-minor:8 gc-collect-step:248
>> >>>> Totals: gc-minor:378 slow:1 gc-minor-walkroots:0 gc-collect-step:30674 Max: gc-minor:8 gc-collect-step:250
>> >>>> Totals: gc-minor:385 slow:1 gc-minor-walkroots:0 gc-collect-step:30413 Max: gc-minor:8 gc-collect-step:245
>> >>>> Totals: gc-minor:915 slow:1 gc-minor-walkroots:0 gc-collect-step:30830 Max: gc-minor:543 gc-collect-step:244
>> >>>> Totals: gc-minor:405 slow:1 gc-minor-walkroots:0 gc-collect-step:31153 Max: gc-minor:20 gc-collect-step:246
>> >>>> Totals: gc-minor:408 slow:1 gc-minor-walkroots:0 gc-collect-step:29815 Max: gc-minor:25 gc-collect-step:245
>> >>>>
>> >>>> Thanks,
>> >>>> /Martin
>> >>>>
>> >>>>
>> >>>>> On Mon, Mar 17, 2014 at 3:24 PM, Martin Koch <mak at issuu.com> wrote:
>> >>>>>
>> >>>>> Ah. I had misunderstood. I'll get back to you on that :) thanks
>> >>>>>
>> >>>>> /Martin
>> >>>>>
>> >>>>>
>> >>>>>> On 17/03/2014, at 15.21, Maciej Fijalkowski <fijall at gmail.com>
>> >>>>>> wrote:
>> >>>>>>
>> >>>>>> eh, this is not what I need
>> >>>>>>
>> >>>>>> I need a max of TIME it took for a gc-minor and the TOTAL time it
>> >>>>>> took
>> >>>>>> for a gc-minor (per query) (ideally same for gc-walkroots and
>> >>>>>> gc-collect-step)
>> >>>>>>
>> >>>>>>> On Mon, Mar 17, 2014 at 4:19 PM, Martin Koch <mak at issuu.com>
>> >>>>>>> wrote:
>> >>>>>>> Here are the collated results of running each query. For each run,
>> >>>>>>> I
>> >>>>>>> count
>> >>>>>>> how many of each of the pypy debug lines i get. I.e. there were
>> >>>>>>> 668
>> >>>>>>> runs
>> >>>>>>> that printed 58 loglines that contain "{gc-minor" which was
>> >>>>>>> eventually
>> >>>>>>> followed by "gc-minor}". I have also counted if the query was
>> >>>>>>> slow; interestingly, not all the queries with many gc-minors were
>> >>>>>>> slow (but all slow queries had many gc-minors).
>> >>>>>>>
>> >>>>>>> Please let me know if this is unclear :)
>> >>>>>>>
>> >>>>>>>   668 gc-minor:58 gc-minor-walkroots:58
>> >>>>>>>    10 gc-minor:58 gc-minor-walkroots:58 gc-collect-step:5
>> >>>>>>>   140 gc-minor:59 gc-minor-walkroots:59
>> >>>>>>>     1 gc-minor:8441 gc-minor-walkroots:8441 gc-collect-step:8403
>> >>>>>>>     1 gc-minor:9300 gc-minor-walkroots:9300 gc-collect-step:9249
>> >>>>>>>     9 gc-minor:9643 slow:1 gc-minor-walkroots:9643 gc-collect-step:9589
>> >>>>>>>     1 gc-minor:9644 slow:1 gc-minor-walkroots:9644 gc-collect-step:9590
>> >>>>>>>    10 gc-minor:9647 slow:1 gc-minor-walkroots:9647 gc-collect-step:9609
>> >>>>>>>     1 gc-minor:9663 gc-minor-walkroots:9663 gc-collect-step:9614
>> >>>>>>>     1 jit-backend-dump:5 gc-minor:58 gc-minor-walkroots:58
>> >>>>>>>     1 jit-log-compiling-loop:1 gc-collect-step:8991 jit-backend-dump:78 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:9030 jit-tracing:3 gc-minor-walkroots:9030 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> >>>>>>>     1 jit-log-compiling-loop:1 jit-backend-dump:13 jit-backend:1 jit-log-noopt-loop:2 gc-minor:60 jit-tracing:1 gc-minor-walkroots:60 jit-optimize:2 jit-log-short-preamble:1 jit-backend-addr:1 jit-log-opt-loop:1 jit-mem-looptoken-alloc:1 jit-log-rewritten-loop:1 jit-resume:14
>> >>>>>>>     1 jit-log-compiling-loop:1 jit-backend-dump:73 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:60 jit-tracing:3 gc-minor-walkroots:60 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> >>>>>>>     2 jit-log-compiling-loop:1 jit-backend-dump:78 jit-backend:3 jit-log-noopt-loop:6 jit-log-virtualstate:3 gc-minor:61 jit-tracing:3 gc-minor-walkroots:61 jit-optimize:6 jit-log-short-preamble:2 jit-backend-addr:3 jit-log-opt-loop:1 jit-mem-looptoken-alloc:3 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:1 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:84
>> >>>>>>>     1 jit-log-short-preamble:2 jit-log-compiling-loop:2 jit-backend-dump:92 jit-log-noopt-loop:7 jit-log-virtualstate:3 gc-minor:61 jit-tracing:4 gc-minor-walkroots:61 jit-optimize:7 jit-backend:4 jit-backend-addr:4 jit-log-opt-loop:2 jit-mem-looptoken-alloc:4 jit-abort:3 jit-log-rewritten-bridge:2 jit-log-rewritten-loop:2 jit-log-opt-bridge:2 jit-log-compiling-bridge:2 jit-resume:104
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Thanks,
>> >>>>>>> /Martin
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Mon, Mar 17, 2014 at 2:23 PM, Maciej Fijalkowski
>> >>>>>>> <fijall at gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> On Mon, Mar 17, 2014 at 3:20 PM, Maciej Fijalkowski
>> >>>>>>>> <fijall at gmail.com>
>> >>>>>>>> wrote:
>> >>>>>>>>> are you *sure* it's the walkroots that take that long and not
>> >>>>>>>>> something else (like gc-minor)? More of those mean that you
>> >>>>>>>>> allocate
>> >>>>>>>>> a
>> >>>>>>>>> lot more surviving objects. Can you do two things:
>> >>>>>>>>>
>> >>>>>>>>> a) take a max of gc-minor (and gc-minor-walkroots), per request
>> >>>>>>>>> b) take the sum of those
>> >>>>>>>>>
>> >>>>>>>>> and plot them
>> >>>>>>>>
>> >>>>>>>> ^^^ or just paste the results actually
>> >>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> On Mon, Mar 17, 2014 at 3:18 PM, Martin Koch <mak at issuu.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>> Well, then it works out to around 2.5GHz, which seems
>> >>>>>>>>>> reasonable.
>> >>>>>>>>>> But
>> >>>>>>>>>> it
>> >>>>>>>>>> doesn't alter the conclusion from the previous email: The slow
>> >>>>>>>>>> queries
>> >>>>>>>>>> then
>> >>>>>>>>>> all have a duration around 34*10^9 units, 'normal' queries
>> >>>>>>>>>> 1*10^9
>> >>>>>>>>>> units, or
>> >>>>>>>>>> .4 seconds at this conversion. Also, the log shows that a slow
>> >>>>>>>>>> query
>> >>>>>>>>>> performs many more gc-minor operations than a 'normal' one:
>> >>>>>>>>>> 9600
>> >>>>>>>>>> gc-collect-step/gc-minor/gc-minor-walkroots operations vs 58.
>> >>>>>>>>>>
>> >>>>>>>>>> So the question becomes: Why do we get this large spike in
>> >>>>>>>>>> gc-minor-walkroots, and, in particular, is there any way to
>> >>>>>>>>>> avoid
>> >>>>>>>>>> it :)
>> >>>>>>>>>> ?
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks,
>> >>>>>>>>>> /Martin
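[Spelled out, the unit conversion being discussed above, as a sketch assuming the bracketed log values are raw CPU cycle counts (TSC) at roughly 2.5 GHz:]

```python
# Two hex timestamps quoted in this thread for one slow query.
t0 = 0x2b9944ab8c4f49   # earlier timestamp
t1 = 0x2b994c9d31889c   # later timestamp

cycles = t1 - t0            # about 34e9 units, as observed for slow queries
seconds = cycles / 2.5e9    # about 13.6 s, matching the ~13 s slow runs
```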
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> On Mon, Mar 17, 2014 at 1:53 PM, Maciej Fijalkowski
>> >>>>>>>>>> <fijall at gmail.com>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>> I think it's the cycles of your CPU
>> >>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Mar 17, 2014 at 2:48 PM, Martin Koch <mak at issuu.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>> What is the unit? Perhaps I'm being thick here, but I can't
>> >>>>>>>>>>>> correlate it
>> >>>>>>>>>>>> with seconds (which the program does print out). Slow runs
>> >>>>>>>>>>>> are
>> >>>>>>>>>>>> around 13
>> >>>>>>>>>>>> seconds, but are around 34*10^9(dec), 0x800000000 timestamp
>> >>>>>>>>>>>> units
>> >>>>>>>>>>>> (e.g.
>> >>>>>>>>>>>> from
>> >>>>>>>>>>>> 0x2b994c9d31889c to 0x2b9944ab8c4f49).
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Mon, Mar 17, 2014 at 12:09 PM, Maciej Fijalkowski
>> >>>>>>>>>>>> <fijall at gmail.com>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> The number of lines is nonsense. This is a timestamp in hex.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Mon, Mar 17, 2014 at 12:46 PM, Martin Koch
>> >>>>>>>>>>>>> <mak at issuu.com>
>> >>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>> Based on Maciej's suggestion, I tried the following
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> PYPYLOG=- pypy mem.py 10000000 > out
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> This generates a logfile which looks something like this
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> start-->
>> >>>>>>>>>>>>>> [2b99f1981b527e] {gc-minor
>> >>>>>>>>>>>>>> [2b99f1981ba680] {gc-minor-walkroots
>> >>>>>>>>>>>>>> [2b99f1981c2e02] gc-minor-walkroots}
>> >>>>>>>>>>>>>> [2b99f19890d750] gc-minor}
>> >>>>>>>>>>>>>> [snip]
>> >>>>>>>>>>>>>> ...
>> >>>>>>>>>>>>>> <--stop
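[A log in this shape can be reduced to per-event total and maximum durations with a small stack-based parser; a sketch assuming hex cycle-count timestamps and the "{name" / "name}" nesting shown above (`summarize` is a hypothetical helper, not part of PyPy):]

```python
import re
from collections import defaultdict

# Matches "[2b99f1981b527e] {gc-minor" (open) and "[...] gc-minor}" (close).
LINE = re.compile(r"\[([0-9a-f]+)\] (\{)?([\w-]+)(\})?")

def summarize(lines):
    """Return (totals, maxes): per-event duration sums and maxima,
    in raw timestamp units.  Nested events (e.g. gc-minor-walkroots
    inside gc-minor) are counted toward both, as in the raw log."""
    totals = defaultdict(int)
    maxes = defaultdict(int)
    stack = []  # (event_name, start_timestamp)
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue  # skip non-event lines such as "start-->"
        ts, opening, name, closing = m.groups()
        ts = int(ts, 16)
        if opening:
            stack.append((name, ts))
        elif closing and stack and stack[-1][0] == name:
            _, start = stack.pop()
            d = ts - start
            totals[name] += d
            maxes[name] = max(maxes[name], d)
    return totals, maxes
```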
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> It turns out that the culprit is a lot of MINOR
>> >>>>>>>>>>>>>> collections.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I base this on the following observations:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I can't understand the format of the timestamp on each
>> >>>>>>>>>>>>>> logline
>> >>>>>>>>>>>>>> (the
>> >>>>>>>>>>>>>> "[2b99f1981b527e]"). From what I can see in the code, this
>> >>>>>>>>>>>>>> should
>> >>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>> output
>> >>>>>>>>>>>>>> from time.clock(), but that doesn't return a number like
>> >>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>> when I
>> >>>>>>>>>>>>>> run
>> >>>>>>>>>>>>>> pypy interactively
>> >>>>>>>>>>>>>> Instead, I count the number of debug lines between start-->
>> >>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>> corresponding <--stop.
>> >>>>>>>>>>>>>> Most runs have a few hundred lines of output between
>> >>>>>>>>>>>>>> start/stop
>> >>>>>>>>>>>>>> All slow runs have very close to 57800 lines of output
>> >>>>>>>>>>>>>> between
>> >>>>>>>>>>>>>> start/stop
>> >>>>>>>>>>>>>> One such sample does 9609 gc-collect-step operations, 9647
>> >>>>>>>>>>>>>> gc-minor
>> >>>>>>>>>>>>>> operations, and 9647 gc-minor-walkroots operations.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>> /Martin
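[The counting described above can be sketched as a small script. Assumptions: the "start-->" / "<--stop" markers are printed by the instrumented test program (not by PyPy), and each opening debug line ends with "{event-name" as in the excerpt; `count_events` is a hypothetical helper:]

```python
import re
from collections import Counter

EVENT = re.compile(r"\{([\w-]+)$")  # opening lines like "[...] {gc-minor"

def count_events(lines):
    """Return one Counter of event occurrences per start/stop run."""
    runs = []
    counts = None
    for line in lines:
        line = line.strip()
        if line == "start-->":
            counts = Counter()            # begin a new run
        elif line == "<--stop":
            if counts is not None:
                runs.append(counts)       # finish the current run
            counts = None
        elif counts is not None:
            m = EVENT.search(line)
            if m:
                counts[m.group(1)] += 1   # count each opening event line
    return runs
```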
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Mon, Mar 17, 2014 at 8:21 AM, Maciej Fijalkowski
>> >>>>>>>>>>>>>> <fijall at gmail.com>
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> there is an environment variable PYPYLOG=gc:- (where - is
>> >>>>>>>>>>>>>>> stdout)
>> >>>>>>>>>>>>>>> which will do that for you btw.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> maybe you can find out what's that using profiling or
>> >>>>>>>>>>>>>>> valgrind?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Sun, Mar 16, 2014 at 11:34 PM, Martin Koch
>> >>>>>>>>>>>>>>> <mak at issuu.com>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>> I have tried getting the pypy source and building my own
>> >>>>>>>>>>>>>>>> version
>> >>>>>>>>>>>>>>>> of
>> >>>>>>>>>>>>>>>> pypy. I
>> >>>>>>>>>>>>>>>> have modified
>> >>>>>>>>>>>>>>>> rpython/memory/gc/incminimark.py:major_collection_step()
>> >>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>> print out when it starts and when it stops. Apparently,
>> >>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>> slow
>> >>>>>>>>>>>>>>>> queries
>> >>>>>>>>>>>>>>>> do
>> >>>>>>>>>>>>>>>> NOT occur during major_collection_step; at least, I have
>> >>>>>>>>>>>>>>>> not
>> >>>>>>>>>>>>>>>> observed
>> >>>>>>>>>>>>>>>> major
>> >>>>>>>>>>>>>>>> step output during a query execution. So, apparently,
>> >>>>>>>>>>>>>>>> something
>> >>>>>>>>>>>>>>>> else
>> >>>>>>>>>>>>>>>> is
>> >>>>>>>>>>>>>>>> blocking. This could be another aspect of the GC, but it
>> >>>>>>>>>>>>>>>> could
>> >>>>>>>>>>>>>>>> also
>> >>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>> anything else.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Just to be sure, I have tried running the same
>> >>>>>>>>>>>>>>>> application in
>> >>>>>>>>>>>>>>>> python
>> >>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>> garbage collection disabled. I don't see the problem there,
>> >>>>>>>>>>>>>>>> so it is somehow related to either the GC or the runtime.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Cheers,
>> >>>>>>>>>>>>>>>> /Martin
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Fri, Mar 14, 2014 at 4:19 PM, Martin Koch
>> >>>>>>>>>>>>>>>> <mak at issuu.com>
>> >>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We have hacked up a small sample that seems to exhibit
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> same
>> >>>>>>>>>>>>>>>>> issue.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We basically generate a linked list of objects. To
>> >>>>>>>>>>>>>>>>> increase
>> >>>>>>>>>>>>>>>>> connectedness,
>> >>>>>>>>>>>>>>>>> elements in the list hold references (dummy_links) to 10
>> >>>>>>>>>>>>>>>>> randomly
>> >>>>>>>>>>>>>>>>> chosen
>> >>>>>>>>>>>>>>>>> previous elements in the list.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We then time a function that traverses 50000 elements
>> >>>>>>>>>>>>>>>>> from
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> list
>> >>>>>>>>>>>>>>>>> from a
>> >>>>>>>>>>>>>>>>> random start point. If the traversal reaches the end of
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> list,
>> >>>>>>>>>>>>>>>>> we
>> >>>>>>>>>>>>>>>>> instead
>> >>>>>>>>>>>>>>>>> traverse one of the dummy links. Thus, exactly 50K
>> >>>>>>>>>>>>>>>>> elements
>> >>>>>>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>> traversed
>> >>>>>>>>>>>>>>>>> every time. To generate some garbage, we build a list
>> >>>>>>>>>>>>>>>>> holding
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> traversed
>> >>>>>>>>>>>>>>>>> elements and a dummy list of characters.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Timings for the last 100 runs are stored in a circular
>> >>>>>>>>>>>>>>>>> buffer. If
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> elapsed time for the last run is more than twice the
>> >>>>>>>>>>>>>>>>> average
>> >>>>>>>>>>>>>>>>> time,
>> >>>>>>>>>>>>>>>>> we
>> >>>>>>>>>>>>>>>>> print
>> >>>>>>>>>>>>>>>>> out a line with the elapsed time, the threshold, and the
>> >>>>>>>>>>>>>>>>> 90%
>> >>>>>>>>>>>>>>>>> runtime
>> >>>>>>>>>>>>>>>>> (we
>> >>>>>>>>>>>>>>>>> would like to see that the mean runtime does not
>> >>>>>>>>>>>>>>>>> increase
>> >>>>>>>>>>>>>>>>> with
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> number of
>> >>>>>>>>>>>>>>>>> elements in the list, but that the max time does
>> >>>>>>>>>>>>>>>>> increase
>> >>>>>>>>>>>>>>>>> (linearly
>> >>>>>>>>>>>>>>>>> with the
>> >>>>>>>>>>>>>>>>> number of objects, I guess); traversing 50K elements
>> >>>>>>>>>>>>>>>>> should
>> >>>>>>>>>>>>>>>>> be
>> >>>>>>>>>>>>>>>>> independent of
>> >>>>>>>>>>>>>>>>> the memory size).
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> We have tried monitoring memory consumption by external
>> >>>>>>>>>>>>>>>>> inspection,
>> >>>>>>>>>>>>>>>>> but
>> >>>>>>>>>>>>>>>>> cannot consistently verify that memory is deallocated at
>> >>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> same
>> >>>>>>>>>>>>>>>>> time
>> >>>>>>>>>>>>>>>>> that
>> >>>>>>>>>>>>>>>>> we see slow requests. Perhaps the pypy runtime doesn't
>> >>>>>>>>>>>>>>>>> always
>> >>>>>>>>>>>>>>>>> return
>> >>>>>>>>>>>>>>>>> freed
>> >>>>>>>>>>>>>>>>> pages back to the OS?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Using top, we observe that 10M elements allocates around
>> >>>>>>>>>>>>>>>>> 17GB
>> >>>>>>>>>>>>>>>>> after
>> >>>>>>>>>>>>>>>>> building, 20M elements 26GB, 30M elements 28GB (and
>> >>>>>>>>>>>>>>>>> grows to
>> >>>>>>>>>>>>>>>>> 35GB
>> >>>>>>>>>>>>>>>>> shortly
>> >>>>>>>>>>>>>>>>> after building).
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Here is output from a few runs with different number of
>> >>>>>>>>>>>>>>>>> elements:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> pypy mem.py 10000000
>> >>>>>>>>>>>>>>>>> start build
>> >>>>>>>>>>>>>>>>> end build 84.142424
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 13.230586
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 1.495401
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.421558
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 13.016531
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 1.488160
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.423441
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 13.032537
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 1.474563
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.419817
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> pypy mem.py 20000000
>> >>>>>>>>>>>>>>>>> start build
>> >>>>>>>>>>>>>>>>> end build 180.823105
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 27.346064
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 2.295146
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.434726
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 26.028852
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 2.283927
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.374190
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 25.432279
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 2.279631
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.371502
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> pypy mem.py 30000000
>> >>>>>>>>>>>>>>>>> start build
>> >>>>>>>>>>>>>>>>> end build 276.217811
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 40.993855
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 3.188464
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.459891
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 41.693553
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 3.183003
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.393654
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 39.679769
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 3.190782
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.393677
>> >>>>>>>>>>>>>>>>> that took a long time elapsed: 43.573411
>> >>>>>>>>>>>>>>>>> slow_threshold:
>> >>>>>>>>>>>>>>>>> 3.239637
>> >>>>>>>>>>>>>>>>> 90th_quantile_runtime: 0.393654
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Code below
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> --------------------------------------------------------------
>> >>>>>>>>>>>>>>>>> import time
>> >>>>>>>>>>>>>>>>> from random import randint, choice
>> >>>>>>>>>>>>>>>>> import sys
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> allElems = {}
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> class Node:
>> >>>>>>>>>>>>>>>>>   def __init__(self, v_):
>> >>>>>>>>>>>>>>>>>       self.v = v_
>> >>>>>>>>>>>>>>>>>       self.next = None
>> >>>>>>>>>>>>>>>>>       self.dummy_data = [randint(0,100)
>> >>>>>>>>>>>>>>>>>                          for _ in xrange(randint(50,100))]
>> >>>>>>>>>>>>>>>>>       allElems[self.v] = self
>> >>>>>>>>>>>>>>>>>       if self.v > 0:
>> >>>>>>>>>>>>>>>>>           self.dummy_links = [allElems[randint(0, self.v-1)] for _ in xrange(10)]
>> >>>>>>>>>>>>>>>>>       else:
>> >>>>>>>>>>>>>>>>>           self.dummy_links = [self]
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>   def set_next(self, l):
>> >>>>>>>>>>>>>>>>>       self.next = l
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> def follow(node):
>> >>>>>>>>>>>>>>>>>   acc = []
>> >>>>>>>>>>>>>>>>>   count = 0
>> >>>>>>>>>>>>>>>>>   cur = node
>> >>>>>>>>>>>>>>>>>   assert node.v is not None
>> >>>>>>>>>>>>>>>>>   assert cur is not None
>> >>>>>>>>>>>>>>>>>   while count < 50000:
>> >>>>>>>>>>>>>>>>>       # return a value; generate some garbage
>> >>>>>>>>>>>>>>>>>       acc.append((cur.v, [choice("abcdefghijklmnopqrstuvwxyz") for x in xrange(100)]))
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>       # if we have reached the end, choose a random link
>> >>>>>>>>>>>>>>>>>       cur = choice(cur.dummy_links) if cur.next is None else cur.next
>> >>>>>>>>>>>>>>>>>       count += 1
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>   return acc
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> def build(num_elems):
>> >>>>>>>>>>>>>>>>>   start = time.time()
>> >>>>>>>>>>>>>>>>>   print "start build"
>> >>>>>>>>>>>>>>>>>   root = Node(0)
>> >>>>>>>>>>>>>>>>>   cur = root
>> >>>>>>>>>>>>>>>>>   for x in xrange(1, num_elems):
>> >>>>>>>>>>>>>>>>>       e = Node(x)
>> >>>>>>>>>>>>>>>>>       cur.next = e
>> >>>>>>>>>>>>>>>>>       cur = e
>> >>>>>>>>>>>>>>>>>   print "end build %f" % (time.time() - start)
>> >>>>>>>>>>>>>>>>>   return root
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> num_timings = 100
>> >>>>>>>>>>>>>>>>> if __name__ == "__main__":
>> >>>>>>>>>>>>>>>>>   num_elems = int(sys.argv[1])
>> >>>>>>>>>>>>>>>>>   build(num_elems)
>> >>>>>>>>>>>>>>>>>   total = 0
>> >>>>>>>>>>>>>>>>>   timings = [0.0] * num_timings # run times for the last num_timings runs
>> >>>>>>>>>>>>>>>>>   i = 0
>> >>>>>>>>>>>>>>>>>   beginning = time.time()
>> >>>>>>>>>>>>>>>>>   while time.time() - beginning < 600:
>> >>>>>>>>>>>>>>>>>       start = time.time()
>> >>>>>>>>>>>>>>>>>       elem = allElems[randint(0, num_elems - 1)]
>> >>>>>>>>>>>>>>>>>       assert(elem is not None)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>       lst = follow(elem)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>       total += choice(lst)[0] # use the return value for something
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>       end = time.time()
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>       elapsed = end-start
>> >>>>>>>>>>>>>>>>>       timings[i % num_timings] = elapsed
>> >>>>>>>>>>>>>>>>>       if (i > num_timings):
>> >>>>>>>>>>>>>>>>>           slow_time = 2 * sum(timings)/num_timings # slow defined as > 2*avg run time
>> >>>>>>>>>>>>>>>>>           if (elapsed > slow_time):
>> >>>>>>>>>>>>>>>>>               print "that took a long time elapsed: %f slow_threshold: %f  90th_quantile_runtime: %f" % \
>> >>>>>>>>>>>>>>>>>                   (elapsed, slow_time, sorted(timings)[int(num_timings*.9)])
>> >>>>>>>>>>>>>>>>>       i += 1
>> >>>>>>>>>>>>>>>>>   print total
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Thu, Mar 13, 2014 at 7:45 PM, Maciej Fijalkowski
>> >>>>>>>>>>>>>>>>> <fijall at gmail.com>
>> >>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> On Thu, Mar 13, 2014 at 1:45 PM, Martin Koch
>> >>>>>>>>>>>>>>>>>> <mak at issuu.com>
>> >>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>> Hi Armin, Maciej
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks for responding.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I'm in the process of trying to determine what (if
>> >>>>>>>>>>>>>>>>>>> any) of
>> >>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> code
>> >>>>>>>>>>>>>>>>>>> I'm
>> >>>>>>>>>>>>>>>>>>> in a
>> >>>>>>>>>>>>>>>>>>> position to share, and I'll get back to you.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Allowing hinting to the GC would be good. Even better
>> >>>>>>>>>>>>>>>>>>> would be
>> >>>>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>> means
>> >>>>>>>>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> allow me to (transparently) allocate objects in
>> >>>>>>>>>>>>>>>>>>> unmanaged
>> >>>>>>>>>>>>>>>>>>> memory,
>> >>>>>>>>>>>>>>>>>>> but I
>> >>>>>>>>>>>>>>>>>>> would expect that to be a tall order :)
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>>> /Martin
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Hi Martin.
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Note that in case you want us to do the work of
>> >>>>>>>>>>>>>>>>>> isolating
>> >>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>> problem,
>> >>>>>>>>>>>>>>>>>> we do offer paid support to do that (then we can sign
>> >>>>>>>>>>>>>>>>>> NDAs
>> >>>>>>>>>>>>>>>>>> and
>> >>>>>>>>>>>>>>>>>> stuff).
>> >>>>>>>>>>>>>>>>>> Otherwise we would be more than happy to fix bugs once
>> >>>>>>>>>>>>>>>>>> you
>> >>>>>>>>>>>>>>>>>> isolate
>> >>>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>> part you can share freely :)
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>
>> >>>>
>> >>>
>
>


More information about the pypy-dev mailing list