[pypy-dev] Pypy garbage collection

Maciej Fijalkowski fijall at gmail.com
Thu Mar 13 00:56:34 CET 2014

On Thu, Mar 13, 2014 at 12:06 AM, Martin Koch <mak at issuu.com> wrote:
> Hi List
> I'm running a server (written in python, executed with pypy) that holds a
> large graph (55GB, millions of nodes and edges) in memory and responds to
> queries by traversing the graph. The graph is mutated a few times a second,
> and there are hundreds of read-only requests a second.
> My problem is that I have no control over garbage collection. Thus, a major GC
> might kick in while serving a query, and with this amount of data, the GC
> takes around 2 minutes. I have tried mitigating this by guessing when a GC
> might be due, and proactively starting the garbage collector while not
> serving a request (this is ok, as duplicate servers will respond to requests
> while this one is collecting).
> What I would really like is to be able to disable garbage collection for the
> old generation. This is because the graph is fairly static, and I can live
> with leaking memory from the relatively few and small mutations that occur.
> Any queries are only likely to generate objects in the new generation, and
> it is fine to collect these. Also, by design, the process is periodically
> restarted in order to re-synchronize it with an authoritative source (thus
> rebuilding the graph from scratch), so slight leakage is not an issue here.
> I have tried experimenting with setting environment variables as well as the
> 'gc' module, but nothing seems to give me what I want.
> If disabling gc for certain generations is not possible, it would be nice to
> be able to get a hint when a major collection is about to occur, so I can
> stop serving requests.
> I'm using the following pypy version:
> Python 2.7.3 (2.2.1+dfsg-1, Jan 24 2014, 10:12:37)
> [PyPy 2.2.1 with GCC 4.6.3] on linux2
> An additional question: pypy 2.2.1 should have incremental GC; shouldn't
> that avoid long pauses due to garbage collection?

Yes, it totally should. If your pauses are not incremental, we would
like to be able to reproduce that ourselves. Since the graph is 55GB, do
you think you can make us an example that can run on a normal machine?
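
A scaled-down example could look roughly like the sketch below. It is
only a guess at the shape of the workload, not the actual server: the
names (Node, build_graph, traverse, measure_queries) and all sizes are
made up for illustration. It builds a random graph that stays alive,
runs many short traversals that allocate short-lived objects, and
records how long each one takes, so that a long GC pause would show up
as an outlier in the timings.

```python
import random
import time


class Node(object):
    """Hypothetical graph node holding a list of outgoing edges."""
    __slots__ = ("edges",)

    def __init__(self):
        self.edges = []


def build_graph(num_nodes, edges_per_node, seed=0):
    """Build a random directed graph that stays alive (old generation)."""
    rng = random.Random(seed)
    nodes = [Node() for _ in range(num_nodes)]
    for node in nodes:
        for _ in range(edges_per_node):
            node.edges.append(nodes[rng.randrange(num_nodes)])
    return nodes


def traverse(start, max_steps):
    """Visit up to max_steps nodes, allocating short-lived objects on the way."""
    seen = set()
    stack = [start]
    while stack and len(seen) < max_steps:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(node.edges)
    return len(seen)


def measure_queries(nodes, num_queries, max_steps=200, seed=1):
    """Run read-only queries and return the wall-clock duration of each."""
    rng = random.Random(seed)
    durations = []
    for _ in range(num_queries):
        start = nodes[rng.randrange(len(nodes))]
        t0 = time.time()
        traverse(start, max_steps)
        durations.append(time.time() - t0)
    return durations


if __name__ == "__main__":
    # Sizes are placeholders; raise num_nodes until the pauses become visible.
    graph = build_graph(num_nodes=200000, edges_per_node=5)
    timings = sorted(measure_queries(graph, num_queries=5000))
    print("median query: %.6fs" % timings[len(timings) // 2])
    print("slowest query: %.6fs" % timings[-1])
```

If the slowest query is far above the median and grows with num_nodes,
that would suggest a non-incremental pause worth looking at.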
