[pypy-issue] Issue #2715: Memory from JSON-deserialized objects isn't reclaimed (pypy/pypy)

Artur Siekielski issues-reply at bitbucket.org
Tue Dec 26 06:39:13 EST 2017


New issue 2715: Memory from JSON-deserialized objects isn't reclaimed
https://bitbucket.org/pypy/pypy/issues/2715/memory-from-json-deserialized-objects-isnt

Artur Siekielski:

I encountered the issue while processing large numbers of JSON documents in batches. When a batch has been processed, i.e. the documents have been deserialized and all references to them cleared, the memory isn't reclaimed and usage keeps growing.

I was able to reproduce the issue using the attached code. The code calls json.loads on documents of varying size, in batches of 100. After each batch is processed, the loaded documents are cleared and gc.collect() is called.
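The attachment itself is not reproduced in this message. A minimal sketch of the loop described above might look like the following. The names `used_memory_kb`, `make_payload` and `run` are hypothetical, the payload generator is only a stand-in for the real one, and reading `VmRSS` from `/proc/self/status` (Linux only, in kB) is an assumption about how memory could be measured, not necessarily what the attached script does.

```python
import gc
import json
import random


def used_memory_kb():
    """Current resident set size in kB from /proc (Linux only); -1 if unavailable."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except IOError:
        pass
    return -1


def make_payload(rng):
    # Hypothetical stand-in for the attached generator:
    # a JSON string whose size varies from call to call.
    n = rng.randint(10, 200)
    return json.dumps({"items": [{"id": i, "tags": ["t%d" % i]} for i in range(n)]})


def run(batches=5, batch_size=100, seed=0):
    rng = random.Random(seed)
    print("initial %d" % used_memory_kb())
    readings = []
    for _ in range(batches):
        # Deserialize one batch of documents.
        docs = [json.loads(make_payload(rng)) for _ in range(batch_size)]
        before = used_memory_kb()
        print("before clearing %d" % before)
        # Drop all references to the batch, then force a full collection.
        del docs[:]
        gc.collect()
        after = used_memory_kb()
        print("after clearing and gc %d" % after)
        readings.append((before, after))
    return readings


if __name__ == "__main__":
    run()
```

The sketch returns the (before, after) pairs so that the two interpreters can be compared programmatically as well as by reading the printed lines.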

The script prints the amount of used memory. I get the following output using PyPy2 5.9/5.10:


```
initial 114984
before clearing 1546032
after clearing and gc 1546032
before clearing 1561080
after clearing and gc 1561344
before clearing 1561608
after clearing and gc 1522916
before clearing 1559608
after clearing and gc 1560664


```

CPython 2.7.14 gives the output:


```
initial 89212
before clearing 2303832
after clearing and gc 153352
before clearing 2304968
after clearing and gc 153352
before clearing 2305756
after clearing and gc 153352
before clearing 2306736
after clearing and gc 153352
before clearing 2307520
after clearing and gc 153352

```

The default function returning a JSON document in the code is gen_doc_1. It generates a random document with nested dicts and arrays. When it's replaced with gen_doc_2, which returns an array of ints, the issue isn't present.
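For readers without the attachment, the two generators could be sketched roughly as follows. Only the names gen_doc_1 and gen_doc_2 come from the report. The bodies, the `rng` parameter, the helper `_random_value`, the nesting depth and the size ranges are all assumptions chosen for illustration, and the real functions may differ.

```python
import json
import random


def _random_value(rng, depth):
    # Hypothetical helper: build nested dicts and lists up to a fixed depth,
    # with ints at the leaves.
    if depth >= 3:
        return rng.randint(0, 1000)
    size = rng.randint(1, 5)
    if rng.random() < 0.5:
        return dict(("key%d" % i, _random_value(rng, depth + 1)) for i in range(size))
    return [_random_value(rng, depth + 1) for _ in range(size)]


def gen_doc_1(rng):
    """Assumed shape: JSON text for a random document with nested dicts and arrays."""
    return json.dumps(_random_value(rng, 0))


def gen_doc_2(rng):
    """Assumed shape: JSON text for a flat array of ints of varying length."""
    return json.dumps([rng.randint(0, 1000) for _ in range(rng.randint(1, 500))])


if __name__ == "__main__":
    rng = random.Random(0)
    print(type(json.loads(gen_doc_1(rng))).__name__)
    print(len(json.loads(gen_doc_2(rng))))
```

The difference that matters for the report is that the first shape makes json.loads allocate many small dicts, lists and strings, while the second produces a single list of ints.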

I tried disabling the JIT and tuning the GC through environment variables, but that didn't make any difference.



