[pypy-dev] Continual Increase In Memory Utilization Using PyPy 6.0 (python 2.7) -- Help!?

Robert Whitcher robert.whitcher at rubrik.com
Thu Mar 28 13:55:57 EDT 2019


Thanks Antonio...

Relative to cpyext... I am not sure. We are not using it directly, but who
knows what is used by the modules we pull in (pymongo, etc.).
I am trying to create a stripped-down test case.
I filed a bug here with some updates:

https://bitbucket.org/pypy/pypy/issues/2982/continual-increase-in-memory-utilization

It reproduces in PyPy, and in PyPy with the JIT disabled, but *does not*
reproduce in CPython, so that points to something PyPy-specific.

Once I have a stripped-down test case that is not tied to our entire
codebase, I will publish it in the bug if I can, and try other PyPy
versions as well.

Rob

On Thu, Mar 28, 2019 at 10:56 AM Antonio Cuni <anto.cuni at gmail.com> wrote:

> Hi Robert,
> are you using any package which relies on cpyext? I.e., modules written in
> C and/or with Cython (cffi is fine).
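> A quick way to check is to scan sys.modules for compiled extensions; this
> is just a heuristic sketch (cffi-based modules also ship .so files, so
> inspect the hits by hand):
>
>     import sys
>     for name in sorted(sys.modules):
>         mod = sys.modules[name]
>         path = getattr(mod, '__file__', None) or ''
>         if path.endswith('.so'):
>             print('%s -> %s' % (name, path))
>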
> IIRC, at the moment PyPy doesn't detect GC cycles which involve cpyext
> objects.
> So if you have a cycle which does e.g.
>     Py_foo -> C_bar -> Py_foo
> (where Py_foo is a pure-python object and C_bar a cpyext object) they will
> never be collected unless you break the cycle manually.
>
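> Schematically, the kind of cycle I mean looks like this (a sketch;
> ExtNode is a stand-in for whatever cpyext-backed class you might use):
>
>     class PyFoo(object):      # pure-Python object
>         pass
>
>     foo = PyFoo()
>     foo.bar = ExtNode()       # cpyext object; assume it can hold a reference
>     foo.bar.owner = foo       # closes the Py_foo -> C_bar -> Py_foo cycle
>
>     # PyPy's GC will never reclaim this; break the cycle by hand when done:
>     foo.bar.owner = None
>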
> Other than that: have you tried running it with PyPy 7.0 and/or 7.1?
>
>
> On Thu, Mar 28, 2019 at 8:35 AM Robert Whitcher <
> robert.whitcher at rubrik.com> wrote:
>
>> So I have a process that uses PyPy and pymongo in a loop.
>> It does basically the same thing on every iteration: it queries a table
>> via pymongo, does a few non-save calculations, and then waits and loops
>> again.
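>> In skeleton form, the loop is essentially this (names changed;
>> compute_summary() stands in for our in-memory calculations):
>>
>>     import time
>>     import pymongo
>>
>>     client = pymongo.MongoClient()
>>     table = client.mydb.mytable          # placeholder db/collection names
>>
>>     while True:
>>         for doc in table.find():         # query the table via pymongo
>>             compute_summary(doc)         # non-save calculations only
>>         time.sleep(60)                   # wait, then loop again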
>>
>> The RSS of the process continually increases (PYPY_GC_MAX is set pretty
>> high), so I hooked in the GC stats output per:
>> http://doc.pypy.org/en/latest/gc_info.html
>> I also made sure that gc.collect() was called at least every 3 minutes.
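>> Roughly like this (a simplified sketch; log() is our logging helper and
>> get_rss_mem_usage() is shown further down):
>>
>>     import gc
>>     import time
>>
>>     last_gc_time = time.time()
>>
>>     def log_gc_state():
>>         log('INFO_FLUSH: RSS: %d' % get_rss_mem_usage())
>>         log('DBG0: %s' % gc.get_stats())  # PyPy-only, per the page above
>>
>>     # inside the worker loop: force a major collection every 3 minutes
>>     if time.time() - last_gc_time >= 180:
>>         gc.collect()
>>         last_gc_time = time.time()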
>>
>> What I see is that the memory, while high, is fairly constant for a long
>> time:
>>
>> 2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
>> ...
>> 2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
>> 2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312
>>
>>
>> Then, even though the exact per-loop behavior is the same each time, it
>> decides to chew up much more memory:
>>
>> 2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
>> 2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
>> 2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
>> 2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
>> 2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
>> 2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
>> 2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
>> 2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712
>>
>>
>> That's roughly 140 MB of growth... Where is all that memory going?
>> What's more, the PyPy GC stats do not show anything different:
>>
>> Here are the GC stats from GC-Complete when we were at *144MB*:
>>
>> 2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
>> 2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log
>> (async_worker_process.py:308): DBG0: Total memory consumed:
>>             GC used:            56.8MB (peak: 69.6MB)
>>                in arenas:            39.3MB
>>                rawmalloced:          14.5MB
>>                nursery:              3.0MB
>>             raw assembler used: 521.6kB
>>             -----------------------------
>>             Total:              57.4MB
>>
>>             Total memory allocated:
>>             GC allocated:            63.0MB (peak: 71.2MB)
>>                in arenas:            43.9MB
>>                rawmalloced:          22.7MB
>>                nursery:              3.0MB
>>             raw assembler allocated: 1.0MB
>>             -----------------------------
>>             Total:                   64.0MB
>>
>>
>> Here are the GC stats from GC-Complete when we are at *285MB*:
>>
>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
>> (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
>> (async_worker_process.py:308): DBG0: Total memory consumed:
>>             GC used:            57.5MB (peak: 69.6MB)
>>                in arenas:            39.9MB
>>                rawmalloced:          14.6MB
>>                nursery:              3.0MB
>>             raw assembler used: 1.5MB
>>             -----------------------------
>>             Total:              58.9MB
>>
>>             Total memory allocated:
>>             GC allocated:            63.1MB (peak: 71.2MB)
>>                in arenas:            43.9MB
>>                rawmalloced:          22.7MB
>>                nursery:              3.0MB
>>             raw assembler allocated: 2.0MB
>>             -----------------------------
>>             Total:                   65.1MB
>>
>>
>> How is this possible?
>>
>> I am measuring RSS with:
>>
>> import os
>>
>> import psutil
>>
>> def get_rss_mem_usage():
>>     '''
>>     Get the RSS memory usage of this process in bytes.
>>     @return: memory size in bytes; -1 if an error occurs
>>     '''
>>     try:
>>         process = psutil.Process(os.getpid())
>>         # note: newer psutil (>= 3.0) spells this process.memory_info().rss
>>         return process.get_memory_info().rss
>>     except Exception:  # not a bare except: don't swallow SystemExit etc.
>>         return -1
>>
>>
>> Cross-referencing with "ps -orss -p <pid>" confirms that the reported
>> RSS values are correct.
>>
>> I cannot figure out where to go from here; it appears that PyPy is
>> leaking this memory somehow, and I have no idea how to proceed.
>> I end up having memory problems and getting memory warnings for a process
>> that just loops and queries via pymongo.
>> The pymongo version is 3.7.1.
>>
>> This is:
>>
>> Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018,
>> 08:21:35)
>> [PyPy 6.0.0 with GCC 7.2.0]