[pypy-dev] Continual Increase In Memory Utilization Using PyPy 6.0 (python 2.7) -- Help!?

René Dudfield renesd at gmail.com
Thu Mar 28 14:27:55 EDT 2019


Have you tried deleting the pymongo._cmessage module? It's a cpyext one, so
it seems like a good place to start. I think pymongo works without it.
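
For example, you can check whether the C extensions are active at all
(has_c() is a real helper in pymongo and bson; the rest is just a sketch):

import bson
import pymongo

# True means the cpyext-based C extensions (_cbson / _cmessage) are in
# use; to test without them, reinstall pymongo without its C extensions
# (or remove/rename the compiled .so files) and re-run the workload.
print(pymongo.has_c())
print(bson.has_c())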

A general bisection technique for finding a memory leak, which works in 11
runs or fewer: run the code under coverage, so you know which lines are
actually executed. Now comment out half of that code, such that it still
runs. Is there still a memory leak? Look at the uncommented lines. No leak?
Look at the commented-out lines. Keep doing this until you spot the problem,
or you only have 1 line left (a sketch of the per-step leak check is below).
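
A rough sketch of that "is there still a leak?" check (the names and the
iteration count are made up; plug in one loop of your real code):

import gc
import os

import psutil


def rss_bytes():
    # newer psutil spells this memory_info(); older versions use
    # get_memory_info()
    return psutil.Process(os.getpid()).memory_info().rss


def rss_growth(one_iteration, iterations=1000):
    gc.collect()
    before = rss_bytes()
    for _ in range(iterations):
        one_iteration()
    gc.collect()
    grown = rss_bytes() - before
    print('RSS grew by %d bytes over %d iterations' % (grown, iterations))
    return grown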

A common resource issue with PyPy is that it does cleanup differently: some
code relies on destructors and reference counting to clean things up, but
PyPy has no reference counting, so destructors only run whenever the GC
gets around to collecting. Maybe check things like the number of open
sockets and files?
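
Something like this (an untested sketch; open_files() and connections()
are psutil APIs) can watch those counts from inside the process:

import os

import psutil

proc = psutil.Process(os.getpid())
print('open files:   %d' % len(proc.open_files()))
print('open sockets: %d' % len(proc.connections()))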

objgraph is really good, IMHO (I haven't tried it with PyPy though, so I'm
not sure it works):
https://mg.pov.lt/objgraph/
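
Typical usage would be something like this (based on objgraph's documented
API; again, untested on PyPy):

import objgraph

objgraph.show_most_common_types(limit=10)  # live-object counts by type

# Call show_growth() once per loop iteration; it prints only the types
# whose live-object count grew since the previous call.
objgraph.show_growth()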



On Thursday, March 28, 2019, Robert Whitcher <robert.whitcher at rubrik.com>
wrote:

> Thanks Antonio...
>
> Relative to cpyext... I am not sure.  We are not using it directly, but
> who knows what is used by the modules we incorporate (pymongo, etc.)
> Trying to create a stripped down test case...
> I filed a bug here with some updates:
>     https://bitbucket.org/pypy/pypy/issues/2982/continual-increase-in-memory-utilization
>
> It reproduces in PyPy, and in PyPy without the JIT, but *does not*
> reproduce in CPython, so that is some indication.
>
> Once I have a stripped-down test case that is not tied to our entire
> codebase, I will publish it in the bug if I can, and try other PyPy
> versions as well.
>
> Rob
>
> On Thu, Mar 28, 2019 at 10:56 AM Antonio Cuni <anto.cuni at gmail.com> wrote:
>
>> Hi Robert,
>> are you using any package which relies on cpyext? I.e., modules written
>> in C and/or with Cython (cffi is fine).
>> IIRC, at the moment PyPy doesn't detect GC cycles which involve cpyext
>> objects.
>> So if you have a cycle which does e.g.
>>     Py_foo -> C_bar -> Py_foo
>> (where Py_foo is a pure-python object and C_bar a cpyext object) they
>> will never be collected unless you break the cycle manually.
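>>
>> A made-up illustration (CBar here is a hypothetical stand-in for any
>> cpyext-backed class):
>>
>> class PyFoo(object):      # pure-python object
>>     pass
>>
>> class CBar(object):       # stand-in: imagine this is a cpyext type
>>     pass
>>
>> foo = PyFoo()
>> bar = CBar()
>> bar.owner = foo           # C_bar -> Py_foo
>> foo.bar = bar             # Py_foo -> C_bar: the cycle is closed
>>
>> # With a real cpyext CBar, PyPy's GC would never collect this cycle;
>> # break it manually before dropping the last reference:
>> foo.bar = None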
>>
>> Other than that: have you tried running it with PyPy 7.0 and/or 7.1?
>>
>>
>> On Thu, Mar 28, 2019 at 8:35 AM Robert Whitcher <
>> robert.whitcher at rubrik.com> wrote:
>>
>>> So I have a process that uses PyPy and pymongo in a loop.
>>> It does basically the same thing on every iteration: it queries a table
>>> via pymongo, does a few non-save calculations, then waits and loops again.
>>>
>>> The RSS of the process continually increased (PYPY_GC_MAX is set
>>> pretty high).
>>> So I hooked in the GC stats output per:
>>> http://doc.pypy.org/en/latest/gc_info.html
>>> I also ensured that gc.collect() was called at least every 3 minutes.
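>>>
>>> The hook is essentially this (gc.get_stats() is the PyPy-specific API
>>> from that page; treat the rest as a sketch):
>>>
>>> import gc
>>>
>>> gc.collect()
>>> print(gc.get_stats())  # PyPy-only; its repr is the block shown below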
>>>
>>> What I see is this: the memory, while high, is fairly constant for a
>>> long time:
>>>
>>> 2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
>>> ...
>>> 2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
>>> 2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312
>>>
>>>
>>> Then it decides (and the exact per-loop behavior is the same each time)
>>> to chew up much more memory:
>>>
>>> 2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
>>> 2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
>>> 2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
>>> 2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
>>> 2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
>>> 2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
>>> 2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
>>> 2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712
>>>
>>>
>>> That's 140 MB of growth... Where is all that memory going?
>>> What's more, the PyPy GC stats do not show anything different:
>>>
>>> Here are the GC stats from GC-Complete when we were at *144MB*:
>>>
>>> 2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
>>> 2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:308): DBG0: Total memory consumed:
>>>             GC used:            56.8MB (peak: 69.6MB)
>>>                in arenas:            39.3MB
>>>                rawmalloced:          14.5MB
>>>                nursery:              3.0MB
>>>             raw assembler used: 521.6kB
>>>             -----------------------------
>>>             Total:              57.4MB
>>>
>>>             Total memory allocated:
>>>             GC allocated:            63.0MB (peak: 71.2MB)
>>>                in arenas:            43.9MB
>>>                rawmalloced:          22.7MB
>>>                nursery:              3.0MB
>>>             raw assembler allocated: 1.0MB
>>>             -----------------------------
>>>             Total:                   64.0MB
>>>
>>>
>>> Here are the GC stats from GC-Complete when we are at *285MB*:
>>>
>>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
>>> 2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log
>>> (async_worker_process.py:308): DBG0: Total memory consumed:
>>>             GC used:            57.5MB (peak: 69.6MB)
>>>                in arenas:            39.9MB
>>>                rawmalloced:          14.6MB
>>>                nursery:              3.0MB
>>>             raw assembler used: 1.5MB
>>>             -----------------------------
>>>             Total:              58.9MB
>>>
>>>             Total memory allocated:
>>>             GC allocated:            63.1MB (peak: 71.2MB)
>>>                in arenas:            43.9MB
>>>                rawmalloced:          22.7MB
>>>                nursery:              3.0MB
>>>             raw assembler allocated: 2.0MB
>>>             -----------------------------
>>>             Total:                   65.1MB
>>>
>>>
>>> How is this possible?
>>>
>>> I am measuring RSS with:
>>>
>>> import os
>>>
>>> import psutil
>>>
>>>
>>> def get_rss_mem_usage():
>>>     '''
>>>     Get the RSS memory usage of the current process, in bytes.
>>>     @return: memory size in bytes; -1 if an error occurs
>>>     '''
>>>     try:
>>>         process = psutil.Process(os.getpid())
>>>         # get_memory_info() is the older psutil spelling; newer
>>>         # versions call it memory_info()
>>>         return process.get_memory_info().rss
>>>     except Exception:
>>>         return -1
>>>
>>>
>>> Cross-referencing with "ps -o rss -p <pid>" confirms that the reported
>>> RSS values are correct...
>>>
>>> I cannot figure out where to go from here, as it appears that PyPy is
>>> leaking this memory somehow...
>>> I end up having memory problems and getting Memory Warnings for a
>>> process that just loops and queries via pymongo.
>>> The pymongo version is 3.7.1.
>>>
>>> This is:
>>>
>>> Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018,
>>> 08:21:35)
>>> [PyPy 6.0.0 with GCC 7.2.0]
>>>
>>>
>>>
>>> _______________________________________________
>>> pypy-dev mailing list
>>> pypy-dev at python.org
>>> https://mail.python.org/mailman/listinfo/pypy-dev
>>>
>>
>
>