Continual Increase In Memory Utilization Using PyPy 6.0 (Python 2.7) -- Help!?
So I have a process that uses PyPy and pymongo in a loop. It does basically the same thing every loop: it queries a table via pymongo, does a few non-saving calculations, then waits and loops again.

The RSS of the process continually increases (PYPY_GC_MAX is set pretty high). So I hooked in the GC stats output per http://doc.pypy.org/en/latest/gc_info.html, and I also made sure that gc.collect() is called at least every 3 minutes.

What I see is this. The memory, while high, is fairly constant for a long time:

    2019-03-27 00:04:10.033-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144244736
    ...
    2019-03-27 01:01:46.841-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144420864
    2019-03-27 01:02:36.943-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 144269312

Then it decides (and the exact per-loop behavior is the same each time) to chew up much more memory:

    2019-03-27 01:04:17.184-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 145469440
    2019-03-27 01:05:07.305-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 158175232
    2019-03-27 01:05:57.401-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 173191168
    2019-03-27 01:06:47.490-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 196943872
    2019-03-27 01:07:37.575-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 205406208
    2019-03-27 01:08:27.659-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 254562304
    2019-03-27 01:09:17.770-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 256020480
    2019-03-27 01:10:07.866-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 289779712

That is about 140 MB of growth. Where is all that memory going? What's more, the PyPy GC stats do not show anything different.

Here are the GC stats from a GC-Complete when we were at *144MB*:

    2019-03-26 23:55:49.127-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 140632064
    2019-03-26 23:55:49.133-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0:
    Total memory consumed:
        GC used:                 56.8MB (peak: 69.6MB)
           in arenas:            39.3MB
           rawmalloced:          14.5MB
           nursery:              3.0MB
        raw assembler used:      521.6kB
        -----------------------------
        Total:                   57.4MB

        Total memory allocated:
        GC allocated:            63.0MB (peak: 71.2MB)
           in arenas:            43.9MB
           rawmalloced:          22.7MB
           nursery:              3.0MB
        raw assembler allocated: 1.0MB
        -----------------------------
        Total:                   64.0MB

Here are the GC stats from a GC-Complete when we are at *285MB*:

    2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:304): INFO_FLUSH: RSS: 285147136
    2019-03-27 01:42:41.751-0600 [-] main_thread(29621)log (async_worker_process.py:308): DBG0:
    Total memory consumed:
        GC used:                 57.5MB (peak: 69.6MB)
           in arenas:            39.9MB
           rawmalloced:          14.6MB
           nursery:              3.0MB
        raw assembler used:      1.5MB
        -----------------------------
        Total:                   58.9MB

        Total memory allocated:
        GC allocated:            63.1MB (peak: 71.2MB)
           in arenas:            43.9MB
           rawmalloced:          22.7MB
           nursery:              3.0MB
        raw assembler allocated: 2.0MB
        -----------------------------
        Total:                   65.1MB

How is this possible?
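For reference, the DBG0 blocks above are the kind of detailed report PyPy's gc.get_stats() produces (see the gc_info page linked above). A minimal sketch of this sort of periodic RSS/GC-stats logging is below; the 50-second interval, the 3-minute collect cadence, and the print-based "logging" are placeholders, and gc.get_stats() is a PyPy-only API.

    # Sketch: periodically log RSS plus PyPy GC stats, forcing a major
    # collection every few minutes. Intervals and print() calls are
    # placeholders for the real scheduler/logger.
    import gc
    import os
    import time

    import psutil


    def log_memory_forever(interval=50, collect_every=180):
        proc = psutil.Process(os.getpid())
        last_collect = time.time()
        while True:
            # memory_info() is spelled get_memory_info() in older psutil.
            print('INFO_FLUSH: RSS: %d' % proc.memory_info().rss)
            if time.time() - last_collect >= collect_every:
                gc.collect()                         # force a major collection
                print('DBG0: %s' % gc.get_stats())   # PyPy-only detailed stats
                last_collect = time.time()
            time.sleep(interval)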
I am measuring RSS with:

    import os

    import psutil


    def get_rss_mem_usage():
        '''
        Get the RSS memory usage in bytes.
        @return: memory size in bytes; -1 if an error occurs
        '''
        try:
            # get_memory_info() is the older psutil name for memory_info()
            process = psutil.Process(os.getpid())
            return process.get_memory_info().rss
        except Exception:
            return -1

and I am cross-referencing it with "ps -orss -p <pid>"; the reported RSS values agree.

I cannot figure out where to go from here. It appears that PyPy is leaking this memory somehow, and I have no idea how to proceed. I end up with memory problems and memory warnings for a process that just loops and queries via pymongo. The pymongo version is 3.7.1.

This is:

    Python 2.7.13 (ab0b9caf307db6592905a80b8faffd69b39005b8, Apr 30 2018, 08:21:35)
    [PyPy 6.0.0 with GCC 7.2.0]
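Since the RSS here is far above anything the GC reports, one useful next step is to look at where the extra pages actually live. The sketch below uses psutil's memory_maps() to list the largest mappings of the current process; it assumes a reasonably recent psutil and Linux-style field names, and is a starting point rather than a polished tool.

    # Sketch: dump the biggest memory mappings of the current process, so
    # that RSS the PyPy GC does not account for (C-library heaps, cpyext
    # allocations, fragmentation in anonymous mappings, ...) becomes visible.
    import os

    import psutil


    def dump_biggest_mappings(top=15):
        proc = psutil.Process(os.getpid())
        maps = sorted(proc.memory_maps(), key=lambda m: m.rss, reverse=True)
        for m in maps[:top]:
            print('%10d kB  %s' % (m.rss / 1024, m.path or '[anonymous]'))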
Hi Robert,
are you using any package which relies on cpyext, i.e. modules written in C and/or with Cython? (cffi is fine.) IIRC, at the moment PyPy doesn't detect GC cycles which involve cpyext objects. So if you have a cycle such as Py_foo -> C_bar -> Py_foo (where Py_foo is a pure-Python object and C_bar a cpyext object), those objects will never be collected unless you break the cycle manually.

Other than that: have you tried running it with PyPy 7.0 and/or 7.1?

On Thu, Mar 28, 2019 at 8:35 AM Robert Whitcher <robert.whitcher@rubrik.com> wrote:
> So I have a process that uses PyPy and pymongo in a loop. [...]
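To make that cycle shape concrete, here is a purely illustrative sketch; SomeCExtObject is a hypothetical stand-in for an object coming from a cpyext-based extension module (it is not a real class in pymongo or elsewhere).

    # Illustration of a Py_foo -> C_bar -> Py_foo cycle and breaking it by
    # hand. Imagine SomeCExtObject being implemented in C or Cython and
    # imported through cpyext; PyPy's GC then reportedly cannot collect the
    # cycle on its own, so it must be broken explicitly.

    class PyFoo(object):                 # pure-Python side of the cycle
        def __init__(self, c_bar):
            self.c_bar = c_bar           # Py_foo -> C_bar
            c_bar.owner = self           # C_bar -> Py_foo (closes the cycle)

        def close(self):
            # Drop both edges once the pair of objects is no longer needed.
            if self.c_bar is not None:
                self.c_bar.owner = None
                self.c_bar = None

Without an explicit close() (or a weak reference on one of the edges), every such pair created inside the loop would stay alive.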
Thanks Antonio.

Relative to cpyext, I am not sure. We are not using it directly, but who knows what is being used by the modules we pull in (pymongo, etc.). I am trying to create a stripped-down test case. I filed a bug with some updates here: https://bitbucket.org/pypy/pypy/issues/2982/continual-increase-in-memory-utilization

It reproduces on PyPy, and on PyPy without the JIT, but it does *not* reproduce on CPython, so that is some indication.

Once I have a stripped-down test case that is not tied into our entire codebase, I will publish it in the bug if I can, and try other PyPy versions as well.

Rob

On Thu, Mar 28, 2019 at 10:56 AM Antonio Cuni <anto.cuni@gmail.com> wrote:
> Hi Robert, are you using any package which relies on cpyext? [...]
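For reference, a stripped-down test case along these lines would presumably look roughly like the sketch below: a bare pymongo query loop that keeps nothing and logs RSS each pass. The connection string, database/collection names, and intervals are placeholders, not the real workload.

    # Hypothetical reproducer skeleton: query a collection in a loop, iterate
    # the results without keeping them, force a collection, and log RSS.
    import gc
    import os
    import time

    import psutil
    import pymongo


    def main():
        client = pymongo.MongoClient('mongodb://localhost:27017')
        coll = client['testdb']['testcoll']           # placeholder names
        proc = psutil.Process(os.getpid())
        while True:
            # Fully iterate the cursor but keep nothing, mirroring the
            # "non-saving calculations" in the real worker.
            n = sum(1 for _ in coll.find({}))
            gc.collect()
            # memory_info() is spelled get_memory_info() in older psutil.
            print('%d docs, RSS: %d' % (n, proc.memory_info().rss))
            time.sleep(50)


    if __name__ == '__main__':
        main()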
Have you tried deleting the pymongo._cmessage module? It's a cpyext one, so it seems like a good place to start; I think pymongo works without it.

A general bisection technique for finding memory leaks that works in about 11 runs or fewer: run the code under coverage, so you know which lines are actually executed. Now comment out half of that code so that it still runs. Is there still a memory leak? If so, look at the uncommented lines; if not, look at the commented-out lines. Keep doing this until you spot the problem, or you only have one line left.

A common resource issue with PyPy is that it does cleanup differently: some code relies on destructors and reference counting to clean things up. Maybe check things like the number of open sockets and files?

objgraph is reeeeeeally good IMHO (haven't tried it with PyPy, not sure it works): https://mg.pov.lt/objgraph/

On Thursday, March 28, 2019, Robert Whitcher <robert.whitcher@rubrik.com> wrote:
> Relative to cpyext, I am not sure. We are not using it directly, but who knows what is being used by the modules we pull in (pymongo, etc.). [...]
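Pulling those suggestions together, a rough diagnostic helper might look like the sketch below. pymongo.has_c(), bson.has_c(), the psutil calls, and objgraph.show_growth() are existing APIs, but the helper itself is only a sketch, and (as noted above) objgraph may or may not behave well under PyPy.

    # Sketch: check whether pymongo/bson are using their C extensions
    # (cpyext modules under PyPy), count open files and sockets, and print
    # which object types grew since the last call. Call once per loop pass.
    import os

    import bson
    import objgraph   # https://mg.pov.lt/objgraph/
    import psutil
    import pymongo


    def diagnose():
        print('pymongo C extension in use: %s' % pymongo.has_c())
        print('bson C extension in use:    %s' % bson.has_c())

        # Leaked sockets/files keep buffers alive outside the Python heap.
        proc = psutil.Process(os.getpid())
        print('open files:  %d' % len(proc.open_files()))
        print('connections: %d' % len(proc.connections()))

        # Object types whose instance counts grew since the previous call.
        objgraph.show_growth(limit=10)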
participants (3)
- Antonio Cuni
- René Dudfield
- Robert Whitcher