pypy array module memory leak?

The following program has constant (+- 10 MB) memory usage in CPython, but it quickly leaks massive amounts of memory in PyPy. It simply assigns one slice of an array to another slice. The length of the array remains constant (so there's nothing to delete from inside the program).

pypy-1.5 (pypy-c-jit-43780-b590cf6de419-linux)

    import array
    import time

    a = array.array('i', 'a' * 100000)
    for i in range(1,100000):
        a[1:1000] = a[2001:3000]
        time.sleep(0.0001)  # So you can go check mem usage.

I apologise for the disclaimer.

The contents of and attachments to this e-mail are intended for the addressee only, and may contain the confidential information of UCS Group and/or its subsidiaries. Any review, use or dissemination thereof by anyone other than the intended addressee is prohibited. If you are not the intended addressee please notify the writer immediately and destroy the e-mail. UCS Group Limited and its subsidiaries distance themselves from and accept no liability for unauthorised use of their e-mail facilities or e-mails sent other than strictly for business purposes.

Hi Berend, On Fri, Jul 22, 2011 at 12:42 PM, Berend De Schouwer <berend.deschouwer@ucs-software.co.za> wrote:
The following program has constant (+- 10 MB) memory usage in CPython, but it quickly leaks massive amounts of memory in pypy.
In theory it's not a leak, because the memory is eventually freed. You can see this by adding "import gc; gc.collect()" in the loop; then the memory usage remains stable.

Of course in practice it's bad. The issue is that in order to trigger a collection, we count the size of the allocated objects --- but we ignore the fact that the temporary array.array created by a[2001:3000] also has an attached raw-malloced array of 1000 integers. Instead we just count the base size of the array object, which is 4 or 5 words. As a result we grossly misestimate the time at which we need to trigger the next collection.

It needs to be fixed, but I'm not sure exactly how to do it generally (as opposed to just fixing array.array, which doesn't help all other similar situations).

A bientôt,
Armin.
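The workaround described above can be sketched as follows. This is only an illustration: the function name copy_slices and the collect_every parameter are made up here, and the array is built from a list of zeros (instead of the 'a' * 100000 string in the original report) so that the same code runs on both Python 2 and Python 3. Collecting every N iterations rather than on every pass is an assumption made to keep the cost down, not something suggested in the thread.

```python
import array
import gc


def copy_slices(iterations=1000, collect_every=100):
    """Repeat the slice assignment from the report, forcing a GC pass regularly.

    collect_every is a hypothetical tuning knob: smaller values keep memory
    flatter but spend more time in the collector.
    """
    a = array.array('i', [0] * 25000)
    for i in range(1, iterations):
        # a[2001:3000] creates a temporary array.array with its own raw buffer.
        a[1:1000] = a[2001:3000]
        if i % collect_every == 0:
            # Explicitly free the temporaries the GC has not yet noticed.
            gc.collect()
    return len(a)
```

Both slices hold 999 elements, so the length of the array does not change across iterations.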

Hi Armin,

.NET has an AddMemoryPressure call for just this situation. Perhaps the PyPy GCs need a similar thing?

Thanks,
Ben
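To make the accounting problem and the "memory pressure" idea concrete, here is a toy model. It is not PyPy's GC and not the .NET API: the class ToyCollector, its threshold, and the 40-byte header (5 words of 8 bytes) and 4000-byte raw buffer (1000 four-byte integers) are all assumptions chosen for illustration.

```python
class ToyCollector(object):
    """Toy GC model: collect once the *counted* bytes reach a threshold."""

    def __init__(self, threshold, count_raw_memory):
        self.threshold = threshold
        self.count_raw_memory = count_raw_memory
        self.counted = 0      # bytes the collector believes were allocated
        self.actual = 0       # bytes really held by dead temporaries
        self.peak_actual = 0  # worst case seen just before a collection

    def allocate_temporary(self, header_bytes, raw_bytes):
        self.counted += header_bytes
        if self.count_raw_memory:
            # The "memory pressure" hint: also account for the raw buffer.
            self.counted += raw_bytes
        self.actual += header_bytes + raw_bytes
        self.peak_actual = max(self.peak_actual, self.actual)
        if self.counted >= self.threshold:
            self.collect()

    def collect(self):
        # Every temporary is garbage in this model, so everything is freed.
        self.counted = 0
        self.actual = 0


def peak_garbage(count_raw_memory, temporaries=100000):
    """Return the peak number of uncollected bytes for a run of temporaries."""
    model = ToyCollector(threshold=1000000, count_raw_memory=count_raw_memory)
    for _ in range(temporaries):
        model.allocate_temporary(header_bytes=40, raw_bytes=4000)
    return model.peak_actual
```

With these made-up numbers, counting only the headers lets about 101 MB of garbage pile up before each collection, while counting the raw buffers as well caps it at about 1 MB. The ratio, not the absolute figures, is the point.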

On 22/07/2011 13:20, Armin Rigo wrote:
Confirmed. Calling gc.collect() is expensive, though, so I'm running it in a separate thread. The program in question can eat a few gigs in a few seconds, so I've got to run it every second.

I'd appreciate pointers to fixing array.array, if possible. gc.collect() every second still eats about 500 MB too much RAM. So I'm looking at 600 vs. 100 MB RAM. At least it's running.
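The code for the separate collection thread is not shown in the thread, so the following is only a guess at its shape. The helper name start_periodic_collect and its interval_seconds parameter are hypothetical; the calls themselves (threading.Event, threading.Thread, gc.collect) are standard library.

```python
import gc
import threading


def start_periodic_collect(interval_seconds=1.0):
    """Start a daemon thread that calls gc.collect() at a fixed interval.

    Returns (stop_event, thread); set stop_event to end the loop.
    """
    stop_event = threading.Event()

    def worker():
        # Event.wait() returns False on timeout and True once the event is
        # set, so the loop runs one collection per interval until stopped.
        while not stop_event.wait(interval_seconds):
            gc.collect()

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the process alive for this thread
    thread.start()
    return stop_event, thread
```

A caller would keep the returned event and call stop_event.set() at shutdown. A fixed interval bounds how long garbage can sit, not how much accumulates in between, which fits the extra memory usage reported above.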

I've replaced a = array.array() with a = [0] * 10000 and it's faster, and doesn't eat all the RAM. For those who care :)
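For comparison, the list-based variant might look like the sketch below. The function name is made up, and the list length is taken as written in the message above. The thread does not say why the list version behaves better; one plausible reading, offered here only as an assumption, is that the temporary list's storage is managed and counted by the GC itself rather than raw-malloced.

```python
def copy_slices_with_list(iterations=1000):
    """Same slice assignment as the original report, on a plain list."""
    a = [0] * 10000
    for i in range(1, iterations):
        # Both slices have 999 elements, so the list length stays constant.
        a[1:1000] = a[2001:3000]
    return len(a)
```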

Hi Berend, I think I fixed the original problem too. See the longish checkin message of e7121092d73f. A bientôt, Armin.

participants (3)
- Armin Rigo
- Ben.Young@sungard.com
- Berend De Schouwer