pypy GC on large objects Re: funding/popularity? - pypy-dev


René Dudfield

23 Dec 2010

3:33 p.m.

Hello,

I think this is a case where the object returned by ctypes.create_string_buffer() could use a correct __sizeof__ method return value. If pypy supported that, then the GCs could support extensions, and 'opaque' data structures in C too, a little more nicely.

I think ctypes was started before __sizeof__ became available, so it seems many of its methods have not been updated yet. __sizeof__ is not mandatory, so many extensions have not been updated to support it yet.

cheers,

On Thu, Dec 23, 2010 at 11:14 AM, Armin Rigo wrote:

...

Hi Dima,

On Wed, Dec 22, 2010 at 11:21 PM, Dima Tisnek wrote:

...
--- Comment #4 from Boris Zbarsky (:bz) 2010-12-22 13:43:23 PST ---

So what I see this page do, in horizontal mode, is create 17 canvases each of which is width="816" height="3587". That means that each of them has a backing store of 816*3587*4 = 11,707,968 bytes. So that's about 200MB of memory usage right there.

I have no idea why they feel a need for 17 huge canvases, but if they want them, that's how much memory they'll take...
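The backing-store figures in the comment above can be checked directly. This short sketch only redoes that arithmetic (4 bytes per pixel, as stated in the comment):

```python
# Backing-store arithmetic from the quoted Bugzilla comment.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 816, 3587, 4
CANVASES = 17

per_canvas = WIDTH * HEIGHT * BYTES_PER_PIXEL  # bytes per canvas
total = per_canvas * CANVASES                  # bytes for all 17 canvases

print(f"{per_canvas:,} bytes per canvas")
print(f"{total:,} bytes total (~{total / 1e6:.0f} MB)")
```

That is 11,707,968 bytes per canvas and 199,035,456 bytes in total, consistent with "about 200MB".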

That looks very similar to an issue with PyPy's own GC, in which ctypes.create_string_buffer() returns objects which tend to be GC'ed late. That's because the string buffer object in ctypes appears (to PyPy's GC) to be just a small object, even though it actually references a potentially large piece of raw memory. Similarly, my vague guess about the above is that the 17*11MB of memory are held by 17 small objects which Firefox's GC thinks don't need to be collected particularly aggressively.
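To make the "small object holding large raw memory" effect concrete, here is a toy model, not PyPy's actual collector: a hypothetical collector that decides when to run purely from the bytes it allocated itself. The 64-byte wrapper size and the 4 MiB threshold are arbitrary assumptions; the raw size reuses the canvas figure from the quoted comment.

```python
from dataclasses import dataclass

# Hypothetical size of a small wrapper object as the GC sees it.
WRAPPER_BYTES = 64


@dataclass
class ToyGC:
    """Toy collector that decides when to collect from managed bytes only."""
    threshold: int
    managed_bytes: int = 0   # memory the GC allocated and can see
    external_bytes: int = 0  # raw memory held by wrappers, invisible to the GC
    collections: int = 0

    def allocate_wrapper(self, raw_size: int) -> None:
        # The wrapper is a small GC object; its raw buffer lives outside the heap.
        self.managed_bytes += WRAPPER_BYTES
        self.external_bytes += raw_size
        if self.managed_bytes >= self.threshold:
            self.collect()

    def collect(self) -> None:
        # Toy assumption: every wrapper is already garbage, so all of it,
        # including the raw memory it owned, is released.
        self.collections += 1
        self.managed_bytes = 0
        self.external_bytes = 0


gc_model = ToyGC(threshold=4 * 1024 * 1024)
for _ in range(17):
    gc_model.allocate_wrapper(11_707_968)

print(gc_model.collections, gc_model.managed_bytes, gc_model.external_bytes)
```

The model never collects: it has seen only 17 * 64 = 1,088 bytes, while about 199 MB of raw memory stays alive behind those wrappers.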

A bientôt,

Armin.

_______________________________________________
pypy-dev@codespeak.net
http://codespeak.net/mailman/listinfo/pypy-dev


Armin Rigo

23 Dec

3:38 p.m.

New subject: pypy GC on large objects Re: funding/popularity?

Hi René,

On Thu, Dec 23, 2010 at 2:33 PM, René Dudfield wrote:

...

I think this is a case where the object returned by ctypes.create_string_buffer() could use a correct __sizeof__ method return value. If pypy supported that, then the GC's could support extensions, and 'opaque' data structures in C too a little more nicely.

I think you are confusing levels. There is no way the GC can call some app-level Python method to get information about the objects it frees (and when would it even call it?). Remember that our GC is written at a level where it works for any interpreter for any language, not just Python.

A bientôt,

Armin.
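A small sketch shows what René's __sizeof__ proposal would and would not change. `RawBuffer` is a hypothetical class that merely *claims* to own `size` bytes; it allocates nothing. The assumption is CPython, where `sys.getsizeof()` consults `__sizeof__()`. Nothing in the collector calls it, which is the level mismatch Armin points out.

```python
import sys


class RawBuffer:
    """Hypothetical wrapper that owns `size` bytes of memory allocated elsewhere."""

    def __init__(self, size: int) -> None:
        self.size = size  # no memory is actually allocated in this sketch

    def __sizeof__(self) -> int:
        # Report the wrapper itself plus the raw memory it keeps alive.
        return object.__sizeof__(self) + self.size


buf = RawBuffer(10 * 1024 * 1024)
# Introspection tools see the large size (getsizeof may add a GC header on top)...
print(sys.getsizeof(buf))
# ...but this is app-level information; the collector's decisions never read it.
```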

Dima Tisnek

9:30 p.m.


Basically collecting this is hard:

dict(a=range(9**9))

The large list is referenced, but the object that holds the only reference is small no matter how you look at it. I guess it gets harder still if there are many small live objects, as getting to this dict takes a while (easier in this simple case with a generational collector, O(n) in the general case).

On 23 December 2010 06:38, Armin Rigo wrote:

...


Paolo Giarrusso

24 Dec

1:39 p.m.


On Thu, Dec 23, 2010 at 20:30, Dima Tisnek wrote:

...

Basically collecting this is hard:

dict(a=range(9**9))

large list is referenced, the object that holds the only reference is small no matter how you look at it.

First, usually (in most GC-ed languages) you can collect the list before the dict. In PyPy, if finalizers are involved (is this the case here? That'd be surprising), this is no longer true.

However, object size is not the point. For standard algorithms, the size of an object does not matter at all in deciding when it's collected - I already discussed this in my other email in this thread, and I noted what actually could happen in the examples described by Armin, and your examples show that it is a good property. A large object in the same heap can fill it up and trigger an earlier garbage collection.

In general, if GC ran in the background (but it usually doesn't, and not in PyPy) it could make sense to free objects sooner or later, depending not on object size, but on "how much memory would be 'indirectly freed' by freeing this object". However, because of sharing, answering this question is too complex (it requires collecting data from the whole heap). Moreover, the whole thing makes no sense at all with usual, stop-the-world collectors: the app is stopped, then the whole young generation, or the whole heap, is collected, then the app is resumed.

When separate heaps are involved (such as with ctypes, or with Large Object Spaces, which avoid using a copy collector for large objects), it is more complicated to ensure that the same property holds: you need to consider stats of all heaps to decide whether to trigger GC.
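Why "how much memory would be indirectly freed" needs whole-heap information can be seen on a toy object graph. This is a minimal sketch under stated assumptions: the heap is a hypothetical dict of name to (own size, outgoing references), and the answer is computed as the difference between two reachability traversals from the roots.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy heap: object name -> (own size in bytes, names of objects it references).
Heap = Dict[str, Tuple[int, List[str]]]


def reachable(heap: Heap, roots: List[str], removed: Optional[str] = None) -> Set[str]:
    """Names reachable from `roots`, treating `removed` as if it were gone."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name == removed or name in seen:
            continue
        seen.add(name)
        stack.extend(heap[name][1])
    return seen


def indirectly_freed(heap: Heap, roots: List[str], target: str) -> int:
    """Bytes that become unreachable once `target` goes away.

    Answering this needs two traversals of everything reachable from the
    roots: information about the whole heap, not just about `target`.
    """
    lost = reachable(heap, roots) - reachable(heap, roots, removed=target)
    return sum(heap[name][0] for name in lost)


# Two small dicts share one large list.
heap: Heap = {
    "root": (32, ["d1", "d2"]),
    "d1": (64, ["big"]),
    "d2": (64, ["big"]),
    "big": (8_000_000, []),
}
print(indirectly_freed(heap, ["root"], "d1"))  # 64: "big" is still reachable via d2

unshared = dict(heap, d2=(64, []))
print(indirectly_freed(unshared, ["root"], "d1"))  # 8000064
```

The same small dict "retains" either 64 bytes or about 8 MB depending on references held elsewhere in the heap, which is the sharing problem described above.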

...

I guess it gets harder still if there are many small live objects, as getting to this dict takes a while (easier in this simple case with a generational collector, O(n) in the general case)

Not sure what you mean; I can make sense of it (not fully) only with an incremental collector, and they are still used seldom (especially, not in PyPy).

Best regards

...


-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/

Ben.Young＠sungard.com

4 Jan

10:40 a.m.


...

-----Original Message----- From: pypy-dev-bounces@codespeak.net [mailto:pypy-dev-bounces@codespeak.net] On Behalf Of Paolo Giarrusso Sent: 24 December 2010 11:39 To: Dima Tisnek Cc: PyPy Dev; Armin Rigo Subject: Re: [pypy-dev] pypy GC on large objects Re: funding/popularity?


.NET supports calls to GC.AddMemoryPressure and GC.RemoveMemoryPressure to inform the GC you are allocating things outside of its knowledge. Maybe something similar would help?

Cheers,

Ben
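A toy model can show what explicit memory-pressure reporting buys. This is a hedged sketch, not PyPy code and not .NET's algorithm: `PressureAwareHeap` and its method names are hypothetical, loosely mirroring the two .NET calls, and the trigger simply compares managed plus reported unmanaged bytes against a fixed threshold (an assumption; the real .NET heuristics are not documented in this thread).

```python
class PressureAwareHeap:
    """Toy collector whose trigger counts managed *and* reported unmanaged bytes.

    Loosely modelled on the idea behind .NET's GC.AddMemoryPressure /
    GC.RemoveMemoryPressure; the method names below are hypothetical.
    """

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.managed = 0
        self.unmanaged = 0
        self.collections = 0

    def allocate(self, nbytes: int) -> None:
        self.managed += nbytes
        self._maybe_collect()

    def add_memory_pressure(self, nbytes: int) -> None:
        # Called by code that allocated nbytes outside the GC heap.
        self.unmanaged += nbytes
        self._maybe_collect()

    def remove_memory_pressure(self, nbytes: int) -> None:
        # Called when that outside memory is released.
        self.unmanaged -= nbytes

    def _maybe_collect(self) -> None:
        if self.managed + self.unmanaged >= self.threshold:
            self.collections += 1
            # Toy assumption: everything is garbage, and each dead wrapper's
            # finalizer calls remove_memory_pressure() for its raw buffer.
            self.remove_memory_pressure(self.unmanaged)
            self.managed = 0


heap = PressureAwareHeap(threshold=64 * 1024 * 1024)
for _ in range(17):
    heap.allocate(64)                     # small wrapper object
    heap.add_memory_pressure(11_707_968)  # its raw buffer, reported explicitly

print(heap.collections)  # 2
```

With the raw buffers reported, the same 17 small wrappers trigger two collections under a 64 MiB threshold; counting only the 64-byte wrappers, the threshold would never be reached.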

...


Paolo Giarrusso

4:50 p.m.


On Tue, Jan 4, 2011 at 09:40, wrote:

...

...

.NET supports calls to GC.AddMemoryPressure and GC.RemoveMemoryPressure to inform the GC you are allocating things outside of its knowledge. Maybe something similar would help?

That's interesting as well. Armin and I discussed something similar in another branch of this thread, and he included it among the planned ideas:

http://codespeak.net/pipermail/pypy-dev/2010q4/006648.html
http://codespeak.net/pipermail/pypy-dev/2010q4/006649.html

The difference is that in my proposal one would hook the memory allocator for Python extensions, while .NET requires adding explicit calls to the source code. However, the key idea is the same: you might need to GC sooner if there is lots of unmanaged memory. Unfortunately, the MSDN docs for those methods do not give pointers to the heuristics used:

http://msdn.microsoft.com/en-us/library/system.gc.addmemorypressure.aspx
http://msdn.microsoft.com/en-us/library/system.gc.removememorypressure.aspx

Best regards

-- Paolo Giarrusso - Ph.D. Student http://www.informatik.uni-marburg.de/~pgiarrusso/
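The contrast between hooking the allocator and adding explicit calls can be sketched at app level. Everything here is hypothetical: `RawBlock` stands in for extension memory (a `bytearray` is used only so the sketch runs), and `ExternalMemoryAccountant` plays the role of the hooked allocator, counting blocks automatically and using the standard `weakref.finalize` to uncount them when their owner is reclaimed.

```python
import gc
import weakref


class RawBlock:
    """Hypothetical stand-in for memory an extension allocates outside the
    GC heap (a bytearray is used here only so the sketch runs)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)


class ExternalMemoryAccountant:
    """Hypothetical allocator hook: every block handed out through alloc() is
    counted automatically and uncounted when its owner is reclaimed, so
    extension code never reports sizes by hand."""

    def __init__(self) -> None:
        self.external_bytes = 0

    def alloc(self, size: int) -> RawBlock:
        block = RawBlock(size)
        self.external_bytes += size
        # The callback must not reference `block`, or it would keep it alive.
        weakref.finalize(block, self._release, size)
        return block

    def _release(self, size: int) -> None:
        self.external_bytes -= size


accountant = ExternalMemoryAccountant()
b = accountant.alloc(1024)
print(accountant.external_bytes)  # 1024
del b
gc.collect()  # makes reclamation happen even without reference counting
print(accountant.external_bytes)  # 0
```

A collector could then fold `external_bytes` into its trigger condition without any extension calling a pressure API itself.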

Armin Rigo

25 Dec

8:41 p.m.


Hi René,

On Thu, Dec 23, 2010 at 8:30 PM, Dima Tisnek wrote:

...

Basically collecting this is hard:

dict(a=range(9**9))

I think you missed the point of my original email. I was talking about GC-referenced objects that hold a reference to a large piece of memory allocated outside the GC. There is none here, and any GC (including PyPy's) will do a correct job in collecting this.

A bientôt,

Armin.
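Armin's distinction can be illustrated with a scaled-down version of Dima's example, assuming CPython (where `sys.getsizeof()` reports an object's own, shallow size). The dict is small, but the list is an ordinary interpreter-allocated object, so its full size is in the managed heap and counts toward the collector's accounting; nothing is hidden outside it.

```python
import sys

# Scaled-down version of Dima's example: range(9**9) would be 387,420,489
# elements. list(range(...)) mirrors Python 2, where range() built a list.
d = dict(a=list(range(10**6)))

# sys.getsizeof() is shallow: each object is measured on its own.
print(sys.getsizeof(d))       # small: a one-entry dict
print(sys.getsizeof(d["a"]))  # large: the list's array of 10**6 pointers
```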
