[pypy-dev] pypy GC on large objects Re: funding/popularity?

Paolo Giarrusso p.giarrusso at gmail.com
Fri Dec 24 12:39:29 CET 2010


On Thu, Dec 23, 2010 at 20:30, Dima Tisnek <dimaqq at gmail.com> wrote:
> Basically collecting this is hard:
>
> dict(a=range(9**9))
>
> large list is referenced, the object that holds the only reference is
> small no matter how you look at it.
First, in most GC-ed languages you can usually collect the list
before the dict. In PyPy, if finalizers are involved (is that the case
here? That would be surprising), this no longer holds.

However, object size is not the point. For standard algorithms, the
size of an object does not matter at all in deciding when it is
collected. I already discussed this in my other email in this thread,
where I noted what could actually happen in the examples described by
Armin, and your examples show that this is a good property. What size
does affect is heap occupancy: a large object in the same heap can
fill it up and trigger an earlier garbage collection.
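
To illustrate the point above, here is a minimal sketch of a toy heap
that collects only when an allocation does not fit. The ToyHeap class,
its capacity and the collections_after helper are all hypothetical
names made up for this example; this is not how PyPy's GC is
implemented. Size is never consulted when deciding what survives, yet
one large allocation makes the collection happen sooner.

```python
class ToyHeap:
    """Hypothetical model: a heap that collects only when it runs out of room."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = {}       # name -> size in bytes
        self.roots = set()      # names that are still reachable
        self.collections = 0    # how many collections have run so far

    def used(self):
        return sum(self.objects.values())

    def collect(self):
        # Reachability alone decides what survives; size is never consulted.
        self.objects = {n: s for n, s in self.objects.items() if n in self.roots}
        self.collections += 1

    def allocate(self, name, size, live=False):
        # A collection is triggered only by heap occupancy.
        if self.used() + size > self.capacity:
            self.collect()
        self.objects[name] = size
        if live:
            self.roots.add(name)


def collections_after(sizes, capacity=1000):
    """Allocate dead objects of the given sizes and count the collections."""
    heap = ToyHeap(capacity)
    for i, size in enumerate(sizes):
        heap.allocate("obj%d" % i, size)
    return heap.collections


small_only = collections_after([10] * 50)
with_large = collections_after([900] + [10] * 50)
```

With only small objects the toy heap never fills up, so small_only is
0; with one 900-byte object in front, the same 50 small allocations
trigger one collection.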

In general, if GC ran in the background (it usually doesn't, and it
does not in PyPy), it could make sense to free objects sooner or later
depending not on object size, but on "how much memory would be
'indirectly freed' by freeing this object". However, because of
sharing, answering this question is too complex (it requires
collecting data from the whole heap). Moreover, the whole idea makes
no sense at all with the usual stop-the-world collectors: the app is
stopped, then the whole young generation, or the whole heap, is
collected, then the app is resumed.
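
As a rough sketch of why the "indirectly freed" question needs data
from the whole heap, consider a toy object graph. The graph layout,
the sizes and the function names below are assumptions made up for
this example. The only way to know what dropping one object frees is
to recompute reachability from all roots, because some other object
may share the same large child.

```python
def reachable(heap, roots, removed=None):
    """Return the names reachable from roots, pretending `removed` is gone."""
    seen = set()
    stack = [r for r in roots if r != removed]
    while stack:
        name = stack.pop()
        if name in seen or name == removed:
            continue
        seen.add(name)
        stack.extend(heap[name][1])   # follow outgoing references
    return seen


def freed_by_dropping(heap, roots, victim):
    """Bytes that become unreachable if `victim` disappears."""
    before = reachable(heap, roots)
    after = reachable(heap, roots, removed=victim)
    return sum(heap[n][0] for n in before - after)


# name -> (size in bytes, names it references); all values are made up.
HEAP = {
    "d": (64, ["big"]),        # the small dict
    "big": (10**6, []),        # the large list
    "other": (64, ["big"]),    # a second small object sharing the list
}

only_owner = freed_by_dropping(HEAP, ["d"], "d")
shared = freed_by_dropping(HEAP, ["d", "other"], "d")
```

When "d" holds the only reference, dropping it frees the dict and the
list together (only_owner is 1000064); as soon as "other" also points
at the list, dropping "d" frees just 64 bytes (shared).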

When separate heaps are involved (such as with ctypes, or with Large
Object Spaces, which avoid using a copying collector for large objects),
it is more complicated to ensure that the same property holds: you
need to consider stats of all heaps to decide whether to trigger GC.
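
A minimal sketch of that decision, assuming we can read a "bytes in
use" figure for every heap. The heap names, the budgets and both
functions are hypothetical and do not correspond to any PyPy API. A
trigger that watches only the GC-managed heap misses memory held
elsewhere, while one that sums over all heaps does not.

```python
def should_collect_naive(stats, per_heap_budget, watched="gc_heap"):
    """Look at one heap only; memory held in other heaps is invisible."""
    return stats[watched] > per_heap_budget


def should_collect(stats, total_budget):
    """Look at the combined occupancy of all heaps."""
    return sum(stats.values()) > total_budget


# Made-up numbers: small wrapper objects in the GC heap, each keeping
# a large raw buffer alive in a separate, non-GC heap.
STATS = {
    "gc_heap": 1000,
    "raw_malloc": 500 * 10**6,
}

naive_decision = should_collect_naive(STATS, per_heap_budget=10**6)
combined_decision = should_collect(STATS, total_budget=100 * 10**6)
```

Here naive_decision is False, because the GC heap looks almost empty,
while combined_decision is True, because the raw buffers dominate the
total.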

> I guess it gets harder still if there are many small live objects, as
> getting to this dict takes a while
> (easier in this simple case with a generational collector, O(n) in the general case)

Not sure what you mean; I can make sense of it (and not fully) only
with an incremental collector, and those are still seldom used (in
particular, not in PyPy).

Best regards

> On 23 December 2010 06:38, Armin Rigo <arigo at tunes.org> wrote:
>> Hi René,
>>
>> On Thu, Dec 23, 2010 at 2:33 PM, René Dudfield <renesd at gmail.com> wrote:
>>> I think this is a case where the object returned by
>>> ctypes.create_string_buffer() could use a correct __sizeof__ method
>>> return value.  If pypy supported that, then the GC's could support
>>> extensions, and 'opaque' data structures in C too a little more
>>> nicely.
>>
>> I think you are confusing levels.  There is no way the GC can call
>> some app-level Python method to get information about the objects it
>> frees (and when would it even call it?).  Remember that our GC is
>> written at a level where it works for any interpreter for any
>> language, not just Python.
>>
>>
>> A bientôt,
>>
>> Armin.
>> _______________________________________________
>> pypy-dev at codespeak.net
>> http://codespeak.net/mailman/listinfo/pypy-dev
>>



-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/


