[Python-Dev] Re: ... gcmodule.c,2.9,2.10

Jeremy Hylton jeremy@beopen.com
Sat, 2 Sep 2000 13:33:47 -0400


Vladimir Marangozov wrote:
>Neil Schemenauer wrote:
>>
>> On Fri, Sep 01, 2000 at 10:24:46AM -0400, Jeremy Hylton wrote:
>> > Even people who do have problems with cyclic garbage don't necessarily
>> > need a collection every 100 allocations.  (Is my understanding of what
>> > the threshold measures correct?)
>>
>> It collects every net threshold0 allocations.  If you create and delete
>> 1000 container objects in a loop then no collection would occur.
>>
>> > But the difference in total memory consumption with the threshold at
>> > 100 vs. 1000 vs. 5000 is not all that noticeable, a few MB.
>
>A few megabytes?  Phew! Jeremy -- more power mem to you!
>I agree with Neil. 5000 is too high and the purpose of the inclusion
>of the collector in the beta is precisely to exercise it & get feedback!
>With a threshold of 5000 you've almost disabled the collector, leaving us
>only with the memory overhead and the slowdown <wink>.
>
>In short, bring it back to something low, please.

I am happy to bring it to a lower number, but not as low as it was.  I
increased it forgetting that it was net allocations and not simply
allocations.  Of course, it's not exactly net allocations because if
deallocations occur while the count is zero, they are ignored.
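
A minimal sketch of the counting rule described above, as a toy Python model
rather than the actual code in gcmodule.c.  The class and method names are
hypothetical, and the model assumes a collection fires when the count reaches
the threshold and then resets the count to zero.

```python
class PseudoNetCounter:
    """Toy model of the pseudo-net allocation count (hypothetical names)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.collections = 0

    def allocate(self):
        self.count += 1
        # Assumption: collect once the count reaches the threshold.
        if self.count >= self.threshold:
            self.collections += 1
            self.count = 0

    def deallocate(self):
        # A deallocation that arrives while the count is zero is ignored,
        # which is why the count is not a true net figure.
        if self.count > 0:
            self.count -= 1


# Create and delete 1000 containers in a loop: the count never climbs.
loop = PseudoNetCounter(100)
for _ in range(1000):
    loop.allocate()
    loop.deallocate()
loop_collections = loop.collections      # 0

# A deallocation at zero is dropped, so it earns no credit later.
burst = PseudoNetCounter(100)
burst.deallocate()
for _ in range(250):
    burst.allocate()
burst_collections = burst.collections    # 2
burst_count = burst.count                # 50
```

The first loop matches Neil's example: no collection occurs.  The second
shows the "pseudo" part: the early deallocation does not offset the later
allocations.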

My reason for disliking the previous lower threshold is that it causes
frequent collections, even in programs that produce no cyclic garbage.  I
understand the garbage collector to be a supplement to the existing
reference counting mechanism, which we expect to work correctly for most
programs.

The benefit of collecting the cyclic garbage periodically is to reduce the
total amount of memory the process uses, by freeing some memory to be reused
by malloc.  The specific effect on process memory depends on the program's
high-water mark for memory use and how much of that memory is consumed by
cyclic trash.  (GC also allows finalization to occur where it might not have
before.)
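
A small illustration of what "cyclic trash" means, assuming a Python that
has both the gc and weakref modules.  The Node class is hypothetical.
Automatic collection is switched off for the duration so that only reference
counting is at work until the explicit gc.collect() call.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only to build a reference cycle."""
    pass


was_enabled = gc.isenabled()
gc.disable()                     # rely on reference counting alone for now
try:
    a = Node()
    b = Node()
    a.other = b                  # a -> b
    b.other = a                  # b -> a, closing the cycle
    probe = weakref.ref(a)       # lets us observe whether `a` still exists

    del a, b                     # the names are gone, the cycle remains
    alive_after_del = probe() is not None       # True

    gc.collect()                 # the cycle collector reclaims the pair
    alive_after_collect = probe() is not None   # False
finally:
    if was_enabled:
        gc.enable()
```

Reference counting cannot free the pair because each object still holds a
reference to the other; only the collector can return that memory.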

In one test I did, the high-water marks for a program that ran with 3000 GC
collections and with 300 GC collections were 13MB and 11MB, a difference of
a little less than 20%.

The old threshold (100 net allocations) was low enough that most scripts run
several collections during compilation of the bytecode.  The only containers
created during compilation (or loading .pyc files) are the dictionaries that
hold constants.  If the GC is supplemental, I don't believe its threshold
should be set so low that it runs long before any cycles could be created.

The default threshold can be fairly high, because a program that has
problems caused by cyclic trash can set the threshold lower or explicitly
call the collector.  If we assume these programs are less common, there is
no reason to make all programs suffer all of the time.
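
For example, a program that knows it creates many cycles could do something
like the following with the gc module.  This is a sketch: the value 100 and
the Node class are only illustrative, not recommendations.

```python
import gc


class Node:
    """Hypothetical object that refers to itself, forming a cycle."""

    def __init__(self):
        self.me = self


old = gc.get_threshold()         # current thresholds; threshold0 comes first
gc.set_threshold(100)            # lower threshold0 for this program only
lowered = gc.get_threshold()

gc.collect()                     # start from a clean slate
for _ in range(10):
    Node()                       # each instance becomes cyclic trash at once

found = gc.collect()             # explicit call; returns unreachable objects found
gc.set_threshold(*old)           # restore the previous settings
```

Programs without such problems never need to touch these calls and simply
run with whatever default is chosen.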

I have trouble reasoning about the behavior of the pseudo-net allocations
count, but think I would be happier with a higher threshold.  I might find
it easier to understand if the count were of total allocations and
deallocations, with GC occurring every N allocation events.
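
A sketch of that alternative, again as a toy Python model with hypothetical
names and not a proposed patch.  Every allocation and every deallocation
counts as one event, and a collection runs every N events.

```python
class EventCounter:
    """Toy model: collect every n allocation/deallocation events."""

    def __init__(self, n):
        self.n = n
        self.events = 0
        self.collections = 0

    def _event(self):
        self.events += 1
        if self.events >= self.n:
            self.collections += 1
            self.events = 0

    def allocate(self):
        self._event()

    def deallocate(self):
        self._event()


# 1000 create/delete pairs are 2000 events, so N = 100 gives 20 collections.
counter = EventCounter(100)
for _ in range(1000):
    counter.allocate()
    counter.deallocate()
event_collections = counter.collections   # 20
```

Under this rule the collection frequency depends only on how much container
activity a program has, which is easier to predict than the pseudo-net count,
at the cost of collecting even in a loop that creates and frees the same
objects over and over.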

Any suggestions about what a more reasonable value would be and why it is
reasonable?

Jeremy