Re: [Python-checkins] CVS: python/dist/src/Modules gcmodule.c,2.9,2.10

On Thu, Aug 31, 2000 at 09:01:59PM -0700, Jeremy Hylton wrote:
set the default threshold much higher we don't need to run gc frequently
Are you sure setting it that high (5000 as opposed to 100) is a good idea? Did you do any benchmarking? If with-gc is going to be on by default in 2.0, then I would agree with setting it high. If the GC is optional, then I think it should be left as it is. People explicitly enabling the GC obviously have a problem with cyclic garbage.

So, is with-gc going to be the default? At this time I would vote no.

Neil

"NS" == Neil Schemenauer <nascheme@enme.ucalgary.ca> writes:
NS> On Thu, Aug 31, 2000 at 09:01:59PM -0700, Jeremy Hylton wrote:
set the default threshold much higher we don't need to run gc frequently
NS> Are you sure setting it that high (5000 as opposed to 100) is a
NS> good idea? Did you do any benchmarking? If with-gc is going to
NS> be on by default in 2.0 then I would agree with setting it high.
NS> If the GC is optional then I think it should be left as it is.
NS> People explicitly enabling the GC obviously have a problem with
NS> cyclic garbage.

NS> So, is with-gc going to be default? At this time I would vote
NS> no.

For 2.0b1, it will be on by default, which is why I set the threshold so high. If we get a lot of problem reports, we can change either decision for 2.0 final. Do you disagree? If so, why?

Even people who do have problems with cyclic garbage don't necessarily need a collection every 100 allocations. (Is my understanding of what the threshold measures correct?) This threshold causes GC to occur so frequently that it can happen during the *compilation* of a small Python script.

Example: The code in Tools/compiler seems to have a cyclic reference problem, because its memory consumption drops when GC is enabled. But the difference in total memory consumption with the threshold at 100 vs. 1000 vs. 5000 is not all that noticeable, a few MB.

Jeremy
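For illustration, the knob being debated here is also exposed to Python programs through the gc module; a minimal sketch of inspecting and changing it (the numbers are just the two proposals from this thread, and gc.get_threshold() returns whatever defaults your interpreter ships with, not anything decided here):

    import gc

    # threshold0 is the first element of the tuple: how many net container
    # allocations trigger a generation-0 collection.
    print(gc.get_threshold())

    gc.set_threshold(5000)   # the high default Jeremy checked in
    gc.set_threshold(100)    # the old, low default Neil prefers

    gc.collect()             # a collection can also be forced explicitly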

On Fri, Sep 01, 2000 at 10:24:46AM -0400, Jeremy Hylton wrote:
It collects every threshold0 net allocations. If you create and delete 1000 container objects in a loop, then no collection would occur.
But the difference in total memory consumption with the threshold at 100 vs. 1000 vs. 5000 is not all that noticeable, a few MB.
The last time I did benchmarks with PyBench and pystone, I found that the difference between threshold0 = 100 and threshold0 = 0 (i.e., infinity) was small.

Remember that the collector only counts container objects. Creating a thousand dicts with lots of non-container objects inside of them could easily cause an out-of-memory situation. Because of the generational collection, usually only threshold0 objects are examined while collecting. Thus, setting threshold0 low has the effect of quickly moving objects into the older generations. Collection is quick because only a few objects are examined.

A portable way to find the total allocated memory would be nice. Perhaps Vladimir's malloc will help us here. Alternatively, we could modify PyCore_MALLOC to keep track of it in a global variable. I think collecting based on an increase in the total allocated memory would work better. What do you think?

More benchmarks should be done too. Your compiler would probably be a good candidate. I won't have time today but maybe tonight.

Neil
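What "net threshold0 allocations" means can be sketched with the modern gc API (gc.get_count() postdates this thread, so this is illustrative only): the generation-0 counter rises when a container object is allocated and falls when one is freed, so a create-and-drop loop barely moves it, while containers that stay alive push it toward the threshold.

    import gc

    gc.disable()                      # keep automatic collection out of the way

    start = gc.get_count()[0]
    for _ in range(1000):
        d = {}                        # allocating a container bumps the count ...
        del d                         # ... freeing it drops the count again
    print(gc.get_count()[0] - start)  # roughly 0: no collection would be forced

    kept = [{} for _ in range(1000)]  # keep the containers alive instead
    print(gc.get_count()[0] - start)  # now far beyond a threshold0 of 100

    gc.enable()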

Neil Schemenauer wrote:
A few megabytes? Phew! Jeremy -- more power mem to you!

I agree with Neil. 5000 is too high, and the purpose of including the collector in the beta is precisely to exercise it & get feedback! With a threshold of 5000 you've almost disabled the collector, leaving us only with the memory overhead and the slowdown <wink>. In short, bring it back to something low, please.

[Neil]
A portable way to find the total allocated memory would be nice. Perhaps Vladimir's malloc will help us here.
Yep, the mem profiler. The profiler currently collects stats if enabled. This is slow and unusable in production code; if the profiler is disabled, Python runs at full speed. However, the profiler will include an interface which will ask the mallocs how much real memory they manage. This is not implemented yet... Maybe the real-memory interface should go in a separate 'memory' module; I don't know yet.

-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
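Vladimir's real-memory interface is hypothetical at this point, but the idea of asking the interpreter how much memory it currently manages has modern analogues; a small sketch, purely illustrative and not the profiler discussed here:

    import sys
    import tracemalloc

    # Blocks currently held by Python's object allocator (CPython-specific).
    print(sys.getallocatedblocks())

    # tracemalloc reports bytes allocated by Python since tracing started.
    tracemalloc.start()
    data = [dict(x=i) for i in range(10_000)]
    current, peak = tracemalloc.get_traced_memory()
    print(current, peak)              # bytes currently in use / peak so far
    tracemalloc.stop()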

Vladimir Marangozov wrote:
I am happy to bring it to a lower number, but not as low as it was. I increased it forgetting that it was net allocations and not simply allocations. Of course, it's not exactly net allocations, because if deallocations occur while the count is zero, they are ignored.

My reason for disliking the previous lower threshold is that it causes frequent collections, even in programs that produce no cyclic garbage. I understand the garbage collector to be a supplement to the existing reference counting mechanism, which we expect to work correctly for most programs. The benefit of collecting the cyclic garbage periodically is to reduce the total amount of memory the process uses, by freeing some memory to be reused by malloc. The specific effect on process memory depends on the program's high-water mark for memory use and how much of that memory is consumed by cyclic trash. (GC also allows finalization to occur where it might not have before.)

In one test I did, the difference between the high-water mark for a program that ran with 3000 GC collections and with 300 GC collections was 13MB vs. 11MB, a little less than 20%.

The old threshold (100 net allocations) was low enough that most scripts run several collections during compilation of the bytecode. The only containers created during compilation (or loading .pyc files) are the dictionaries that hold constants. If the GC is supplemental, I don't believe its threshold should be set so low that it runs long before any cycles could be created.

The default threshold can be fairly high, because a program that has problems caused by cyclic trash can set the threshold lower or explicitly call the collector. If we assume these programs are less common, there is no reason to make all programs suffer all of the time.

I have trouble reasoning about the behavior of the pseudo-net allocations count, but I think I would be happier with a higher threshold. I might find it easier to understand if the count were of total allocations and deallocations, with GC occurring every N allocation events.

Any suggestions about what a more reasonable value would be and why it is reasonable?

Jeremy
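Jeremy's point that a cycle-heavy program can opt in to more aggressive collection is easy to sketch with the gc module; the Node class and the threshold value below are made up for illustration:

    import gc

    class Node:
        def __init__(self):
            self.me = self            # self-referencing cycle: refcounting
                                      # alone will never free a Node

    # A program that knows it creates cyclic trash can lower the threshold ...
    gc.set_threshold(100)

    def build_and_drop(n):
        nodes = [Node() for _ in range(n)]
        del nodes                     # the Nodes become cyclic garbage

    build_and_drop(10_000)

    # ... or simply force a collection at a convenient point.
    print("unreachable objects found:", gc.collect())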

[Neil and Vladimir say a threshold of 5000 is too high!] [Jeremy says a threshold of 100 is too low!] [merriment ensues]
There's not going to be consensus on this, as the threshold is a crude handle on a complex problem. That's sure better than *no* handle, but trash behavior is so app-specific that there simply won't be a killer argument. In cases like this, the geometric mean of the extreme positions is always the best guess <0.8 wink>:
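The calculation Tim alludes to (and the math.sqrt() Vladimir mentions later) is just the geometric mean of the two proposals; a reconstruction of the session, rounded rather than shown at full float precision:

    >>> import math
    >>> round(math.sqrt(100 * 5000), 1)
    707.1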
So 9 times out of 10 we can run it with a threshold of 707, and 1 out of 10 with 708 <wink>.

Tuning strategies for gc *can* get as complex as OS scheduling algorithms, and for the same reasons: you're in the business of predicting the future based on just a few neurons keeping track of gross summaries of what happened before. A program can go through many phases of quite different behavior over its life (like I/O-bound vs compute-bound, or cycle-happy vs not), and at the phase boundaries past behavior is worse than irrelevant (it's actively misleading).

So call it 700 for now. Or 1000. It's a bad guess at a crude heuristic regardless, and if we avoid extreme positions we'll probably avoid doing as much harm as we *could* do <0.9 wink>.

Over time, a more interesting measure may be how much cyclic trash collections actually recover, and then collect less often the less trash we're finding (ditto more often when we're finding more). Another is like that, except replace "trash" with "cycles (whether trash or not)". The gross weakness of "net container allocations" is that it doesn't directly measure what this system was created to do.

These things *always* wind up with dynamic measures, because static ones are just too crude across apps. Then the dynamic measures fail at phase boundaries too, and more gimmicks are added to compensate for that. Etc. Over time it will get better for most apps most of the time. For now, we want *both* to exercise the code in the field and not waste too much time, so hasty compromise is good for the beta.

let-a-thousand-thresholds-bloom-ly y'rs - tim
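Tim's "collect less often the less trash we're finding" can be sketched with today's gc.callbacks hook, which long postdates this thread; the scaling factors and bounds below are arbitrary illustrations, not anything CPython actually does:

    import gc

    def adaptive_threshold(phase, info):
        # Called after every collection with how much it actually recovered.
        if phase != "stop" or info["generation"] != 0:
            return
        t0, t1, t2 = gc.get_threshold()
        if info["collected"] == 0:
            t0 = min(t0 * 2, 10_000)   # nothing found: back off
        elif info["collected"] > t0 // 2:
            t0 = max(t0 // 2, 100)     # lots of cyclic trash: collect sooner
        gc.set_threshold(t0, t1, t2)

    gc.callbacks.append(adaptive_threshold)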

Tim Peters wrote:
There's not going to be consensus on this, as the threshold is a crude handle on a complex problem.
Hehe. Tim gets philosophic again <wink>
Right on target, Tim! It is well known that the recent past is the best approximation of the near future, and that the past as a whole is the only approximation we have at our disposal of the long-term future. If you add to that axioms like "memory management schemes influence the OS long-term scheduler", "the 50% rule applies for all allocation strategies", etc., it is clear that if we want to approach the optimum, we definitely need to adjust the collection frequency according to some proportional scheme. But even without saying this, your argument about dynamic GC thresholds is enough to put Neil into a state of deep depression regarding the current GC API <0.9 wink>.

Now let's be pragmatic: it is clear that the garbage collector will make it into 2.0 -- be it enabled or disabled by default. So let's stick to a compromise: 500 for the beta, 1000 for the final release. This somewhat complies with your geometric calculus, which mainly aims at balancing the expressed opinions. It certainly isn't founded on any existing theory or practice, and we all realized that despite the impressive math.sqrt() <wink>.

-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

"NS" == Neil Schemenauer <nascheme@enme.ucalgary.ca> writes:
NS> On Thu, Aug 31, 2000 at 09:01:59PM -0700, Jeremy Hylton wrote:
set the default threshold much higher we don't need to run gc frequently
NS> Are you sure setting it that high (5000 as opposed to 100) is a NS> good idea? Did you do any benchmarking? If with-gc is going to NS> be on by default in 2.0 then I would agree with setting it high. NS> If the GC is optional then I think it should be left as it is. NS> People explicitly enabling the GC obviously have a problem with NS> cyclic garbage. NS> So, is with-gc going to be default? At this time I would vote NS> no. For 2.0b1, it will be on by default, which is why I set the threshold so high. If we get a lot of problem reports, we can change either decision for 2.0 final. Do you disagree? If so, why? Even people who do have problems with cyclic garbage don't necessarily need a collection every 100 allocations. (Is my understanding of what the threshold measures correct?) This threshold causes GC to occur so frequently that it can happen during the *compilation* of a small Python script. Example: The code in Tools/compiler seems to have a cyclic reference problem, because it's memory consumption drops when GC is enabled. But the difference in total memory consumption with the threshold at 100 vs. 1000 vs. 5000 is not all that noticable, a few MB. Jeremy

On Fri, Sep 01, 2000 at 10:24:46AM -0400, Jeremy Hylton wrote:
It collects every net threshold0 allocations. If you create and delete 1000 container objects in a loop then no collection would occur.
But the difference in total memory consumption with the threshold at 100 vs. 1000 vs. 5000 is not all that noticable, a few MB.
The last time I did benchmarks with PyBench and pystone I found that the difference between threshold0 = 100 and threshold0 = 0 (ie. infinity) was small. Remember that the collector only counts container objects. Creating a thousand dicts with lots of non-container objects inside of them could easily cause an out of memory situation. Because of the generational collection usually only threshold0 objects are examined while collecting. Thus, setting threshold0 low has the effect of quickly moving objects into the older generations. Collection is quick because only a few objects are examined. A portable way to find the total allocated memory would be nice. Perhaps Vladimir's malloc will help us here. Alternatively we could modify PyCore_MALLOC to keep track of it in a global variable. I think collecting based on an increase in the total allocated memory would work better. What do you think? More benchmarks should be done too. Your compiler would probably be a good candidate. I won't have time today but maybe tonight. Neil

Neil Schemenauer wrote:
A few megabytes? Phew! Jeremy -- more power mem to you! I agree with Neil. 5000 is too high and the purpose of the inclusion of the collector in the beta is precisely to exercise it & get feedback! With a threshold of 5000 you've almost disabled the collector, leaving us only with the memory overhead and the slowdown <wink>. In short, bring it back to something low, please. [Neil]
A portable way to find the total allocated memory would be nice. Perhaps Vladimir's malloc will help us here.
Yep, the mem profiler. The profiler currently collects stats if enabled. This is slow and unusable in production code. But if the profiler is disabled, Python runs at full speed. However, the profiler will include an interface which will ask the mallocs on how much real mem they manage. This is not implemented yet... Maybe the real mem interface should go in a separate 'memory' module; don't know yet. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Vladimir Marangozov wrote:
I am happy to bring it to a lower number, but not as low as it was. I increased it forgetting that it was net allocations and not simply allocations. Of course, it's not exactly net allocations because if deallocations occur while the count is zero, they are ignored. My reason for disliking the previous lower threshold is that it causes frequently collections, even in programs that produce no cyclic garbage. I understand the garbage collector to be a supplement to the existing reference counting mechanism, which we expect to work correctly for most programs. The benefit of collecting the cyclic garbage periodically is to reduce the total amount of memory the process uses, by freeing some memory to be reused by malloc. The specific effect on process memory depends on the program's high-water mark for memory use and how much of that memory is consumed by cyclic trash. (GC also allows finalization to occur where it might not have before.) In one test I did, the difference between the high-water mark for a program that run with 3000 GC collections and 300 GC collections was 13MB and 11MB, a little less than 20%. The old threshold (100 net allocations) was low enough that most scripts run several collections during compilation of the bytecode. The only containers created during compilation (or loading .pyc files) are the dictionaries that hold constants. If the GC is supplemental, I don't believe its threshold should be set so low that it runs long before any cycles could be created. The default threshold can be fairly high, because a program that has problems caused by cyclic trash can set the threshold lower or explicitly call the collector. If we assume these programs are less common, there is no reason to make all programs suffer all of the time. I have trouble reasoning about the behavior of the pseudo-net allocations count, but think I would be happier with a higher threshold. I might find it easier to understand if the count where of total allocations and deallocations, with GC occurring every N allocation events. Any suggestions about what a more reasonable value would be and why it is reasonable? Jeremy

[Neil and Vladimir say a threshold of 5000 is too high!] [Jeremy says a threshold of 100 is too low!] [merriment ensues]
There's not going to be consensus on this, as the threshold is a crude handle on a complex problem. That's sure better than *no* handle, but trash behavior is so app-specific that there simply won't be a killer argument. In cases like this, the geometric mean of the extreme positions is always the best guess <0.8 wink>:
So 9 times out of 10 we can run it with a threshold of 707, and 1 out of 10 with 708 <wink>. Tuning strategies for gc *can* get as complex as OS scheduling algorithms, and for the same reasons: you're in the business of predicting the future based on just a few neurons keeping track of gross summaries of what happened before. A program can go through many phases of quite different behavior over its life (like I/O-bound vs compute-bound, or cycle-happy vs not), and at the phase boundaries past behavior is worse than irrelevant (it's actively misleading). So call it 700 for now. Or 1000. It's a bad guess at a crude heuristic regardless, and if we avoid extreme positions we'll probably avoid doing as much harm as we *could* do <0.9 wink>. Over time, a more interesting measure may be how much cyclic trash collections actually recover, and then collect less often the less trash we're finding (ditto more often when we're finding more). Another is like that, except replace "trash" with "cycles (whether trash or not)". The gross weakness of "net container allocations" is that it doesn't directly measure what this system was created to do. These things *always* wind up with dynamic measures, because static ones are just too crude across apps. Then the dynamic measures fail at phase boundaries too, and more gimmicks are added to compensate for that. Etc. Over time it will get better for most apps most of the time. For now, we want *both* to exercise the code in the field and not waste too much time, so hasty compromise is good for the beta. let-a-thousand-thresholds-bloom-ly y'rs - tim

Tim Peters wrote:
There's not going to be consensus on this, as the threshold is a crude handle on a complex problem.
Hehe. Tim gets philosophic again <wink>
Right on target, Tim! It is well known that the recent past is the best approximation of the near future and that the past as a whole is the only approximation we have at our disposal of the long-term future. If you add to that axioms like "memory management schemes influence the OS long-term scheduler", "the 50% rule applies for all allocation strategies", etc., it is clear that if we want to approach the optimum, we definitely need to adjust the collection frequency according to some proportional scheme. But even without saying this, your argument about dynamic GC thresholds is enough to put Neil into a state of deep depression regarding the current GC API <0.9 wink>. Now let's be pragmatic: it is clear that the garbage collector will make it for 2.0 -- be it enabled or disabled by default. So let's stick to a compromise: 500 for the beta, 1000 for the final release. This somewhat complies to your geometric calculus which mainly aims at balancing the expressed opinions. It certainly isn't fond regarding any existing theory or practice, and we all realized that despite the impressive math.sqrt() <wink>. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
participants (4)
- Jeremy Hylton
- Neil Schemenauer
- Tim Peters
- Vladimir.Marangozov@inrialpes.fr