[Python-Dev] Re: ... gcmodule.c,2.9,2.10

Tim Peters tim_one@email.msn.com
Sat, 2 Sep 2000 14:20:18 -0400


[Neil and Vladimir say a threshold of 5000 is too high!]

[Jeremy says a threshold of 100 is too low!]

[merriment ensues]

> ...
> Any suggestions about what a more reasonable value would be and why
> it is reasonable?
>
> Jeremy

There's not going to be consensus on this, as the threshold is a crude handle on a complex
problem.  That's sure better than *no* handle, but trash behavior is so app-specific that
there simply won't be a killer argument.

In cases like this, the geometric mean of the extreme positions is always the best guess
<0.8 wink>:

>>> import math
>>> math.sqrt(5000 * 100)
707.10678118654755
>>>

So 9 times out of 10 we can run it with a threshold of 707, and 1 out of 10 with 708
<wink>.

Tuning strategies for gc *can* get as complex as OS scheduling algorithms, and for the
same reasons:  you're in the business of predicting the future based on just a few neurons
keeping track of gross summaries of what happened before.  A program can go through many
phases of quite different behavior over its life (like I/O-bound vs compute-bound, or
cycle-happy vs not), and at the phase boundaries past behavior is worse than irrelevant
(it's actively misleading).

So call it 700 for now.  Or 1000.  It's a bad guess at a crude heuristic regardless, and
if we avoid extreme positions we'll probably avoid doing as much harm as we *could* do
<0.9 wink>.  Over time, a more interesting measure may be how much cyclic trash
collections actually recover, and then collect less often the less trash we're finding
(ditto more often when we're finding more).  Another is like that, except replace "trash"
with "cycles (whether trash or not)".  The gross weakness of "net container allocations"
is that it doesn't directly measure what this system was created to do.

These things *always* wind up with dynamic measures, because static ones are just too
crude across apps.  Then the dynamic measures fail at phase boundaries too, and more
gimmicks are added to compensate for that.  Etc.  Over time it will get better for most
apps most of the time.  For now, we want *both* to exercise the code in the field and not
waste too much time, so hasty compromise is good for the beta.

let-a-thousand-thresholds-bloom-ly y'rs  - tim