[Python-Dev] Proposal: Run GC less often
"Martin v. Löwis"
martin at v.loewis.de
Sat Jun 21 22:23:16 CEST 2008
Here is my proposal for making the GC run less often.
The objective is to avoid the quadratic-time behavior
if many objects are created and none of them is garbage.
The youngest generation remains collected after 700
allocations, the middle generation after 10 young collections.
Therefore, full GC remains considered every 7000 allocations.
The two youngest generations don't cause quadratic behavior,
as the number of objects in each generation is bounded.
Currently, full GC is invoked every 10 middle collections.
Under my proposal, 10 middle collections must have passed,
PLUS the number of survivor objects from the middle generation
must exceed 10% of the number of objects in the oldest
generation.
As a consequence, garbage collection becomes less frequent
as the number of objects on the heap grows, resulting in
an amortized O(N) overhead for garbage collection (under
the access pattern stated above).
To determine the number of survivor objects, counts are
taken during the collection. Objects deallocated through
reference counting still remain accounted for as survivors
(likewise for determining the size of the oldest generation).
I made a simulation assuming an initially-empty heap,
and invocations of the collector every M=7000 objects.
The table below is in units of M and shows the number
of objects allocated, and the number of objects inspected
since the start of the simulation, for the old and the
new scheme (the asterisk indicates that a collection
took place; lines with no collection are dropped):
total old_inspected new_inspected
10 10* 10*
20 30* 30*
30 60* 60*
40 100* 100*
50 150* 150*
60 210* 210*
70 280* 280*
80 360* 360*
90 450* 450*
100 550* 450
102 550 552*
110 660* 552
115 660 667*
120 780* 667
129 780 796*
130 910* 796
140 1050* 796
...
940 44650* 7724
942 44650 8666*
950 45600* 8666
960 46560* 8666
970 47530* 8666
980 48510* 8666
990 49500* 8666
1000 50500* 8666
...
9990 4995000* 95887
As you can see, the old and the new scheme would start
of equally until 100M objects have been allocated. The
table shows how the old scheme grows quadratically, and
the new scheme eventually approaches roughly a factor
of ten between the number of objects and the number of
times that GC considers an object.
Applications with a small number of objects will see no
change in behavior, also, applications that create
lots of small cycles will continue to see them collected
in the youngest or middle generations.
Please let me know what you think.
Regards,
Martin
P.S. I don't plan to implement this myself, so if you like
the idea, go ahead.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: comp.py
Type: text/x-python
Size: 719 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20080621/6bf90fed/attachment.py>
More information about the Python-Dev
mailing list