[Python-Dev] Billions of gc's

Aahz aahz@pythoncraft.com
Tue, 30 Apr 2002 00:21:26 -0400

On Mon, Apr 29, 2002, Tim Peters wrote:
> [Aahz]
>> My take is that programs with a million live objects and no cycles are
>> common enough that gc should be designed to handle that smoothly.
> Well, millions of live objects is common but isn't a problem.  The glitch
> we're looking at is a surprising slowdown with millions of live *container*
> objects.  The latter isn't so common.
>> I don't think that a programmer casually writing such applications
>> (say, processing information from a database) should be expected to
>> understand gc well enough to tune it.
> People casually writing applications pushing the limits of their boxes are
> in for more surprises than just this <wink>.

Fair enough.  I hadn't quite understood that it was specifically
container objects, but obviously a database result will have lots of
tuples, so I think that's a good real-world metric for testing whatever
solution is proposed.
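
To make the container distinction concrete, here is a minimal sketch for a present-day CPython. It relies on gc.is_tracked(), which did not exist in 2002, and the sample values are made up for illustration:

```python
import gc

# Atomic objects cannot hold references to other objects, so they can
# never be part of a cycle and the collector does not track them.
int_tracked = gc.is_tracked(42)
str_tracked = gc.is_tracked("a row")

# A list can hold references, so the collector has to track it.
row = [1, "spam"]
list_tracked = gc.is_tracked(row)

print("int tracked: ", int_tracked)
print("str tracked: ", str_tracked)
print("list tracked:", list_tracked)
```

Only tracked objects count toward the collector's workload, which is why a million strings and a million small containers are very different cases.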

Here's a question: suppose we've got a database result with 10K rows (I'd
say that is fairly common), and we're processing each row with a regex
(something that can't be done in SQL).  What's a ballpark for gc overhead
before and after your fix?  (I'm still not set up to compile CVS, so I
can't do it myself.)
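
Something along these lines is what I have in mind, as a rough sketch only. The rows, the regex, and the helper names (ROWS, PATTERN, process, timed) are all made up, and it compares gc enabled against gc disabled within a single build rather than before and after the fix, so it gives at best an upper bound on the gc overhead for this workload:

```python
import gc
import re
import time

# Hypothetical stand-in for a 10K-row database result: one tuple per row.
ROWS = [(i, "user%d@example.com" % i, "note %d" % i) for i in range(10000)]

# Illustrative pattern only; any per-row regex would do.
PATTERN = re.compile(r"(\w+)@(\w+)\.com")


def process(rows):
    """Apply the regex to one column of every row and keep the groups."""
    out = []
    for _id, email, _note in rows:
        m = PATTERN.match(email)
        if m:
            out.append(m.groups())  # each result is another tuple
    return out


def timed(rows, use_gc):
    """Time one pass with the cyclic collector enabled or disabled."""
    was_enabled = gc.isenabled()
    gc.collect()  # start each run from a freshly collected heap
    if use_gc:
        gc.enable()
    else:
        gc.disable()
    try:
        start = time.perf_counter()
        result = process(rows)
        elapsed = time.perf_counter() - start
    finally:
        # Restore whatever state the caller had.
        if was_enabled:
            gc.enable()
        else:
            gc.disable()
    return elapsed, len(result)


if __name__ == "__main__":
    with_gc, n = timed(ROWS, use_gc=True)
    without_gc, _ = timed(ROWS, use_gc=False)
    print("rows processed:", n)
    print("gc enabled:  %.4f s" % with_gc)
    print("gc disabled: %.4f s" % without_gc)
```

A single pass over 10K rows is short enough that timer noise may dominate, so repeating each measurement several times and taking the minimum would give a steadier number.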

Aahz (aahz@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"I used to have a .sig but I found it impossible to please everyone..."  --SFJ