[Python-Dev] Re: Speed of test_sort.py

Guido van Rossum guido@python.org
Thu, 01 Aug 2002 14:16:04 -0400

> [Tim, in python-checkins]
> > Bizarre:  this takes 11x longer to run if and only if test_longexp is
> > run before it, on my box.  The bigger REPS is in test_longexp, the
> > slower this gets.  What happens on your box?  It's not gc on my box
> > (which is good, because gc isn't a plausible candidate here).
> > 
> > The slowdown is massive in the parts of test_sort that implicitly
> > invoke a new-style class's __lt__ or __cmp__ methods.  If I boost
> > REPS large enough in test_longexp, even the test_sort tests on an array
> > of size 64 visibly c-r-a-w-l.  The relative slowdown is even worse in
> > a debug build.  And if I reduce REPS in test_longexp, the slowdown in
> > test_sort goes away.
> > 
> > test_longexp does do horrid things to Win98's management of user
> > address space, but I thought I had made that a whole lot better a month
> > or so ago (by overallocating aggressively in the parser).
> It's about the same on my Linux box (system time is CPU time spent in
> the kernel):
> test_longexp alone takes 1.92 user + 0.22 system seconds.
> test_sort alone takes 1.71 user + 0.01 system seconds.
> test_sort + test_longexp takes 3.62 user + 0.18 system seconds.
> test_longexp + test_sort takes 38.05 user and 0.34 system seconds!!!
> I'll see if I can get this to run under a profiler.

The profiler shows that in the latter run, 86% of the time (39 seconds
-- the profiler slows things down :-) was spent in PyFrame_New, for
188923 calls.  I note that the longexp-only profile has only 593 calls
to that function, and the sort-only profile has 183075 calls to it,
but look only 0.39 seconds for those altogether!

The numbers don't quite add up to 188923 (it misses 5255), but it's
close enough, and the rest is probably because regrtest does extra
stuff when it runs two tests, or there's some randomness in the sort

So why would 180,000 calls to PyFrame_New take 38 seconds in one case
and 0.39 seconds in the other?  I checked the call tree in the
profiler output, and only very few of the calls to PyFrame_New call
something else (mostly PyType_IsSubtype and PyDict_GetItem), and
besides the two cases have an almost identical call profile.

Suggestion: doesn't test_longexp create some frames with a very large
number of local variables?  Then PyFrame_New could spend a lot of time
in this loop:

	while (--extras >= 0)
		f->f_localsplus[extras] = NULL;

There's a free list of frames, and PyFrame_New picks the first frame
on the free list.  It grows the space for locals if necessary, but it
never shrinks it.

Back to Tim -- does this make sense?  Should we attempt to fix it?

--Guido van Rossum (home page: http://www.python.org/~guido/)