[Python-Dev] Re: SET_LINENO killer
19 Aug 2002 10:39:05 +0100
Tim Peters <firstname.lastname@example.org> writes:
> [Michael Hudson]
> > ...
> > This makes no sense; after you've commented out the trace stuff, the
> > only difference left is that the switch is smaller!
> When things like this don't make sense, it just means we're naive <wink>.
> The eval loop overwhelms most optimizers via a crushing overload of "too
> many" variables and "too many" basic blocks connected via a complex
> topology, and compiler optimization phases are in the business of using
> (mostly) linear-time heuristics to solve exponential-time optimization
> problems. IOW, the performance of the eval loop is as touchy as a
> heterosexual sailor coming off 2 years at sea, and there's no predicting
> what minor changes will do to speed. This has been observed repeatedly by
> everyone who has tried to speed it, across many platforms, and across a
> decade of staring at it: the eval loop is in unstable equilibrium on its
> best days.
I knew all this, but was still surprised by the magnitude of the slowdown.
> In the limit, the eval loop "should be" a little slower now under -O, just
> because we've added another test + taken-branch to the normal path. From
> that POV, your
> > FWIW gcc makes my patch a small win even with -O.
> is as much "a mystery" as why MSVC 6 hates it.
I wonder if some of the slowdown comes from repeatedly hauling the
threadstate into the cache. I guess wonderings like this are almost
pointless without measuring, though.
> > Actually, there are some other changes, like always updating f->f_lasti,
> > and allocating 8 more bytes on the stack. Does commenting out the
> > definition of instr_lb & instr_ub make any difference?
> I'll try that on Tuesday, but don't hold your breath. It could be that I
> can get back all the loss by declaring tstate volatile -- or doing any other
> random thing <wink>.
> > ...
> > Does reading assembly give any clues? Not that I'd really expect
> > anyone to read all of the main loop...
> I will if it's important, but a good HW simulator is a better tool for this
> kind of thing, and in any case I doubt I can make enough time to do what
> would be needed to address this for real.
On Linux there's cachegrind, which comes with valgrind and might prove
helpful. But that only runs on Linux, and I'm not sure I want to
explain the Linux mystery, as it might go away :)
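For anyone wanting to try the cachegrind suggestion, the invocation looks
roughly like this (a usage sketch; the interpreter path, benchmark script, and
pid in the output filename are placeholders):

```shell
# Run the interpreter under cachegrind to collect I-cache/D-cache
# miss counts for the whole run.
valgrind --tool=cachegrind ./python Lib/test/pystone.py

# Annotate the results per function; cachegrind writes its data
# to cachegrind.out.<pid> in the current directory.
cg_annotate cachegrind.out.12345
```

An I-cache miss count concentrated in the eval loop would support Tim's
code-address-collision theory.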
> > I'm baffled.
> Join the club -- we've held this invitation open for you for years <wink>.
Attempting a PhD in mathematics is providing enough bafflement for this
schmuck, but thanks for the offer.
> > Perhaps you can put SET_LINENO back in for the Windows build
> > <1e-6 wink>.
> If it's an unfortunate I-cache conflict among heavily-hit code addresses
> (something a good HW simulator can tell you), that could actually solve it!
> Then anything that manages to move one of the colliding code chunks to a
> different address could yield "a mysterious speedup". These mysteries are
> only irritating when they work against you <wink>.
Well, quite. Let's send Julian Seward an email asking him if he wants
to port valgrind to Windows <wink>.
surely, somewhere, somehow, in the history of computing, at least
one manual has been written that you could at least remotely
attempt to consider possibly glancing at. -- Adam Rixey