[Python-Dev] Rattlesnake progress

Tim Peters tim.one@comcast.net
Wed, 20 Feb 2002 02:46:07 -0500


[Tim]
>> Within a basic block, "the obvious" greedy scheme is ...

[Neil Schemenauer]
> I already had this part of the plan mostly figured out.  Thanks for
> verifying my thinking however.

You're welcome.  Note that you've been pretty mysterious about what it is
you do want to know, so I'm more pleased that I channeled you than that I
was slightly helpful <wink>.

>> What's hard (too expensive to be practical, in general) is provably
>> minimizing the overall number of temps *across* basic blocks.

> This was the part that was worrying me.

It can worry you later just as well.  Python isn't C, and the dynamic
semantics make it very much harder to prove that a subexpression is, e.g., a
loop invariant, or that an instance of "+" won't happen to change the
binding of every global, etc (ha! now I see you pointed that out yourself
later -- good chanelling on your end too <wink>).  For that reason there's
less need to get gonzo at the start.  IIRC, the primary motivation for
Rattlesnake was to cut eval loop overhead, and it's enough to aim for just
that much at the start.

>> If you're retaining Rattlesnake's idea of treating "the register file"
>> as a superset of the local vrbl vector, mounds of such "stores" will
>> be nops.

> I'm planning to keep this idea.  There seems to be no good reason to
> treat local variables any differently than registers.

Not now.  If you're looking at going on to generate native machine code
someday, well, this isn't the only simplification that will bite.

> I suppose it would be fairly easy to add a simple peep-hole optimizer
> that would clean out the redundant stores.  When you talked about
> flexible intermediate code did you have anything in mind?

That's why I urged you to keep it in Python at the start.  IRs come in all
sorts of flavors, and AFAICT people more or less stumble into one that works
well for them based on what they've done so far (and that was my experience
too in my previous lives).  You have to expect to rework it as ambitions
gorw.  Note Daniel Berlin's helpful comment that gcc is moving toward 3 IRs
now; that's the way these things always go.

At the start, I expect I'd represent a Python basic block as an object
containing a plain Python list of instruction objects.  Then you've got the
whole universe of zippy Python builtin list ops to build on when mutating
the instruction stream.  Note that my focus is to minimize *your* time
wrestling with this stuff:  implementing fancy data structures is a waste of
time at the start.  I'd also be delighted to let cyclic gc clean up dead
graph structures, so wouldn't spend any time trying, e.g., to craft a
gimmick out of weakrefs to avoid hard cycles.

You may or may not want to do a survey of popular IRs.  Here's a nice
*brief* page with links to follow:

    http://www.math.tau.ac.il/~guy/pa/ir.html

I think a lot of this is a matter of taste.  For example, some people swear
by the "Value Dependence Graphs" that came out of Microsoft Research.  I
never liked it, and it's hard to explain why ("too complicated").  Static
Single Assignment form is more to my tastes, but again it's hard to explain
why ("almost but not quite too simple").

Regardless, you can get a lot done at the Rattlesnake level just following
your gut intuitions, prodded by what turns out to be too clumsy.  As with
most other things, reading papers is much more useful after you've wrestled
with the problems on your own.  There's only one way to do it, but what that
way is remains a mystery to me.

> ...
> But is there really any freedom to do reordering?  For example, a
> BINARY_ADD opcode to end up doing practically anything.

That's right, and that's why you should gently file away but ignore almost
all the advice you'll get <wink>.  Skip kept discovering over and over again
just how little his peephole optimizer could get away with doing on Python
bytecode.  Collapsing jumps to jumps is one of the few safe things you can
get away with.

BTW, the JUMP_IF_TRUE and JUMP_IF_FALSE opcodes peek at the top of the stack
 instead of consuming it.  As a result, both the fall-through and target
instructions are almost always the no-bang-for-the-buck POP_TOP.  This
always irritated me to death (it's *useful* behavior, IIRC, only for chained
comparisons).  If Rattlesnake cleans up just that much, it will be a huge
win in my eyes, depite not being measurable <ahem>.