[pypy-dev] LLAbstractInterpreter

Armin Rigo arigo at tunes.org
Thu Dec 15 15:37:30 CET 2005

Hi again,

On Wed, Dec 14, 2005 at 09:01:19PM +0100, Armin Rigo wrote:
> There are heuristics like "when an operation is constant-folded, it
> produces a concrete constant if at least one of the input arguments was
> already a concrete constant, and a little constant otherwise".  The
> problem with that approach is that it doesn't really work :-)  We can't
> seem to get the correct effect already in the case of the Toy Language.
> I will propose a plan for this in the next e-mail, to avoid mixing
> speculations with the presentation of what we have so far.

Of course it's unclear at the moment how well different solutions would
work, but here is one proposed approach...

The crucial place in the interp() function where we need to have a
constant is the 'opcode' variable, used in the chain of if/elif.  So
let's write it down explicitely as a hint:

    def interp(code):
        pc = 0
        code_len = len(code)
        stack = []
        while pc < code_len:
            opcode = ord(code[pc])
            pc += 1
            opcode = hint(opcode, concrete=True)    # hint!
            if opcode == PUSH:
                stack.append( ... )
                pc += 1
            elif opcode == POP:
            elif ...

The new function hint() would turn into an operation in the LL flow
graph, e.g.:

    v2 = hint(v1, Constant(dict_of_keywords))

It is mostly the same as the 'same_as' operation (:-), but it is
recognized by the LLAbstractInterpreter as meaning that we want to force
'v2' to be a constant.

Here are the proposed propagation rules.  For now, let's consider we
have LLVariables and LLConstants -- only one kind of constant, of the
"concrete" kind, i.e. forcing loop unrolling.  (There is no implicit
merging with generalization of constants to variables in this model.)

By default, mostly everything is LLVariables, just like they are
Variables in the input flow graph.  In the code sample above, this
includes 'pc' (but not 'code', for which we can pass an LLConstant
argument).  Then the 'opcode' that reaches the call to hint() is also a
LLVariable.  That's a problem: how to turn a variable into a concrete

The idea is to give up, but track back the origin of the variable.
Let's look at (part of) the LL graph of interp():

       (block 1)
           | (pc=0)

  ,--> (block 2)
  |      v1 = int_lt(pc, code_len)
  |      exitswitch v1
  |        |
  .        |  True
  .        V

       (block 3)
         v2 = getarrayitem(code, pc)
         v3 = cast_char_to_int(v2)
         v4 = hint(v3, ({'concrete': True}))

If v3 receives a LLVariable, then we track it back: it comes from an
operation involving 'v2', which itself involves 'code' and 'pc'.  Here
'code' is a LLConstant, but 'pc' comes from the previous block.  In the
path we followed entering this block, 'pc' comes from a constant in the
flow graph.  So far, the state at the start of block 2 looks like this:

   code:     LLConstant("...")
   code_len: LLConstant(len(code))
   pc:       LLVariable()

So the hint() operation would re-link the residual block corresponding
to "block 1" to a new residual "block 2" with a more precise state:

   code:     LLConstant("...")
   code_len: LLConstant(len(code))
   pc:       LLConstant(0)

And then we continue again from this new state, which should result
in 'v3' being an LLConstant this time.

The complete story probably still requires two kinds of constants:
hint() would return a "concrete" constant, but the 'pc' in the state
would only become a "little" constant.  I think that the original way
the LLAbstractInterpreter propagates them (before more rules were added)
would be just what we need here:

* a "concrete" constant cannot be merged;

* constant-folded operations produce a "concrete" constant if any input
  argument is a "concrete" constant;

* "little" constants become variables as soon as they go to the next
  block, so we don't have to worry about generalizing two little
  constants at a merge point.

The hint() operation would hack the state of block 2 to make 'pc' a
*little* constant, so that it stays a constant for the minimal amount of
time and doesn't propagate everywhere.  So after 'pc' has been forced to
a little constant in block 2, it would still revert to a variable in
block 3; but then the hint() would fail again, and this time it would
fix the state of block 3 so that 'pc' is a little constant in there as
well.  (We can think later about making this tedious step-by-step
constantification not too slow, if needed.)

For reference, an implicit goal I'm trying to address is to avoid having
constants propagate more than expected.  Currently, we get the bad
effect of TL instructions like "PUSH 42" storing a concrete constant in
the value stack, which prevents merges.  It is a concrete constant at
the moment because the value '42' is just the ord() of the next
character in the bytecode.  Under the new proposal, neither 'code' nor
'pc' would be concrete constants, so the 'ord(code[pc])' that retrieve
42 would not return a concrete constant either.  Only 'opcode' would be
a concrete constant, plus the result of the computations done directly
with this value -- which is just the results of the comparisons in the
if/elif statements.

I expect some amount of propagation of the little constants will soon be
needed again, with its automatic generalization logic, but that should
be reintroduced in a second step.


A bientot,


More information about the Pypy-dev mailing list