[pypy-dev] Notes on compiler package

Thu Jan 23 01:21:39 CET 2003

I'm finally starting to get my head wrapped around the compiler package,
and thought I'd pass some notes on to the group since it's a beast:

1) Other than the original AST generation, it looks like it's all pure
Python (not tied into the C source).  The original AST comes from the
parser module, which is purely a wrapper around the C parser (although
I've already posted Python code to build ASTs here)

2) Internal CPython ASTs are transformed into "compiler"'s own ASTs.
These nodes are smarter than the internal ones; they attach appropriate
info to "attributes".  CPython AST nodes just have a list of children;
this module identifies and breaks out the important data (for example,
the arguments in a function def become an "args" attribute, so you don't
have to guess that they're the first child)

3) The CodeGenerators walk the ASTs and emit code into a code object.

4) Speed seems reasonable (which makes me think I'm missing some calls
into the C side of things)

5) The generated bytecode looks reasonably good and execs fine, but
there are still some differences from what CPython generates.  Here's what
I've seen so far:

    SET_LINENO doesn't always work the same (this is how Python knows
    what line in a file threw an exception)

    It doesn't have one of the few optimizations the CPython compiler
    does have.  CPython will throw out misplaced "docstrings": strings
    in the source that aren't assigned to anything.  The compiler
    package keeps them, which throws off the indexes into the code
    object's co_consts attribute.
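
A quick way to look at the co_consts point is sketched below.  This is
only an illustration: the sample source and the helper name constsOf are
made up, and it runs the builtin compile only.  Passing compiler.compile
as compileFunc should show the other side, assuming it takes the same
arguments as the builtin.

```python
# Hypothetical sample: a string expression statement in the middle of a
# module, not assigned to anything and not in docstring position.
SAMPLE = "x = 1\n'stray string'\ny = 2\n"

def constsOf(source, compileFunc=compile):
    """
    Returns the co_consts tuple of the module code object for source.
    """
    return compileFunc(source, "<sample>", "exec").co_consts

if __name__ == "__main__":
    consts = constsOf(SAMPLE)
    print("co_consts: %r" % (consts,))
    print("stray string kept: %r" % ("stray string" in consts))
```

If the two compilers disagree on whether the string is kept, every
constant after it sits at a different index, which is enough to make the
LOAD_CONST arguments differ.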

In general, the compiler package is in a lot better shape than I expected.

If anyone is interested in poking around, here's a quick script that
does a basic comparison and diff of the bytecode generated by the builtin
compile and the compiler package's equivalent function.  I'll probably
expand this to compare the whole code objects in the near future.

===============
CompilerTest.py
===============
import sys
import compiler
import dis

def opcodeTuples(source):
    """
    Makes Opcode tuples in the form of: OFFSET, NAME, [OPTIONAL PARAM]
    """
    retVal = []
    a = iter(source)
    offset = 0
    def getByte(next=a.next): return ord(next())
    # Arguments are two bytes, little-endian
    def getWord(next=a.next): return ord(next()) + ord(next()) * 256
    while 1:
        try:
            opcode = getByte()
            # Opcodes below dis.HAVE_ARGUMENT (90) take no argument
            if opcode < dis.HAVE_ARGUMENT:
                retVal.append( (offset, dis.opname[opcode]) )
                offset += 1
            else:
                retVal.append( (offset, dis.opname[opcode], getWord()))
                offset += 3
        except StopIteration:
            break
    return retVal

def opcodeDiff(ops1, ops2):
    """
    Does a simple DIFF of two sets of opcodes.
    Can only check one skipped line.
    Ignores param for now since they don't match
    """
    opcode1, opcode2 = opcodeTuples(ops1), opcodeTuples(ops2)
    a,b = 0,0
    print "%30s%30s" % ("FIRST", "SECOND")
    print "%30s%30s" % ("====================" ,"====================")
    while a < len(opcode1) and b < len(opcode2):
        if opcode1[a][1:2] == opcode2[b][1:2]:
            print "%30s%30s" % (opcode1[a], opcode2[b] ),
            if opcode1[a][2:] != opcode2[b][2:]:
                print "    ARG MISMATCH"
            else:
                print
            a += 1
            b += 1
        # Look one opcode ahead on either side, without running off the end
        elif a + 1 < len(opcode1) and opcode1[a+1][1:2] == opcode2[b][1:2]:
            print "%30s%30s" % (opcode1[a], "<not here>")
            a += 1
        elif b + 1 < len(opcode2) and opcode1[a][1:2] == opcode2[b+1][1:2]:
            print "%30s%30s" % ("<not here>", opcode2[b])
            b += 1
        else:
            print "NONTRIVIAL DIFF%25s%25s" % (opcode1[a], opcode2[b])
            return
    # One side ran out while the other still has opcodes left
    if a < len(opcode1) or b < len(opcode2):
        print "UNEXPECTED END OF OPCODES"

def compareCompiles(filename, nativeCompile=compile,
                    pythonCompile=compiler.compile):
    """
    Compares a bytecode compile between the native python compiler and
    the one written in python
    """
    source = file(filename).read()

    native = nativeCompile(source, filename, "exec")
    python = pythonCompile(source, filename, "exec")
    if native.co_code == python.co_code:
        print "compiles matched"
    else:
        print "compiles didn't match"
        opcodeDiff(native.co_code, python.co_code)

if __name__ == "__main__":
    compareCompiles("c:\\python23\\lib\\code.py")

===============
End CompilerTest.py
===============
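
As for comparing whole code objects, a first pass might look something
like the sketch below.  The name codeObjectDiff and the CODE_ATTRS list
are my own picks for illustration, not part of the script above, and
nested code objects inside co_consts are only compared with ==, not
walked.

```python
# Attributes to compare; co_code is left to the opcode diff.
CODE_ATTRS = ("co_argcount", "co_nlocals", "co_stacksize", "co_flags",
              "co_names", "co_varnames", "co_consts")

def codeObjectDiff(code1, code2, attrs=CODE_ATTRS):
    """
    Returns a list of (ATTRIBUTE, FIRST VALUE, SECOND VALUE) for every
    attribute in attrs whose value differs between the two code objects.
    """
    diffs = []
    for name in attrs:
        value1, value2 = getattr(code1, name), getattr(code2, name)
        if value1 != value2:
            diffs.append((name, value1, value2))
    return diffs

if __name__ == "__main__":
    # Two unrelated snippets, just to have something that differs
    first = compile("a = 1\n", "<first>", "exec")
    second = compile("b = 'two'\n", "<second>", "exec")
    for name, value1, value2 in codeObjectDiff(first, second):
        print("%-14s %r != %r" % (name, value1, value2))
```

Calling it on the native and python code objects built in
compareCompiles would cover the same pair the opcode diff already looks
at.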
