From jeremy@zope.com  Thu Jan 23 16:20:35 2003
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 23 Jan 2003 11:20:35 -0500
Subject: [Compiler-sig] Re: [pypy-dev] Re: Notes on compiler package
In-Reply-To: <200301231449.h0NEnQ506396@odiug.zope.com>
References: <200301230021.h0N0Ld2p012793@overload3.baremetal.com>
 <2miswgrtyl.fsf@starship.python.net>
 <006d01c2c2df$3cdce400$6d94fea9@newmexico>
 <2md6mo6m6x.fsf@starship.python.net>
 <200301231449.h0NEnQ506396@odiug.zope.com>
Message-ID: <15920.5715.329448.372928@slothrop.zope.com>

I've been intending to post some comments in this thread for quite a
while, but both Zope and kids have been keeping me busy <wink>.

The compiler package is pretty functional.  In Tools/compiler there is
a regrtest.py script that compiles the entire standard library with
the compiler package and runs the test suite.  There is a failure
related to a bug in the builtin compiler (improper handling of the
unary negative optimization), but otherwise the tests all run.

The package is more complex than I would like.  In particular, the
assembler phase that converts from abstract bytecode to concrete
bytecode is rather baroque.  But it is functional.
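
As a rough illustration of what such an assembler phase has to do, here
is a minimal sketch in Python.  It is not the compiler package's code:
the instruction representation, the "label" pseudo-op, and the function
name assemble are all hypothetical, and the sizes (1 byte for an opcode
without an argument, 3 bytes with one) are assumed for the sketch.

```python
# Hypothetical sketch: resolve symbolic jump targets ("abstract" bytecode)
# into byte offsets ("concrete" bytecode).  Not the compiler package's API.

def instruction_size(arg):
    # Assumption for this sketch: 1 byte without an argument, 3 bytes with.
    return 1 if arg is None else 3

def assemble(instructions):
    """instructions is a list of (opname, arg) pairs.

    A pair ("label", name) marks a jump target and emits no code.
    An arg that is a string refers to a label by name.
    Returns a list of (offset, opname, arg) with labels resolved.
    """
    # Pass 1: record the byte offset of every label.
    offsets = {}
    pos = 0
    for op, arg in instructions:
        if op == "label":
            offsets[arg] = pos
        else:
            pos += instruction_size(arg)

    # Pass 2: emit instructions, replacing label names with offsets.
    out = []
    pos = 0
    for op, arg in instructions:
        if op == "label":
            continue
        if isinstance(arg, str):
            arg = offsets[arg]
        out.append((pos, op, arg))
        pos += instruction_size(arg)
    return out

abstract = [
    ("LOAD_CONST", 0),
    ("JUMP_ABSOLUTE", "end"),
    ("POP_TOP", None),
    ("label", "end"),
    ("RETURN_VALUE", None),
]
concrete = assemble(abstract)
```

Two passes are needed because a forward jump names a label whose offset
is not known until the instructions before it have been sized.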

I believe the stack depth computation is still pretty bogus, although
Mark Hammond did a good job of getting it mostly correct.  The
difference between the two compilers is that the builtin compiler
tracks stack depth at the same time it emits bytecode and the compiler
package tries to determine the stack depth by scanning the bytecode in
a later pass.  The latter approach should probably compute stack depth
for each basic block and then do simple flow analysis (I think already
present) to determine what the max stack depth on any path is.  Even
if the post-processing approach gets fixed, I'm not sure which
approach I like better.
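
The per-block approach described above could be sketched roughly as
follows.  This is an illustration under stated assumptions, not code
from either compiler: the Block class, the STACK_EFFECT table, and the
function names are hypothetical, and the sketch assumes well-formed
bytecode in which every block is always entered at the same stack depth
(otherwise a loop with a net push would never converge).

```python
# Hypothetical stack effects for a handful of opcodes (illustrative only).
STACK_EFFECT = {
    "LOAD_CONST": 1,
    "LOAD_FAST": 1,
    "BINARY_ADD": -1,      # pops two operands, pushes one result
    "POP_TOP": -1,
    "RETURN_VALUE": -1,
    "JUMP_IF_FALSE": 0,    # assumed to leave the tested value on the stack
}

class Block:
    """A basic block: straight-line opcodes plus successor blocks."""
    def __init__(self, name, ops, successors=()):
        self.name = name
        self.ops = list(ops)
        self.successors = list(successors)

def block_summary(block):
    """Return (net effect, peak depth), both relative to the entry depth."""
    depth = peak = 0
    for op in block.ops:
        depth += STACK_EFFECT[op]
        peak = max(peak, depth)
    return depth, peak

def max_stack_depth(entry):
    """Propagate entry depths through the flow graph, tracking the maximum."""
    deepest_entry = {}          # block name -> largest entry depth seen
    worklist = [(entry, 0)]
    overall = 0
    while worklist:
        block, depth_in = worklist.pop()
        if deepest_entry.get(block.name, -1) >= depth_in:
            continue            # already visited at this depth or deeper
        deepest_entry[block.name] = depth_in
        net, peak = block_summary(block)
        overall = max(overall, depth_in + peak)
        for succ in block.successors:
            worklist.append((succ, depth_in + net))
    return overall

then_block = Block("then", ["POP_TOP", "LOAD_CONST", "LOAD_CONST",
                            "LOAD_CONST", "BINARY_ADD", "BINARY_ADD",
                            "RETURN_VALUE"])
else_block = Block("else", ["POP_TOP", "LOAD_CONST", "RETURN_VALUE"])
entry = Block("entry", ["LOAD_FAST", "LOAD_FAST", "BINARY_ADD",
                        "JUMP_IF_FALSE"], [then_block, else_block])
depth = max_stack_depth(entry)
```

Each block is summarized once by its net effect and its peak, so the
flow analysis only has to add entry depths along the edges.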

As Samuele mentioned, there's an improved AST on the ast-branch and
it's already being used by Jython.  I don't recall the specific
differences off the top of my head, but the new AST has slightly
simpler data structures and is more regular.

The ast-branch still requires a lot of work to finish, although it is
functional enough to compile simple functions (definition and call).
The symbol table pass is much cleaner.  In general, there's less code
because the AST is easier to work with.  As a simplification, I
decided not to do anything to change the parser and instead focused
only on the backend.  I had grand plans of replacing the parser in a
future release, but we'll have to wait and see :-).

To summarize, briefly, what remains to be done on that branch
(although I'm not sure it's relevant to pypy-dev):

  - Conversion of concrete to abstract trees (90% done)
  - Error handling during conversion (basically not done)
  - Marshal API to pass AST between Python and C (30% done)
  - Basic bytecode generation (80% done)
  - Error checking (25% done)

It's probably a couple of weeks' effort to get it to alpha quality.

The question I'd really like to ask, which I'll just throw out for
now, is: Why would minimal python want to generate bytecode for the
old interpreter?  It seems that a simpler bytecode format, e.g. one
that didn't have a BINARY_ADD but generated appropriate code to call
__add__, would be a better target.  There's a lot of complex C code
hiding inside BINARY_ADD, which ought to get pushed back out to the
Python code.  A simpler and slimmer bytecode format seems like it
would expose more opportunity for optimization.
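
To make the point concrete, here is a rough sketch, written in Python,
of the dispatch that a single BINARY_ADD has to perform.  It is a
simplification based on the documented __add__/__radd__ protocol, not
a transcription of the C code; the function name binary_add is
hypothetical, and details such as classic classes and coercion are
ignored.

```python
def binary_add(left, right):
    """Approximate the method dispatch behind `left + right`."""
    ltype, rtype = type(left), type(right)
    add = getattr(ltype, "__add__", None)
    # The reflected method is only consulted when the types differ.
    radd = getattr(rtype, "__radd__", None) if rtype is not ltype else None

    # A proper subclass on the right that overrides __radd__ goes first.
    if (radd is not None and issubclass(rtype, ltype)
            and radd is not getattr(ltype, "__radd__", None)):
        result = radd(right, left)
        if result is not NotImplemented:
            return result
        radd = None             # do not try it a second time

    if add is not None:
        result = add(left, right)
        if result is not NotImplemented:
            return result

    if radd is not None:
        result = radd(right, left)
        if result is not NotImplemented:
            return result

    raise TypeError("unsupported operand type(s) for +: %r and %r"
                    % (ltype.__name__, rtype.__name__))

int_sum = binary_add(1, 2)
mixed_sum = binary_add(1, 2.5)      # int.__add__ declines, float.__radd__ runs
joined = binary_add("a", "b")
```

Once the dispatch is spelled out as ordinary calls like this, a compiler
that knows something about the operand types could in principle skip the
branches that cannot apply.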

Jeremy