From jeremy@zope.com Thu Jan 23 16:20:35 2003
From: jeremy@zope.com (Jeremy Hylton)
Date: Thu, 23 Jan 2003 11:20:35 -0500
Subject: [Compiler-sig] Re: [pypy-dev] Re: Notes on compiler package
In-Reply-To: <200301231449.h0NEnQ506396@odiug.zope.com>
References: <200301230021.h0N0Ld2p012793@overload3.baremetal.com>
 <2miswgrtyl.fsf@starship.python.net>
 <006d01c2c2df$3cdce400$6d94fea9@newmexico>
 <2md6mo6m6x.fsf@starship.python.net>
 <200301231449.h0NEnQ506396@odiug.zope.com>
Message-ID: <15920.5715.329448.372928@slothrop.zope.com>

I've been intending to post some comments in this thread for quite a
while, but both Zope and kids have been keeping me busy.

The compiler package is pretty functional. In Tools/compiler there is
a regrtest.py script that compiles the entire standard library with
the compiler package and runs the test suite. There is a failure
related to a bug in the builtin compiler (improper handling of the
unary negative optimization), but otherwise the tests all run.

The package is more complex than I would like. In particular, the
assembler phase that converts from abstract bytecode to concrete
bytecode is rather baroque. But it is functional.

I believe the stack depth computation is still pretty bogus, although
Mark Hammond did a good job of getting it mostly correct. The
difference between the two compilers is that the builtin compiler
tracks stack depth at the same time it emits bytecode, while the
compiler package tries to determine the stack depth by scanning the
bytecode in a later pass. The latter approach should probably compute
the stack depth for each basic block and then do simple flow analysis
(which I think is already present) to determine the max stack depth
on any path. Even if the post-processing approach gets fixed, I'm not
sure which approach I like better.

As Samuele mentioned, there's an improved AST on the ast-branch and
it's already being used by Jython.
I don't recall the specific differences off the top of my head, but
the new AST has slightly simpler data structures and is a bit more
regular.

The ast-branch still requires a lot of work to finish, although it is
functional enough to compile simple functions (definition and call).
The symbol table pass is much cleaner. In general, there's less code
because the AST is easier to work with.

As a simplification, I decided not to do anything to change the parser
and instead focused only on the backend. I had grand plans of
replacing the parser in a future release, but we'll have to wait and
see :-).

To summarize, briefly, what remains to be done on that branch
(although I'm not sure it's relevant to pypy-dev):

 - Conversion of concrete to abstract trees (90% done)
 - Error handling during conversion (basically not done)
 - Marshal API to pass AST between Python and C (30% done)
 - Basic bytecode generation (80% done)
 - Error checking (25% done)

It's probably a couple of weeks' effort to get it to alpha quality.

The question I'd really like to ask, which I'll just throw out for
now, is: Why would minimal python want to generate bytecode for the
old interpreter? It seems that a simpler bytecode format, e.g. one
that didn't have a BINARY_ADD but generated appropriate code to call
__add__, would be a better target. There's a lot of complex C code
hiding inside BINARY_ADD, which ought to get pushed back out to the
Python code. A simpler and slimmer bytecode format seems like it would
expose more opportunity for optimization.

Jeremy
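
The per-basic-block approach to stack depth described earlier in this
message could be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from the compiler
package: the Block class, the STACK_EFFECT table, and both function
names are hypothetical, and the table covers just a handful of opcodes.

```python
# Illustrative net stack effects for a few opcodes. This table is an
# assumption made for the sketch, not the compiler package's own data.
STACK_EFFECT = {
    "LOAD_CONST": 1,
    "LOAD_FAST": 1,
    "BINARY_ADD": -1,    # pops two operands, pushes one result
    "COMPARE_OP": -1,    # pops two operands, pushes one result
    "STORE_FAST": -1,
    "POP_TOP": -1,
    "JUMP_FORWARD": 0,
    "RETURN_VALUE": -1,
}


class Block:
    """A basic block: straight-line opcodes plus successor blocks."""

    def __init__(self, name, ops):
        self.name = name
        self.ops = ops
        self.successors = []


def block_summary(block):
    """Return (peak, net) for one block, both relative to the depth on
    entry: the highest depth reached inside it and the change across it."""
    depth = peak = 0
    for op in block.ops:
        depth += STACK_EFFECT[op]
        peak = max(peak, depth)
    return peak, depth


def max_stack_depth(entry):
    """Propagate entry depths through the flow graph and return the
    maximum depth reached on any path starting at `entry`.

    Assumes the depth around every loop is balanced, as it is for
    well-formed bytecode; otherwise the propagation would not terminate.
    """
    summaries = {}
    entry_depth = {}          # highest known depth on entry to each block
    worklist = [(entry, 0)]
    best = 0
    while worklist:
        block, depth = worklist.pop()
        if entry_depth.get(block, -1) >= depth:
            continue          # already handled with at least this depth
        entry_depth[block] = depth
        if block not in summaries:
            summaries[block] = block_summary(block)
        peak, net = summaries[block]
        best = max(best, depth + peak)
        for succ in block.successors:
            worklist.append((succ, depth + net))
    return best


# A small if/else diamond: entry -> (then | other) -> exit_
entry = Block("entry", ["LOAD_FAST", "LOAD_CONST", "COMPARE_OP", "POP_TOP"])
then = Block("then", ["LOAD_FAST", "LOAD_CONST", "LOAD_CONST",
                      "BINARY_ADD", "BINARY_ADD", "STORE_FAST",
                      "JUMP_FORWARD"])
other = Block("other", ["LOAD_CONST", "STORE_FAST"])
exit_ = Block("exit", ["LOAD_FAST", "RETURN_VALUE"])
entry.successors = [then, other]
then.successors = [exit_]
other.successors = [exit_]

print(max_stack_depth(entry))
```

Each block is summarized once, so the flow analysis itself only has to
add an entry depth to a precomputed peak. Tracking depth while emitting
bytecode, as the builtin compiler does, avoids the second pass at the
cost of tying the bookkeeping to code generation.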