a basic bytecode to machine code compiler

Rouslan Korneychuk rouslank at msn.com
Fri Apr 1 21:31:58 CEST 2011


Thanks for all the replies. I wasn't aware of some of these 
alternatives. Most of these seem to transform Python code/bytecode into 
another language. I was already well aware of Cython. On the Nuitka 
blog, I notice it says "Compiling takes a lot [sic] time, ...". Compyler 
seems to generate assembly and then parse the assembly to generate a 
Windows exe. Berp turns Python into Haskell, not directly into machine code.

The closest thing to mine seems to be Psyco. It tries to do something 
more ambitious. It analyzes the program while it's running to create 
specialized versions of certain functions. High memory usage seems to be 
an issue with Psyco.

My approach is to simply translate the bytecode into raw machine code as 
directly as possible, quickly and without using much memory. Basically I 
was going for a solution with no significant drawbacks. It was also 
meant to be very easy to maintain. The machine code is generated with a 
series of functions that very closely mirrors AT&T syntax (same as the 
default syntax for the GNU assembler) with some convenience functions 
that make it look like some kind of high-level assembly. For example, 
here is the implementation for LOAD_GLOBAL:

@hasname
def _op_LOAD_GLOBAL(f,name):
     return (
         f.stack.push_tos(True) + [
         ops.push(address_of(name)),
         ops.push(GLOBALS)
     ] + call('PyDict_GetItem') +
         if_eax_is_zero([
             discard_stack_items(1),
             ops.push(BUILTINS)
         ] + call('PyDict_GetItem') +
             if_eax_is_zero([
                 discard_stack_items(1),
                 ops.push(pyinternals.raw_addresses[
                     'GLOBAL_NAME_ERROR_MSG']),
                 ops.push(pyinternals.raw_addresses['PyExc_NameError'])
             ] + call('format_exc_check_arg') + [
                 goto(f.end)
             ])
         ) + [
         discard_stack_items(2)
     ])

To make sense of it, you just need to ignore the square brackets and 
plus signs (they are there to create a list that gets joined into one 
byte string at the very end) and imagine it's assembly code (I should 
probably just write a variadic function or use operator overloading to 
make this syntactically clearer). Any time a new version of Python is 
released, you would just run diff on Python-X.X/Python/ceval.c and see 
which op codes need updating. You wouldn't need to make a broad series 
of changes because of a minor change in Python's syntax or bytecode.
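
As a rough illustration of the "list of byte strings joined at the very end" idea, and of what a variadic helper might look like, here is a minimal sketch. It is not the project's code: push_imm32, discard_slots, test_eax_eax, seq and assemble are hypothetical stand-ins, and the sketch assumes 32-bit x86 with 4-byte stack slots. Only the three instruction encodings are standard x86.

```python
import struct

def push_imm32(value):
    """Encode `pushl $value`: opcode 0x68 followed by a little-endian imm32."""
    return b'\x68' + struct.pack('<I', value)

def discard_slots(n):
    """Encode `addl $4*n, %esp` (83 C4 ib), dropping n 4-byte stack slots.

    Assumes 4 * n fits in a signed 8-bit immediate.
    """
    return b'\x83\xc4' + struct.pack('<B', 4 * n)

def test_eax_eax():
    """Encode `testl %eax, %eax` (85 C0)."""
    return b'\x85\xc0'

def seq(*parts):
    """Hypothetical variadic helper: flatten byte strings and nested lists."""
    flat = []
    for part in parts:
        if isinstance(part, (bytes, bytearray)):
            flat.append(bytes(part))
        else:
            # Anything else is treated as an iterable of further parts.
            flat.extend(seq(*part))
    return flat

def assemble(*parts):
    """Join every fragment into one byte string at the very end."""
    return b''.join(seq(*parts))

code = assemble(
    push_imm32(0x1000),   # placeholder addresses, purely illustrative
    push_imm32(0x2000),
    [test_eax_eax(), discard_slots(2)],
)
print(code.hex())  # 6800100000680020000085c083c408
```

With a helper like seq, a nested fragment can be passed as an extra argument instead of being spliced in with square brackets and plus signs, which is the syntactic cleanup suggested above.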

And that's just one of the larger op code implementations. Here's the 
one for LOAD_CONST:

@hasconst
def _op_LOAD_CONST(f,const):
     return f.stack.push_tos() + [f.stack.push(address_of(const))]
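
For context on where these two op codes come from, the standard dis module lists the bytecode CPython generates for a function. A small sketch follows; the function greet is a made-up example, and the exact instruction sequence differs between Python versions, so no particular output is shown.

```python
import dis

def greet():
    # `len` is not a local variable, so it is looked up in the module
    # globals and then in builtins; the string literal is a constant.
    return len("hello")

# Collect the op code names in the order they appear in the bytecode.
names = [instr.opname for instr in dis.get_instructions(greet)]
print(names)
```

On the CPython versions I am aware of, the list includes LOAD_GLOBAL for len. A translator like the one described here would emit one machine code fragment per entry in that list.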


Anyway, it was certainly interesting working on this. I'll probably at 
least implement looping and arithmetic so I can have something 
meaningful to benchmark.


