Why is it impossible to create a compiler than can compile Python to machinecode like C?
steve+comp.lang.python at pearwood.info
Fri Mar 1 05:27:28 CET 2013
On Thu, 28 Feb 2013 12:25:07 -0800, kramer65 wrote:
> I'm using Python for a while now and I love it. There is just one thing
> I cannot understand. There are compilers for languages like C and C++.
> why is it impossible to create a compiler that can compile Python code
> to machinecode?
Your assumption is incorrect. You can compile Python to machine-code, at
least sometimes. It is quite tricky, for various reasons, but it can be
done, at various levels of efficiency.
One of the oldest such projects was Psyco, which was a Just-In-Time
compiler for Python. When Psyco was running, it would detect at run time
that you were doing calculations on (say) standard ints, compile on the
fly a machine-code function to perform those calculations, and execute
it. Psyco has more or less been made obsolete by PyPy, which does the
same thing only even more so.
> My reasoning is as follows:
> When GCC compiles a program written in C++, it simply takes that code
> and decides what instructions that would mean for the computer's
> hardware. What does the CPU need to do, what does the memory need to
> remember, etc. etc. If you can create this machinecode from C++, then I
> would suspect that it should also be possible to do this (without a
> C-step in between) for programs written in Python.
In principle, yes, but in practice it's quite hard, simply because Python
does so much more at runtime than C++ (in general).
Take an expression like:
x = a + b
In C++, the compiler knows what kind of data a and b are, what kind of
data x is supposed to be. They are often low-level machine types like
int32 or similar, which the CPU can add directly (or at least, the
compiler can fake it). Even if the variables are high-level objects, the
compiler can usually make many safe assumptions about what methods will
be called, and can compile instructions something like this pseudo-code:
10 get the int64 at location 12348 # "a"
20 get the int64 at location 13872 # "b"
30 jump to the function at location 93788 # add two int64s
40 store the result at location 59332 # "x"
which is fast and efficient because most of the hard work is done at
compile time. But it's also quite restrictive, because you can't change
code on the fly, create new types or functions, etc. (Or, where you can,
then you lose some of the advantages of C++ and end up with something
like Python but with worse syntax.)
In Python, you don't know what a and b are until runtime. They could be
ints, or lists, or strings, or anything. The + operator could call a
custom __add__ method, or a __radd__ method, from some arbitrary class.
Because nearly everything is dynamic, the Python compiler cannot safely
make many assumptions about the code at compile time. So you end up with
code like this:
10 search for the name "a" and take note of it
20 search for the name "b" and take note of it
30 decide whether to call a.__add__ or b.__radd__
40 call the appropriate method
60 bind the result to the name "x"
You can get an idea of what Python actually does by disassembling the
byte code into pseudo-assembly language:
py> code = compile("x = a + b", '', 'single')
py> from dis import dis
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
7 STORE_NAME 2 (x)
10 LOAD_CONST 0 (None)
Nevertheless, PyPy can often speed up Python code significantly,
sometimes to the speed of C or even faster.
More information about the Python-list