2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>
Consider the following code:

def foobar(x):
   for i in range(5):
       x[i] = i

The bytecode in python 2.7 is the following:

 2           0 SETUP_LOOP              30 (to 33)
             3 LOAD_GLOBAL              0 (range)
             6 LOAD_CONST               1 (5)
             9 CALL_FUNCTION            1
            12 GET_ITER
       >>   13 FOR_ITER                16 (to 32)
            16 STORE_FAST               1 (i)

 3          19 LOAD_FAST                1 (i)
            22 LOAD_FAST                0 (x)
            25 LOAD_FAST                1 (i)
            28 STORE_SUBSCR
            29 JUMP_ABSOLUTE           13
       >>   32 POP_BLOCK
       >>   33 LOAD_CONST               0 (None)
            36 RETURN_VALUE
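
For anyone who wants to reproduce a listing like the one above, a minimal sketch using the standard dis module follows. It assumes Python 3.4 or later for dis.get_instructions (on 2.7 only dis.dis is available), and newer interpreters emit different opcodes and offsets than the 2.7 listing shown here.

```python
import dis

def foobar(x):
    for i in range(5):
        x[i] = i

# Collect the opcode names so they can be inspected programmatically.
opnames = [ins.opname for ins in dis.get_instructions(foobar)]

# Print the human-readable listing; the exact output depends on the
# interpreter version.
dis.dis(foobar)
```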

Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load
and put the reference twice on the stack? There is no way that the
reference of i might change in between the two lines. Also, the
LOAD_FAST in line 22 to reference x could be taken out of the loop, as x
will always point to the same object...

Yes, you can, but you need:
- a better AST evaluator (to mark symbols/variables with proper attributes);
- a better optimizer (usually located in compile.c) which has a "global view" of the code (not limited to single instructions and/or single expressions).

It's not that simple, and the results aren't guaranteed to be good.
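
To illustrate why the gain isn't guaranteed, here is a toy sketch of the stack effect of the three loads before STORE_SUBSCR, compared with one possible rewrite that loads i only once. DUP_TOP and ROT_TWO are real 2.7 opcodes, but the run() helper and the tuple encoding of instructions are hypothetical, made up for this example; this is not how CPython's evaluation loop is written.

```python
def run(instructions, local_vars):
    """Execute a tiny subset of stack opcodes and return the final stack."""
    stack = []
    for op, arg in instructions:
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])
        elif op == "DUP_TOP":
            # Duplicate the reference on top of the stack.
            stack.append(stack[-1])
        elif op == "ROT_TWO":
            # Swap the two topmost stack items.
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            raise ValueError("unsupported opcode: %s" % op)
    return stack

local_vars = {"x": [None] * 5, "i": 3}

# The sequence from the listing: three loads before STORE_SUBSCR.
original = [("LOAD_FAST", "i"), ("LOAD_FAST", "x"), ("LOAD_FAST", "i")]

# Hypothetical rewrite with a single load of i: four instructions.
rewritten = [("LOAD_FAST", "i"), ("DUP_TOP", None),
             ("LOAD_FAST", "x"), ("ROT_TWO", None)]

# Both sequences leave the stack in the layout STORE_SUBSCR expects.
same_stack = run(original, local_vars) == run(rewritten, local_vars)
```

Both sequences leave the same stack, but the rewrite saves one load at the cost of two extra stack operations, so it needs four instructions instead of three. Whether that is a win can only be told by measuring.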

Also, consider that Python, as a dynamic and not statically compiled language, needs to find a good trade-off between compilation time and execution time.

Just to be clear, a C program is usually compiled once and then executed many times, so you can afford to spend even *hours* optimizing the final binary code.

With a dynamic language, the code is usually compiled and then executed as needed, in "realtime". So it is neither practical nor desirable to wait a long time before execution begins (the "startup" problem).

Python sits in a "gray area", because modules are usually compiled once (when they are first used) and executed many times, but that isn't the only case.
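
The "compiled once, executed many times" case can be sketched with the built-in compile() and exec(). For imported modules CPython does something comparable by caching the compiled bytecode in .pyc files. The source string and variable names below are made up for illustration, and the exec() call form assumes Python 3.

```python
source = "result = sum(range(n))"

# Compilation happens once here ...
code = compile(source, "<example>", "exec")

# ... and the resulting code object can then be executed many times,
# each time in a fresh namespace.
results = []
for n in (10, 100, 1000):
    namespace = {"n": n}
    exec(code, namespace)
    results.append(namespace["result"])
```

Code passed on the command line or typed at the interactive prompt, by contrast, is compiled every time it runs, which is where compilation time matters most.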

You cannot assume that optimization techniques used for other (static) languages can simply be ported to Python.