Re: [Python-ideas] Optimizing builtins
On Sat, Jan 1, 2011 at 5:11 AM, Maciej Fijalkowski wrote:
On Sat, Jan 1, 2011 at 11:32 AM, Cesare Di Mauro wrote:
Yes, I know it, but the special opcode which I was talking about has a very different usage. The primary goal was to speed up for loops, generating specialized code when the proper range builtin is found at runtime, and it's convenient to have such optimized code. As you stated, the compiler doesn't know if range is a builtin until runtime (at the precise moment of the for execution), so it'll generate two different code paths. The function's bytecode will look like this:

     0 SETUP_LOOP 62
     2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
       # Usual, slow, code starts here
    40 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    44 LOAD_FAST_MORE_TIMES x, 5     # Loads x 5 times on the stack
    46 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    48 STACK_ZIP 3, 5                # "zips" 3 sequences of 5 elements each on the stack
    52 STORE_SUBSCR
    54 STORE_SUBSCR
    56 STORE_SUBSCR
    58 STORE_SUBSCR
    60 STORE_SUBSCR
    62 POP_BLOCK
    64 RETURN_CONST 'None'

It's just an example; the code can be different based on the compiler optimizations and the opcodes available in the virtual machine. The most important thing is that the semantics will be preserved (I never intended to drop them! ;)
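[The dual-path idea above can be sketched in plain Python. This is only an illustration of the proposed semantics, not actual bytecode: the `fill` function below emulates the hypothetical JUMP_IF_BUILTIN_OR_LOAD_GLOBAL guard by checking at call time whether the name `range` still resolves to the builtin, then dispatching to an unrolled fast path or the generic loop.]

```python
import builtins

def fill(x):
    """Set x[i] = i for i in 0..4, choosing a path via a runtime guard."""
    # Guard: emulate the hypothetical JUMP_IF_BUILTIN_OR_LOAD_GLOBAL --
    # take the specialized path only if 'range' still means the builtin.
    rng = globals().get('range', builtins.range)
    if rng is builtins.range:
        # Fast path: loop unrolled, no range object ever created.
        x[0] = 0; x[1] = 1; x[2] = 2; x[3] = 3; x[4] = 4
    else:
        # Slow path: full semantics preserved if 'range' was rebound.
        for i in rng(5):
            x[i] = i
    return x
```

Either path produces the same result, which is the point: the specialization is an implementation detail that never changes observable behavior.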
The thing is, having a JIT, this all is completely trivial (as well as bunch of other stuff like avoiding allocating ints at all).
Right. That's a much saner solution than trying to generate bulky bytecode as Cesare proposed. The advantage of a JIT is also that it allows doing these optimizations only in those places where it matters. In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple. This probably makes it an easy target for CS majors interested in code generation, and it probably is a great exercise trying to do something like that, but let's please not confuse that with actual speed improvements to Python -- those come from careful observation (& instrumentation) of real programs, not from looking at toy bytecode samples. (Most of the bytecode improvements that actually made a difference were done in the first 5 years of Python's existence.)
Generating two different code paths has a tendency to lead to code explosion (even exponential if you're not careful enough), which has its own set of problems (including cache locality, because the executed code is no longer a small contiguous chunk of memory). What we (PyPy) do is compile only the common path (using the JIT) and have the unlikely path fall back to the interpreter. This generally solves all of the nasty problems you can possibly encounter.
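[The guard-plus-fallback scheme described above can be sketched in a few lines of Python. This is a toy model, not PyPy's actual machinery: `traced_sum` stands in for JIT-compiled code specialized for ints, and on guard failure it hands the remaining work back to the generic "interpreter" function.]

```python
def interp_sum(values):
    # Generic interpreter path: handles any addable values.
    total = 0
    for v in values:
        total = total + v
    return total

def traced_sum(values):
    # "Compiled" common path: assumes every element is an int.
    total = 0
    for i, v in enumerate(values):
        if type(v) is not int:  # guard inserted by the tracer
            # Guard failed: leave the fast path, finish in the interpreter.
            return total + interp_sum(values[i:])
        total += v
    return total
```

Only the common (all-int) path gets compiled; the rare case costs a fallback, but there is no combinatorial blow-up of generated code.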
Great observation! -- --Guido van Rossum (python.org/~guido)
On 1/1/2011 11:41 AM, Guido van Rossum wrote:
In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple.
I think people constantly underestimate the virtue of Python's and CPython's simplicity. Projects that depend on a couple of genius ubergeeks die when the ubergeeks leave. The executable-pseudocode simplicity of the language makes it a favorite for scientific programming, spilling over into financial programming. The simplicity of the code allows competent students (and non-CS-major adults) to become developers. -- Terry Jan Reedy
On Sat, Jan 1, 2011 at 12:30 PM, Terry Reedy wrote:
On 1/1/2011 11:41 AM, Guido van Rossum wrote:
In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple.
I think people constantly underestimate the virtue of Python's and CPython's simplicity. Projects that depend on a couple of genius ubergeeks die when the ubergeeks leave. The executable-pseudocode simplicity of the language makes it a favorite for scientific programming, spilling over into financial programming. The simplicity of the code allows competent students (and non-CS-major adults) to become developers.
And, of course, the (relative) simplicity of the implementation will always draw CS students looking for compiler optimization projects (just as the simplicity of the language draws CS students looking to write a complete compiler). But it's one thing to get a degree out of some clever optimization; it's another thing to actually make it stick in the context of CPython, with the concerns you mention (and others I only have in my guts :-). -- --Guido van Rossum (python.org/~guido)
On 1/1/2011 3:37 PM, Guido van Rossum wrote:
And, of course, the (relative) simplicity of the implementation will always draw CS students looking for compiler optimization projects
And, ironically, slightly reduce the simplicity that attracted them. No one thinks that their straw will break the camel's back (or cause him to drop to his knees), and they are usually right. But when the camel sags, all added straws are equally responsible.
(just as the simplicity of the language draws CS students looking to write a complete compiler). But it's one thing to get a degree out of some clever optimization; it's another thing to actually make it stick in the context of CPython, with the concerns you mention (and others I only have in my guts :-).
For one thing, you have your eye on the camel ;-). And your current job keeps you grounded in the needs of real code. (In a current python-list discussion, someone demonstrated with timeit that in late 2.x, each iteration of 'while 1: pass' takes about a microsecond less than one of 'while True: pass'. The reason for that, and for the disappearance of the difference in 3.x, is mildly interesting, but the practical import for any real code that does anything inside the loop is essentially 0.) -- Terry Jan Reedy
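[For the curious, the reason behind that timing difference can be seen with the dis module. In 2.x, True was an ordinary name looked up every iteration, while 1 was a constant the peephole optimizer could fold away; in 3.x, True is a keyword constant, so both spellings compile to the same tight loop. A quick check, assuming Python 3:]

```python
import dis

def loop_true():
    while True:
        pass

def loop_one():
    while 1:
        pass

# In Python 3, neither loop needs a per-iteration name lookup:
# the condition is a constant the compiler folds into a plain jump.
ops_true = [ins.opname for ins in dis.get_instructions(loop_true)]
ops_one = [ins.opname for ins in dis.get_instructions(loop_one)]
print('LOAD_GLOBAL' in ops_true, 'LOAD_GLOBAL' in ops_one)
```

(The functions are only disassembled, never called, so the infinite loops are harmless here.)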
2011/1/1 Guido van Rossum
Right. That's a much saner solution than trying to generate bulky bytecode as Cesare proposed. The advantage of a JIT is also that it allows doing these optimizations only in those places where it matters.
In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple.
If Python's direction is to embrace some JIT technology, I fully agree with you: it is best to keep the VM & compiler simple. Anyway, as I already said before, mine were just examples of possible things that can happen with optimizations.
This probably makes it an easy target for CS majors interested in code generation, and it probably is a great exercise trying to do something like that, but let's please not confuse that with actual speed improvements to Python -- those come from careful observation (& instrumentation) of real programs, not from looking at toy bytecode samples. (Most of the bytecode improvements that actually made a difference were done in the first 5 years of Python's existence.)
--Guido van Rossum (python.org/~guido)
But research never stops. SETUP_WITH is just a recent example. Also, sometimes completely different ideas can bring some innovation. ;) Cesare
participants (3)
- Cesare Di Mauro
- Guido van Rossum
- Terry Reedy