[Python-ideas] Optimizing builtins

Guido van Rossum guido at python.org
Sat Jan 1 17:41:35 CET 2011


On Sat, Jan 1, 2011 at 5:11 AM, Maciej Fijalkowski <fijall at gmail.com> wrote:
> On Sat, Jan 1, 2011 at 11:32 AM, Cesare Di Mauro wrote:
>> Yes, I know it, but the special opcode which I was talking about has a very
>> different usage.
>> The primary goal was to speed up for loops, generating specialized code
>> when the proper range builtin is found at runtime, where it's convenient
>> to have such optimized code.
>> As you stated, the compiler doesn't know if range is a builtin until
>> runtime (at the precise moment of the for's execution), so it'll generate
>> two different code paths. The function's bytecode will look like this:
>> 0 SETUP_LOOP 62
>> 2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
>> # Usual, slow, code starts here
>> 40 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack
>> 44 LOAD_FAST_MORE_TIMES x, 5 # Loads x 5 times on the stack
>> 46 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack
>> 48 STACK_ZIP 3, 5 # "zips" 3 sequences of 5 elements each on the stack
>> 52 STORE_SUBSCR
>> 54 STORE_SUBSCR
>> 56 STORE_SUBSCR
>> 58 STORE_SUBSCR
>> 60 STORE_SUBSCR
>> 62 POP_BLOCK
>> 64 RETURN_CONST 'None'
>> It's just an example; the code can be different based on the compiler
>> optimizations and the opcodes available in the virtual machine.
>> The most important thing is that the semantics will be preserved (I never
>> intended to drop them! ;)
>
> The thing is, with a JIT all of this is completely trivial (as well as a
> bunch of other stuff, like avoiding allocating ints at all).

Right. That's a much saner solution than trying to generate bulky
bytecode as Cesare proposed. Another advantage of a JIT is that it can
apply these optimizations only in the places where they matter.
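To make the "compiler can't know until runtime" point concrete, here is a
toy sketch (my own illustration, not code from the thread): `range` inside
a function compiles to a LOAD_GLOBAL, so rebinding the name at module
level changes what the same bytecode does on the next call.

```python
def fill(x):
    # Compiles to LOAD_GLOBAL 'range': the name is resolved at run
    # time, on every call, so the compiler can't assume the builtin.
    for i in range(5):
        x[i] = i

buf = [None] * 5
fill(buf)
print(buf)  # [0, 1, 2, 3, 4]

# Shadowing the builtin at module level silently changes fill()'s
# behavior, with no recompilation of its bytecode.
def range(n):  # deliberately shadows builtins.range
    return [0, 2, 4]

buf = [None] * 5
fill(buf)
print(buf)  # [0, None, 2, None, 4]
```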

In general I am not much in favor of trying to optimize Python's
bytecode. I prefer the bytecode to be dead simple. This probably makes
it an easy target for CS majors interested in code generation, and it
probably is a great exercise trying to do something like that, but
let's please not confuse that with actual speed improvements to Python
-- those come from careful observation (& instrumentation) of real
programs, not from looking at toy bytecode samples. (Most of the
bytecode improvements that actually made a difference were done in the
first 5 years of Python's existence.)
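The "dead simple" bytecode is easy to inspect for yourself with the
standard dis module; each operation in the loop maps to one generic
opcode, with no specialized variants (exact opcode names vary by CPython
version):

```python
import dis

def f(x):
    # A loop like the one discussed above: iterate a range, store by index.
    for i in range(5):
        x[i] = i

# Prints the generic opcodes CPython emits: a LOAD_GLOBAL for 'range',
# FOR_ITER to drive the loop, STORE_SUBSCR for the assignment, etc.
dis.dis(f)
```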

> Generating two different code paths has a tendency to lead to code
> explosion (even exponential if you're not careful enough), which has
> its own set of problems (including cache locality, because the code
> executed is no longer a small, contiguous chunk of memory). What we
> (PyPy) do is compile only the common path (using the JIT) and then
> have the unlikely path fall back to the interpreter. This generally
> solves all of the nasty problems you can possibly encounter.

Great observation!

-- 
--Guido van Rossum (python.org/~guido)
