Re: [Python-ideas] Optimizing builtins
On Sat, Jan 1, 2011 at 5:11 AM, Maciej Fijalkowski wrote:
On Sat, Jan 1, 2011 at 11:32 AM, Cesare Di Mauro wrote:
Yes, I know it, but the special opcode I was talking about has a very different usage. The primary goal was to speed up for loops, generating specialized code when the proper range builtin is found at runtime, where it's convenient to have such optimized code. As you stated, the compiler doesn't know whether range is a builtin until runtime (at the precise moment the for executes), so it will generate two different code paths. The function's bytecode would look like this:

     0 SETUP_LOOP 62
     2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
       # Usual, slow, code starts here
    40 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    44 LOAD_FAST_MORE_TIMES x, 5     # Loads x 5 times on the stack
    46 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
    48 STACK_ZIP 3, 5                # "zips" 3 sequences of 5 elements each on the stack
    52 STORE_SUBSCR
    54 STORE_SUBSCR
    56 STORE_SUBSCR
    58 STORE_SUBSCR
    60 STORE_SUBSCR
    62 POP_BLOCK
    64 RETURN_CONST 'None'

It's just an example; the code can differ based on the compiler optimizations and the opcodes available in the virtual machine. The most important thing is that the semantics will be preserved (I never intended to drop them! ;)
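[For context: the loop being specialized above would presumably be something like the following. The function body and names here are a reconstruction, not from the original post. It also shows why the runtime guard is necessary: rebinding range must still change the function's behavior, so the specialized path can only be taken after checking that range is still the builtin.]

```python
def f(x):
    # Presumed source for the bytecode above: x[i] = i for i = 0..4
    for i in range(5):
        x[i] = i

f([0] * 5)  # builtin range in effect: eligible for the fast path

# The compiler cannot assume `range` is the builtin at compile time:
# a later rebinding must still be honored, hence the runtime check.
calls = []
def fake_range(n):
    calls.append(n)
    return []          # empty iteration: the loop body never runs

f.__globals__['range'] = fake_range   # shadow the builtin
f([0] * 5)                            # now must take the generic path
del f.__globals__['range']            # restore the builtin
```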
The thing is, having a JIT, all of this is completely trivial (as well as a bunch of other stuff, like avoiding allocating ints at all).
Right. That's a much saner solution than trying to generate bulky bytecode as Cesare proposed. The advantage of a JIT is also that it allows doing these optimizations only in those places where it matters. In general I am not much in favor of trying to optimize Python's bytecode. I prefer the bytecode to be dead simple. This probably makes it an easy target for CS majors interested in code generation, and it probably is a great exercise trying to do something like that, but let's please not confuse that with actual speed improvements to Python -- those come from careful observation (& instrumentation) of real programs, not from looking at toy bytecode samples. (Most of the bytecode improvements that actually made a difference were done in the first 5 years of Python's existence.)
Generating two different code paths has a tendency to lead to code explosion (even exponential if you're not careful enough), which has its own set of problems (including cache locality, because the code executed is no longer a small contiguous chunk of memory). What we (PyPy) do is compile only the common path (using the JIT) and have the unlikely path fall back to the interpreter. This generally solves all of the nasty problems you can possibly encounter.
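[As a rough Python-level illustration of the guarded common-path idea, not PyPy's actual machinery: only the common path gets "compiled", behind a cheap guard, and when the guard fails execution falls back to the generic code.]

```python
import builtins

def fill(x, n):
    # Guard: the specialized path is only valid while `range` still
    # refers to the builtin. A JIT checks this cheaply in the compiled
    # code and deoptimizes back to the interpreter on failure.
    if globals().get('range', builtins.range) is builtins.range:
        # "Compiled" common path: a plain counting loop, with no
        # range object and no iterator protocol overhead.
        i = 0
        while i < n:
            x[i] = i
            i += 1
    else:
        # Unlikely path: fall back to the generic loop.
        for i in range(n):
            x[i] = i
    return x
```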
Great observation!

--
--Guido van Rossum (python.org/~guido)
participants (3)
- Cesare Di Mauro
- Guido van Rossum
- Terry Reedy