[Python-ideas] Optimizing builtins

Cesare Di Mauro cesare.di.mauro at gmail.com
Sat Jan 1 10:32:30 CET 2011


2010/12/31 Guido van Rossum <guido at python.org>

> [Changed subject *and* list]
>
> > 2010/12/31 Maciej Fijalkowski <fijall at gmail.com>
> >> How do you know that range is a builtin you're thinking
> >> about and not some other object?
>
> On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro
> <cesare.di.mauro at gmail.com> wrote:
> > By a special opcode which could do this work. ]:-)
>
> That can't be the answer, because then the question would become "how
> does the compiler know it can use the special opcode". This particular
> issue (generating special opcodes for certain builtins) has actually
> been discussed many times before. Alas, given Python's extremely
> dynamic promises it is very hard to do it in a way that is
> *guaranteed* not to change the semantics. For example, I could have
> replaced builtins['range'] with something else; or I could have
> inserted a variable named 'range' into the module's __dict__. (Note
> that I am not talking about just creating a global variable named
> 'range' in the module; those the compiler could recognize. I am
> talking about interceptions that a compiler cannot see, assuming it
> compiles each module independently, i.e. without whole-program
> optimizations.)
>

Yes, I know it, but the special opcode which I was talking about has a very
different usage.

The primary goal was to speed-up fors, generating specialized code when the
proper range builtin is found at runtime, and it's convenient to have such
optimized code.

As you stated, the compiler don't know if range is a builtin until runtime
(at the precise moment of for execution), so it'll generate two different
code paths. The function's bytecode will look like that:

0 SETUP_LOOP 62
2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
# Usual, slow, code starts here
40 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack
44 LOAD_FAST_MORE_TIMES x, 5 # Loads x 5 times on the stack
46 LOAD_CONSTS (4, 3, 2, 1, 0) # Loads the tuple on the stack
48 STACK_ZIP 3, 5 # "zips" 3 sequences of 5 elements each on the stack
52 STORE_SUBSCR
54 STORE_SUBSCR
56 STORE_SUBSCR
58 STORE_SUBSCR
60 STORE_SUBSCR
62 POP_BLOCK
64 RETURN_CONST 'None'

It's just an example; cde can be different based on the compiler
optimizations and opcodes available in the virtual machine.

The most important thing is that the semantic will be preserved (I never
intended to drop it! ;)


> Now, *in practice* such manipulations are rare (with the possible
> exception of people replacing open() with something providing hooks
> for e.g. a virtual filesystem) and there is probably some benefit to
> be had. (I expect that the biggest benefit might well be from
> replacing len() with an opcode.) I have in the past proposed to change
> the official semantics of the language subtly to allow such
> optimizations (i.e. recognizing builtins and replacing them with
> dedicated opcodes). There should also be a simple way to disable them,
> e.g. by setting "len = len" at the top of a module, one would be
> signalling that len() is not to be replaced by an opcode. But it
> remains messy and nobody has really gotten very far with implementing
> this. It is certainly not "low-hanging fruit" to do it properly.
>
> I should also refer people interested in this subject to at least
> three PEPs that were written about this topic: PEP 266, PEP 267 and
> PEP 280. All three have been deferred, since nobody was bold enough to
> implement at least one of them well enough to be able to tell if it
>  was even worth the trouble.


I read them, and they are interesting, but my case is quite different.


> I haven't read either of those in a long
> time, and they may well be outdated by current JIT technology. I just
> want to warn folks that it's not such a simple matter to replace "for
> i in range(....):" with a special opcode.
>

May be trying to optimize the current non-JITed Python implementation is a
death binary. JITs are evolving so much that all the things we have
discussed here are already took into account.

(FWIW, optimizing "x[i] = i" would be much simpler -- I don't really
> care about the argument that a debugger might interfere. But again,
> apart from the simplest cases, it requires a sophisticated parser to
> determine that it is really safe to do so.)
>
> --
> --Guido van Rossum (python.org/~guido)
>

It depends strictly by the goals we want to reach. A more advanced parser
with a simple type-inference system can be implemented without requiring a
complete parser rebuild, and can give good (albeit not optimal) results.
At least lists, dictionaries, tuples, and sets operations, which are very
common on Python, can benefit; something for ints, doubles and complexes can
be done, also.

But looking at the JITs it can be just lost time...

Cesare
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20110101/1108c869/attachment.html>


More information about the Python-ideas mailing list