Possible optimization for LOAD_FAST ?

Consider the following code: def foobar(x): for i in range(5): x[i] = i The bytecode in python 2.7 is the following: 2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i) 3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack? There is no way that the reference of i might change in between the two lines. Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

2010/12/29 "Martin v. Löwis" <martin@v.loewis.de>
OK, but is it mandatory? For example, in the above code, I can unroll the loop because I found that range is the usual built-in, 5 is a low-enough constant, and the body is made by a simple statement. Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) cannot even find something to change about it. And so on with other optimization examples that can be possible. Are they "legal" with Python? I think that we need to make it clear what happens in such cases. My idea is that it should be made implementation-specific. What happens with local variables and the generated code must depend on the specific compiler & virtual machine, in order to have a greater flexibility. IMHO the most important thing should be that, under normal conditions, the executed code have the expected behavior. Cesare

>> Another example. I can totally remove the variable i, just using the >> stack, so a debugger (or, in general, having the tracing enabled) >> cannot even find something to change about it. Ethan> -1 Ethan> Debugging is challenging enough as it is -- why would you want to Ethan> make it even more difficult? <snarky> I don't know. Maybe he wants his program to run faster. </snarky> If you use print statements for the bulk of your debugging (many people do), unrolling loops doesn't affect your debugging ability. Skip

On 12/31/2010 12:51 PM, Cesare Di Mauro wrote: centered around, "but that line isn't executed, because it's been optimized away." It's common in sophisticated compilers (as in, any C compiler) to be able to choose whether you want optimizations for speed, or disabling optimizations for debugging and reasoning about the code. Python would benefit from the same choice. --Ned.

2011/1/1 Ned Batchelder <ned@nedbatchelder.com>
Command line parameters and/or environment variables are suitable for this, but they aren't immediate and, also, have global effect. I wish an explicit ("Explicit is better than implicit") and a finer control over optimizations, with a per-module usage: from __compiler__ import disable_peepholer, strict_syntax, static_builtins, globals_as_fasts Cesare

Le mercredi 29 décembre 2010 à 14:20 +0100, "Martin v. Löwis" a écrit :
That's why Python has the following option: -O : optimize generated bytecode slightly; also PYTHONOPTIMIZE=x I regulary recompile programs with gcc -O0 -g to debug them. It is very difficult to debug (with gdb) a program compiled with gcc -O2: many variables are stored in registers, and gdb doesn't support that correctly. Victor

On 1/2/2011 8:17 AM, Victor Stinner wrote:
Victor, you seem to be equating the gcc -O flag with the Python -O flag. They are described similarly, but can't be used the same way. In particular, there is no Python equivalent to gcc's -O0: there is no way to disable the Python peephole optimizer. --Ned.

2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>
Yes, you can, but you need: - a better AST evaluator (to mark symbols/variables with proper attributes); - a better optimizer (usually located on compile.c) which has a "global vision" (not limited to single instructions and/or single expressions). It's not that simple, and the results aren't guaranteed to be good. Also, consider that Python, as a dynamic-and-not-statically-compiled language need to find a good trade-off between compilation time and execution. Just to be clear, a C program is usually compiled once, then executed, so you can spend even *hours* to better optimize the final binary code. With a dynamic language, usually the code is compiled and the executed as needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem). Python stays in a "gray area", because modules are usually compiled once (when they are first used), and executed many times, but it isn't the only case. You cannot assume that optimization techniques used on other (static) languages can be used/ported in Python. Cesare

Cesare Di Mauro <cesare.di.mauro <at> gmail.com> writes:
always point to the same object....
vision" (not limited to single instructions and/or single expressions).
It's not that simple, and the results aren't guaranteed to be good.
Also, consider that Python, as a dynamic-and-not-statically-compiled language
need to find a good trade-off between compilation time and execution.
Just to be clear, a C program is usually compiled once, then executed, so you
can spend even *hours* to better optimize the final binary code.
With a dynamic language, usually the code is compiled and the executed as
needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem).
Python stays in a "gray area", because modules are usually compiled once (when
they are first used), and executed many times, but it isn't the only case.
You cannot assume that optimization techniques used on other (static)
languages can be used/ported in Python.
Cesare
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. Alex

On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.
True. My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization. But barring someone interested in implementing something based on this rule, the proposal has languished for many years. FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions). -- --Guido van Rossum (python.org/~guido)

On 1/2/2011 10:18 PM, Guido van Rossum wrote:
I believe this amounts to saying 1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today. 2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals. It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name. For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass. So it could not work interactively. Should batch mode main modules work the same as when imported? Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overriden, also get resolved when the function is compiled. -- Terry Jan Reedy

On Mon, Jan 3, 2011 at 3:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
This could potentially be handled by having the "exec" mode in compile() assume it can see all the global assignments (and hence assume builtin names refer to the builtins), while "single" would assume this was not the case (and hence skip the optimisation). It may also need an additional parameter to tell the compiler which names are known to be visible in the current locals and globals (e.g. to allow exec() to do the right thing) This kind of issue is the reason Guido pointed out the idea really needs someone else to pick it up and run with it - not just to figure out the implementation details, but also to ferret out the full implications for the language semantics and backwards compatibility. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jan 2, 2011 at 9:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
In fact it is the specification today.
Actually I would leave the lookup mechanism for names that don't get special treatment the same -- the only difference would be for builtins in contexts where the compiler can generate better code (typically involving a new opcode) based on all the conditions being met.
For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass.
We actually already do such a pass.
So it could not work interactively.
That's fine. We could also disable it automatically in when eval() or exec() is the source of the code.
Should batch mode main modules work the same as when imported?
Yes.
Interactive mode would just work as it does today. I would also make a rule saying that 'open' is not treated this way. It is the only one where I can think of legitimate reasons for changing the semantics dynamically in a way that is not detectable by the compiler, assuming it only sees the source code for one module at a time. Some things that could be optimized this way: len(x), isinstance(x, (int, float)), range(10), issubclass(x, str), bool(x), int(x), hash(x), etc... in general, the less the function does the better a target for this optimization it is. One more thing: to avoid heisenbugs, I propose that, for any particular builtin, if this optimization is used anywhere in a module, it is should be used everywhere in that module (except in scopes where the name has a different meaning). This means that we can tell users about it and they can observe it without too much of a worry that a slight change to their program might disable it. (I've seen this with optimizations in gcc, and it makes performance work tricky.) Still, it's all academic until someone implements some of the optimizations. (There rest of the work is all in the docs and in the users' minds.) -- --Guido van Rossum (python.org/~guido)

On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote:
Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful. If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.

On Mon, Jan 3, 2011 at 6:12 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful.
I am explicitly excluding open from this optimization, for that very reason.
If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.
In general I am against duplicating bytecode -- it can blow up too much. (It is an entirely appropriate technique for JIT compilers -- but my point here is that bytecode is different.) Recompiling a module is not a trivial change -- for example, either code objects would have to become mutable, or we'd have to track down all the code objects and replace them. Neither sounds attractive to me. -- --Guido van Rossum (python.org/~guido)

On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:
Is there a PEP for this?
I've been attempting another way in: http://bugs.python.org/issue10399 using a new "JUMP_IF_SPECIALIZABLE" opcode. This compares what a value is against a compile-time prediction, branching to an optimized implementation if the guess was correct. I use this to implement function-call inlining within the generated bytecode. Caveat-of-doom: That code's very much a work-in-progress at this stage, though: sometimes it doesn't segfault :) and the way that I track the predicted values is taking some other liberties with semantics (see that URL and the dmalcolm-ast-optimization-branch in SVN). (There's probably at least 2 PEPs in the above idea, though have yet to write my first PEP) Hope this is helpful Dave

On Mon, Jan 3, 2011 at 9:52 AM, David Malcolm <dmalcolm@redhat.com> wrote:
Not that I know of, otherwise I'd have mentioned it. :-) It would be useful if someone wrote it up, since the idea comes back in one form or another regularly.
Yeah, that's what everybody proposes to keep the language semantics unchanged. But I claim that an easier solution is to say to hell with those semantics, let's change them to make the implementation simpler. That's from the Zen of Python: "If the implementation is easy to explain, it may be a good idea." I guess few people can seriously propose to change Python's semantics, that's why *I* am proposing it. :-) Note that the semantics of locals (e.g. UnboundLocalError) were also changed specifically to allow a significant optimization -- again by me.
If you want to write up a PEP for the semantics change I am proposing, everything you need is in this thread. -- --Guido van Rossum (python.org/~guido)

On 04/01/2011 01:02, Guido van Rossum wrote:
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this? -- --Guido van Rossum (python.org/~guido)

On Wed, Jan 5, 2011 at 1:58 AM, Guido van Rossum <guido@python.org> wrote:
I've seen it done to write tests for simple CLI behaviour by mocking input() and print() (replacing sys.stdin and sys.stdout instead is far more common, but replacing the functions works too). If compile() accepted a blacklist of builtins that it wasn't allowed to optimise, then that should deal with the core of the problem as far as testing goes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 4, 2011 at 10:20 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ugh, I can't be the only one who finds these special cases to be a little nasty? Special cases aren't special enough to break the rules. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me

On Tue, Jan 4, 2011 at 8:21 AM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
+1, I don't think nailing down a few builtins is that helpful for optimizing Python. Anyone attempting to seriously optimize Python is going to need to use more general techniques that apply to non-builtins as well. In unladen swallow (I'm sure something similar is done in PyPy) we have some infrastructure for watching dictionaries for changes, and in particular we tend to watch the builtins and module dictionaries. Reid

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:
Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit. Outside of testing, I do agree that most of the builtins could be pretty safely optimized (even open()). There needs to be a way to stop all optimizations for testing purposes. Perhaps a sys variable, plus command line option and/or environment variable? -Barry

On 04/01/2011 16:54, Barry Warsaw wrote:
Although testing in an environment deliberately different from production is a recipe for hard to diagnose bugs. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

Doesnt this all boil down to being able to monitor PyDict for changes to it's key-space? The keys are immutable anyway so the instances of PyDict could manage a opaque value (in fact, a counter) that changes every time a new value is written to any key. Once we get a reference out of the dict, we can can do very fast lookups by passing the key, the reference we know from the last lookup and our last state. The lookup returns a new reference and the new state. If the dict has not changed, the state doesnt change and the reference is simply taken from the passed value passed to the lookup. That way the code remains the same no matter if the dict has changed or not. 2011/1/4 Michael Foord <fuzzyman@voidspace.org.uk>:

Barry Warsaw <barry@python.org> wrote:
Yuck from me as well. I would guess that attribute lookups would be just as significant as global variable lookups (depending on coding style, of course). In contrast, the local variable semantic change provided a big speed increase for a minor language complexity cost. Neil

Guido van Rossum wrote:
I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved. I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one. -- Steven

On Tue, Jan 4, 2011 at 1:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:
But why couldn't you edit the source code?
As long as you have a definition or import of sum at the top of (or really anywhere in) the module, that will still work. It's only if you were to do things like import builtins builtins.len = ... (whether inside your package or elsewhere) that things would stop working with the proposed optimization. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len. Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise.
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely? Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking. The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing: mymodule.len = mylen But I haven't done much work with mocks, so I'm just guessing. -- Steven

On Tue, Jan 4, 2011 at 3:13 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The interactive interpreter will always be excluded from this kind of optimization (well, in my proposal anyway).
There are two versions of the "assign to global named 'len'" idiom. One is benign: if the assignment occurs in the source code of the module (i.e., where the compiler can see it when it is compiling the module) the optimization will be disabled in that module. The second is not: if a module-global named 'len' is set in a module from outside that module, the compiler cannot see that assignment when it considers the optimization, and it may generate optimized code that will not take the global by that name into account (it would use an opcode that computes the length of an object directly). The third way to mess with the optimization is messing with builtins.len. This one is also outside what the compiler can see. [...]
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely?
Monkey-patching intentionally has a bad name -- there's always a code smell. (And it looks like one, too. :-)
Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking.
I think the folks referring to monkey-patching builtins in unittests were referring to this. But they might also be referring to the second option above.
Same here. But it would fail (i.e. not be picked up by the optimization) either way. -- --Guido van Rossum (python.org/~guido)

On 04/01/2011 23:29, Guido van Rossum wrote:
I prefer monkey patching builtins (where I do such a thing) in the namespace where they are used. I know it is *common* to monkeypatch __builtins__.open (python 2) however. I don't recall monkey patching anything other than open and raw_input myself but I *bet* there are people doing it for reasons they see as legitimate in tests. :-) Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On 04/01/2011 23:29, Guido van Rossum wrote:
The only examples I could find from a quick search (using the patch decorator from my mock module which is reasonably well used) patch __builtins__.open and raw_input. https://www.google.com/codesearch?hl=en&lr=&q=%22patch%28%27__builtin__.%22+lang%3Apython+case%3Ayes :-) Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On Tue, Jan 4, 2011 at 3:36 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
So, that significantly weakens the argument that this optimization will break unit tests, since I am happy to promise never to optimize these builtins, and any other builtins intended for I/O. Surely it will break *somebody's* code. That hasn't stopped us with other changes. The crux is whether it breaks significant amounts of code or code that would be really hard to write in another way. -- --Guido van Rossum (python.org/~guido)

On 1/4/2011 6:39 PM, Guido van Rossum wrote:
This is one comprehensible rule rather than a list of exceptions, so easier to remember. It has two rationales: such often need to be over-riden for testing, possibly in hidden ways; such are inherently 'slow' so optimizing dict lookup away hardly makes sense. -- Terry Jan Reedy

How about not changing semantics and still making this optimization possible? PyPy already has CALL_LIKELY_BUILTIN which checks whether builtins has been altered (by keeping a flag on the module dictionary) and if not, loads a specific builtin on top of value stack. From my current experience, I would make a bet that someone is altering pretty much every builtin for some dark reasons. One that comes to mind is to test something using external library which is not playing along too well. That however only lets one avoid dictionary lookups, it doesn't give potential for other optimizations (which in my opinion are limited until we hit something dynamic like an instance, but let's ignore it). How about creating two copies of bytecode (that's not arbitrary number, just 2) and a way to go from more optimized to less optimized in case *any* of promises is invalidated? That gives an ability to save semantics, while allowing optimizations. That said, I think CPython should stay as a simple VM and refrain from doing things that are much easier in the presence of a JIT (and give a lot more speedups), but who am I to judge. Cheers, fijal

On Wed, Jan 5, 2011 at 5:27 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I can only repeat what I said before. That's what everybody proposes, and if you have the infrastructure, it's a fine solution. But to me, those semantics aren't sacred, and I want to at least explore an alternative. Putting a hook on two dicts (the module globals and builtins.__dict__) is a lot of work in CPython, and has the risk of slowing everything down (just a tad, but still -- AFAIK dicts currently are not hookable). Checking whether there's a global named 'len' is much simpler in the current CPython compiler. -- --Guido van Rossum (python.org/~guido)

Steven> I've been known to monkey-patch builtins in the interactive Steven> interpreter and in test code. Me too. I use a slightly beefed up dir() funcion which identifies modules within a package which haven't been imported yet. Handy for quick-n-dirty introspection. >>> import email >>> dir(email) ['Charset', 'Encoders', 'Errors', 'FeedParser', 'Generator', 'Header', 'Iterators', 'LazyImporter', 'MIMEAudio', 'MIMEBase', 'MIMEImage', 'MIMEMessage', 'MIMEMultipart', 'MIMENonMultipart', 'MIMEText', 'Message', 'Parser', 'Utils', '[_parseaddr]', '[base64mime]', '[charset]', '[encoders]', '[errors]', '[feedparser]', '[generator]', '[header]', '[iterators]', '[message]', '[parser]', '[quoprimime]', '[test/]', '[utils]', '_LOWERNAMES', '_MIMENAMES', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', '_name', 'base64MIME', 'email', 'importer', 'message_from_file', 'message_from_string', 'mime', 'quopriMIME', 'sys'] Those names with [...] bracketing the names are submodules or subpackages of the email package which haven't been imported yet. Skip

On Tue, Jan 4, 2011 at 8:49 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
print() and input() come to mind. However, so long as appropriate tools are provided to recompile a module with this optimisation disabled for certain builtins (with some builtins such as open(), print() and input() blacklisted by default) then that issue should be manageable. I've extracted the full list of 68 builtin functions from the table in the docs below. I've placed asterisks next to the ones I think we would *want* to be eligible for optimisation. Aside from the 3 mentioned above, we could fairly easily omit ones which are used primarily at the interactive prompt (i.e. dir(), help()), as well as __import__() (since that lookup is handled specially anyway). Cheers, Nick. __import__() abs() * all() * any() * ascii() * bin() * bool() * bytearray() * bytes() * callable() * chr() * classmethod() * compile() * complex() * delattr() * dict() * dir() divmod() * enumerate() * eval() * exec() * filter() * float() * format() * frozenset() * getattr() * globals() * hasattr() * hash() * help() hex() * id() * input() int() * isinstance() * issubclass() * iter() * len() * list() * locals() * map() * max() * memoryview() * min() * next() * object() * oct() * open() ord() * pow() * print() property() * range() * repr() * reversed() * round() * set() * setattr() * slice() * sorted() * staticmethod() * str() * sum() * super() * tuple() * type() * vars() * zip() * -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2011/1/3 Alex Gaynor <alex.gaynor@gmail.com>
That's your opinion, but I have very different ideas. Of course we can't leave the problem only on the compiler shoulders, but I think that can be ways to threat builtins as "static" variables, and globals like local (fast) variables too, taking into account changes on the builtins' and modules dictionaries. But it doesn't make sense to invest time in these things: JITs are becoming a good alternative, and may be they will be ready soon to take the CPython place as the "mainstream" implementation. Cesare

2010/12/29 "Martin v. Löwis" <martin@v.loewis.de>
OK, but is it mandatory? For example, in the above code, I can unroll the loop because I found that range is the usual built-in, 5 is a low-enough constant, and the body is made by a simple statement. Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) cannot even find something to change about it. And so on with other optimization examples that can be possible. Are they "legal" with Python? I think that we need to make it clear what happens in such cases. My idea is that it should be made implementation-specific. What happens with local variables and the generated code must depend on the specific compiler & virtual machine, in order to have a greater flexibility. IMHO the most important thing should be that, under normal conditions, the executed code have the expected behavior. Cesare

>> Another example. I can totally remove the variable i, just using the >> stack, so a debugger (or, in general, having the tracing enabled) >> cannot even find something to change about it. Ethan> -1 Ethan> Debugging is challenging enough as it is -- why would you want to Ethan> make it even more difficult? <snarky> I don't know. Maybe he wants his program to run faster. </snarky> If you use print statements for the bulk of your debugging (many people do), unrolling loops doesn't affect your debugging ability. Skip

On 12/31/2010 12:51 PM, Cesare Di Mauro wrote: centered around, "but that line isn't executed, because it's been optimized away." It's common in sophisticated compilers (as in, any C compiler) to be able to choose whether you want optimizations for speed, or disabling optimizations for debugging and reasoning about the code. Python would benefit from the same choice. --Ned.

2011/1/1 Ned Batchelder <ned@nedbatchelder.com>
Command line parameters and/or environment variables are suitable for this, but they aren't immediate and, also, have global effect. I wish an explicit ("Explicit is better than implicit") and a finer control over optimizations, with a per-module usage: from __compiler__ import disable_peepholer, strict_syntax, static_builtins, globals_as_fasts Cesare

Le mercredi 29 décembre 2010 à 14:20 +0100, "Martin v. Löwis" a écrit :
That's why Python has the following option: -O : optimize generated bytecode slightly; also PYTHONOPTIMIZE=x I regulary recompile programs with gcc -O0 -g to debug them. It is very difficult to debug (with gdb) a program compiled with gcc -O2: many variables are stored in registers, and gdb doesn't support that correctly. Victor

On 1/2/2011 8:17 AM, Victor Stinner wrote:
Victor, you seem to be equating the gcc -O flag with the Python -O flag. They are described similarly, but can't be used the same way. In particular, there is no Python equivalent to gcc's -O0: there is no way to disable the Python peephole optimizer. --Ned.

2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>
Yes, you can, but you need: - a better AST evaluator (to mark symbols/variables with proper attributes); - a better optimizer (usually located on compile.c) which has a "global vision" (not limited to single instructions and/or single expressions). It's not that simple, and the results aren't guaranteed to be good. Also, consider that Python, as a dynamic-and-not-statically-compiled language need to find a good trade-off between compilation time and execution. Just to be clear, a C program is usually compiled once, then executed, so you can spend even *hours* to better optimize the final binary code. With a dynamic language, usually the code is compiled and the executed as needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem). Python stays in a "gray area", because modules are usually compiled once (when they are first used), and executed many times, but it isn't the only case. You cannot assume that optimization techniques used on other (static) languages can be used/ported in Python. Cesare

Cesare Di Mauro <cesare.di.mauro <at> gmail.com> writes:
always point to the same object....
vision" (not limited to single instructions and/or single expressions).
It's not that simple, and the results aren't guaranteed to be good.
Also, consider that Python, as a dynamic-and-not-statically-compiled language
need to find a good trade-off between compilation time and execution.
Just to be clear, a C program is usually compiled once, then executed, so you
can spend even *hours* to better optimize the final binary code.
With a dynamic language, usually the code is compiled and the executed as
needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem).
Python stays in a "gray area", because modules are usually compiled once (when
they are first used), and executed many times, but it isn't the only case.
You cannot assume that optimization techniques used on other (static)
languages can be used/ported in Python.
Cesare
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. Alex

On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.
True. My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization. But barring someone interested in implementing something based on this rule, the proposal has languished for many years. FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions). -- --Guido van Rossum (python.org/~guido)

On 1/2/2011 10:18 PM, Guido van Rossum wrote:
I believe this amounts to saying 1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today. 2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals. It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name. For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass. So it could not work interactively. Should batch mode main modules work the same as when imported? Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overriden, also get resolved when the function is compiled. -- Terry Jan Reedy

On Mon, Jan 3, 2011 at 3:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
This could potentially be handled by having the "exec" mode in compile() assume it can see all the global assignments (and hence assume builtin names refer to the builtins), while "single" would assume this was not the case (and hence skip the optimisation). It may also need an additional parameter to tell the compiler which names are known to be visible in the current locals and globals (e.g. to allow exec() to do the right thing) This kind of issue is the reason Guido pointed out the idea really needs someone else to pick it up and run with it - not just to figure out the implementation details, but also to ferret out the full implications for the language semantics and backwards compatibility. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Jan 2, 2011 at 9:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:
In fact it is the specification today.
Actually I would leave the lookup mechanism for names that don't get special treatment the same -- the only difference would be for builtins in contexts where the compiler can generate better code (typically involving a new opcode) based on all the conditions being met.
For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass.
We actually already do such a pass.
So it could not work interactively.
That's fine. We could also disable it automatically in when eval() or exec() is the source of the code.
Should batch mode main modules work the same as when imported?
Yes.
Interactive mode would just work as it does today. I would also make a rule saying that 'open' is not treated this way. It is the only one where I can think of legitimate reasons for changing the semantics dynamically in a way that is not detectable by the compiler, assuming it only sees the source code for one module at a time. Some things that could be optimized this way: len(x), isinstance(x, (int, float)), range(10), issubclass(x, str), bool(x), int(x), hash(x), etc... in general, the less the function does the better a target for this optimization it is. One more thing: to avoid heisenbugs, I propose that, for any particular builtin, if this optimization is used anywhere in a module, it is should be used everywhere in that module (except in scopes where the name has a different meaning). This means that we can tell users about it and they can observe it without too much of a worry that a slight change to their program might disable it. (I've seen this with optimizations in gcc, and it makes performance work tricky.) Still, it's all academic until someone implements some of the optimizations. (There rest of the work is all in the docs and in the users' minds.) -- --Guido van Rossum (python.org/~guido)

On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote:
Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful. If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.

On Mon, Jan 3, 2011 at 6:12 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful.
I am explicitly excluding open from this optimization, for that very reason.
If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.
In general I am against duplicating bytecode -- it can blow up too much. (It is an entirely appropriate technique for JIT compilers -- but my point here is that bytecode is different.) Recompiling a module is not a trivial change -- for example, either code objects would have to become mutable, or we'd have to track down all the code objects and replace them. Neither sounds attractive to me. -- --Guido van Rossum (python.org/~guido)

On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:
Is there a PEP for this?
I've been attempting another way in: http://bugs.python.org/issue10399 using a new "JUMP_IF_SPECIALIZABLE" opcode. This compares what a value is against a compile-time prediction, branching to an optimized implementation if the guess was correct. I use this to implement function-call inlining within the generated bytecode. Caveat-of-doom: That code's very much a work-in-progress at this stage, though: sometimes it doesn't segfault :) and the way that I track the predicted values is taking some other liberties with semantics (see that URL and the dmalcolm-ast-optimization-branch in SVN). (There's probably at least 2 PEPs in the above idea, though have yet to write my first PEP) Hope this is helpful Dave

On Mon, Jan 3, 2011 at 9:52 AM, David Malcolm <dmalcolm@redhat.com> wrote:
Not that I know of, otherwise I'd have mentioned it. :-) It would be useful if someone wrote it up, since the idea comes back in one form or another regularly.
Yeah, that's what everybody proposes to keep the language semantics unchanged. But I claim that an easier solution is to say to hell with those semantics, let's change them to make the implementation simpler. That's from the Zen of Python: "If the implementation is easy to explain, it may be a good idea." I guess few people can seriously propose to change Python's semantics, that's why *I* am proposing it. :-) Note that the semantics of locals (e.g. UnboundLocalError) were also changed specifically to allow a significant optimization -- again by me.
If you want to write up a PEP for the semantics change I am proposing, everything you need is in this thread. -- --Guido van Rossum (python.org/~guido)

On 04/01/2011 01:02, Guido van Rossum wrote:
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this? -- --Guido van Rossum (python.org/~guido)

On Wed, Jan 5, 2011 at 1:58 AM, Guido van Rossum <guido@python.org> wrote:
I've seen it done to write tests for simple CLI behaviour by mocking input() and print() (replacing sys.stdin and sys.stdout instead is far more common, but replacing the functions works too). If compile() accepted a blacklist of builtins that it wasn't allowed to optimise, then that should deal with the core of the problem as far as testing goes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Jan 4, 2011 at 10:20 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Ugh, I can't be the only one who finds these special cases to be a little nasty? Special cases aren't special enough to break the rules. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me

On Tue, Jan 4, 2011 at 8:21 AM, Alex Gaynor <alex.gaynor@gmail.com> wrote:
+1, I don't think nailing down a few builtins is that helpful for optimizing Python. Anyone attempting to seriously optimize Python is going to need to use more general techniques that apply to non-builtins as well. In unladen swallow (I'm sure something similar is done in PyPy) we have some infrastructure for watching dictionaries for changes, and in particular we tend to watch the builtins and module dictionaries. Reid

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:
Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit. Outside of testing, I do agree that most of the builtins could be pretty safely optimized (even open()). There needs to be a way to stop all optimizations for testing purposes. Perhaps a sys variable, plus command line option and/or environment variable? -Barry

On 04/01/2011 16:54, Barry Warsaw wrote:
Although testing in an environment deliberately different from production is a recipe for hard to diagnose bugs. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

Doesnt this all boil down to being able to monitor PyDict for changes to it's key-space? The keys are immutable anyway so the instances of PyDict could manage a opaque value (in fact, a counter) that changes every time a new value is written to any key. Once we get a reference out of the dict, we can can do very fast lookups by passing the key, the reference we know from the last lookup and our last state. The lookup returns a new reference and the new state. If the dict has not changed, the state doesnt change and the reference is simply taken from the passed value passed to the lookup. That way the code remains the same no matter if the dict has changed or not. 2011/1/4 Michael Foord <fuzzyman@voidspace.org.uk>:

Barry Warsaw <barry@python.org> wrote:
Yuck from me as well. I would guess that attribute lookups would be just as significant as global variable lookups (depending on coding style, of course). In contrast, the local variable semantic change provided a big speed increase for a minor language complexity cost. Neil

Guido van Rossum wrote:
I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved. I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one. -- Steven

On Tue, Jan 4, 2011 at 1:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:
But why couldn't you edit the source code?
As long as you have a definition or import of sum at the top of (or really anywhere in) the module, that will still work. It's only if you were to do things like import builtins builtins.len = ... (whether inside your package or elsewhere) that things would stop working with the proposed optimization. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len. Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise.
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely? Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking. The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing: mymodule.len = mylen But I haven't done much work with mocks, so I'm just guessing. -- Steven

On Tue, Jan 4, 2011 at 3:13 PM, Steven D'Aprano <steve@pearwood.info> wrote:
The interactive interpreter will always be excluded from this kind of optimization (well, in my proposal anyway).
There are two versions of the "assign to global named 'len'" idiom. One is benign: if the assignment occurs in the source code of the module (i.e., where the compiler can see it when it is compiling the module) the optimization will be disabled in that module. The second is not: if a module-global named 'len' is set in a module from outside that module, the compiler cannot see that assignment when it considers the optimization, and it may generate optimized code that will not take the global by that name into account (it would use an opcode that computes the length of an object directly). The third way to mess with the optimization is messing with builtins.len. This one is also outside what the compiler can see. [...]
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely?
Monkey-patching intentionally has a bad name -- there's always a code smell. (And it looks like one, too. :-)
Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking.
I think the folks referring to monkey-patching builtins in unittests were referring to this. But they might also be referring to the second option above.
Same here. But it would fail (i.e. not be picked up by the optimization) either way. -- --Guido van Rossum (python.org/~guido)

On 04/01/2011 23:29, Guido van Rossum wrote:
I prefer monkey patching builtins (where I do such a thing) in the namespace where they are used. I know it is *common* to monkeypatch __builtins__.open (python 2) however. I don't recall monkey patching anything other than open and raw_input myself but I *bet* there are people doing it for reasons they see as legitimate in tests. :-) Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On 04/01/2011 23:29, Guido van Rossum wrote:
The only examples I could find from a quick search (using the patch decorator from my mock module which is reasonably well used) patch __builtins__.open and raw_input. https://www.google.com/codesearch?hl=en&lr=&q=%22patch%28%27__builtin__.%22+lang%3Apython+case%3Ayes :-) Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

On Tue, Jan 4, 2011 at 3:36 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
So, that significantly weakens the argument that this optimization will break unit tests, since I am happy to promise never to optimize these builtins, and any other builtins intended for I/O. Surely it will break *somebody's* code. That hasn't stopped us with other changes. The crux is whether it breaks significant amounts of code or code that would be really hard to write in another way. -- --Guido van Rossum (python.org/~guido)

On 1/4/2011 6:39 PM, Guido van Rossum wrote:
This is one comprehensible rule rather than a list of exceptions, so easier to remember. It has two rationales: such often need to be over-riden for testing, possibly in hidden ways; such are inherently 'slow' so optimizing dict lookup away hardly makes sense. -- Terry Jan Reedy

How about not changing semantics and still making this optimization possible? PyPy already has CALL_LIKELY_BUILTIN which checks whether builtins has been altered (by keeping a flag on the module dictionary) and if not, loads a specific builtin on top of value stack. From my current experience, I would make a bet that someone is altering pretty much every builtin for some dark reasons. One that comes to mind is to test something using external library which is not playing along too well. That however only lets one avoid dictionary lookups, it doesn't give potential for other optimizations (which in my opinion are limited until we hit something dynamic like an instance, but let's ignore it). How about creating two copies of bytecode (that's not arbitrary number, just 2) and a way to go from more optimized to less optimized in case *any* of promises is invalidated? That gives an ability to save semantics, while allowing optimizations. That said, I think CPython should stay as a simple VM and refrain from doing things that are much easier in the presence of a JIT (and give a lot more speedups), but who am I to judge. Cheers, fijal

On Wed, Jan 5, 2011 at 5:27 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:
I can only repeat what I said before. That's what everybody proposes, and if you have the infrastructure, it's a fine solution. But to me, those semantics aren't sacred, and I want to at least explore an alternative. Putting a hook on two dicts (the module globals and builtins.__dict__) is a lot of work in CPython, and has the risk of slowing everything down (just a tad, but still -- AFAIK dicts currently are not hookable). Checking whether there's a global named 'len' is much simpler in the current CPython compiler. -- --Guido van Rossum (python.org/~guido)

Steven> I've been known to monkey-patch builtins in the interactive Steven> interpreter and in test code. Me too. I use a slightly beefed up dir() funcion which identifies modules within a package which haven't been imported yet. Handy for quick-n-dirty introspection. >>> import email >>> dir(email) ['Charset', 'Encoders', 'Errors', 'FeedParser', 'Generator', 'Header', 'Iterators', 'LazyImporter', 'MIMEAudio', 'MIMEBase', 'MIMEImage', 'MIMEMessage', 'MIMEMultipart', 'MIMENonMultipart', 'MIMEText', 'Message', 'Parser', 'Utils', '[_parseaddr]', '[base64mime]', '[charset]', '[encoders]', '[errors]', '[feedparser]', '[generator]', '[header]', '[iterators]', '[message]', '[parser]', '[quoprimime]', '[test/]', '[utils]', '_LOWERNAMES', '_MIMENAMES', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', '_name', 'base64MIME', 'email', 'importer', 'message_from_file', 'message_from_string', 'mime', 'quopriMIME', 'sys'] Those names with [...] bracketing the names are submodules or subpackages of the email package which haven't been imported yet. Skip

On Tue, Jan 4, 2011 at 8:49 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
print() and input() come to mind. However, so long as appropriate tools are provided to recompile a module with this optimisation disabled for certain builtins (with some builtins such as open(), print() and input() blacklisted by default) then that issue should be manageable. I've extracted the full list of 68 builtin functions from the table in the docs below. I've placed asterisks next to the ones I think we would *want* to be eligible for optimisation. Aside from the 3 mentioned above, we could fairly easily omit ones which are used primarily at the interactive prompt (i.e. dir(), help()), as well as __import__() (since that lookup is handled specially anyway). Cheers, Nick. __import__() abs() * all() * any() * ascii() * bin() * bool() * bytearray() * bytes() * callable() * chr() * classmethod() * compile() * complex() * delattr() * dict() * dir() divmod() * enumerate() * eval() * exec() * filter() * float() * format() * frozenset() * getattr() * globals() * hasattr() * hash() * help() hex() * id() * input() int() * isinstance() * issubclass() * iter() * len() * list() * locals() * map() * max() * memoryview() * min() * next() * object() * oct() * open() ord() * pow() * print() property() * range() * repr() * reversed() * round() * set() * setattr() * slice() * sorted() * staticmethod() * str() * sum() * super() * tuple() * type() * vars() * zip() * -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2011/1/3 Alex Gaynor <alex.gaynor@gmail.com>
That's your opinion, but I have very different ideas. Of course we can't leave the problem only on the compiler shoulders, but I think that can be ways to threat builtins as "static" variables, and globals like local (fast) variables too, taking into account changes on the builtins' and modules dictionaries. But it doesn't make sense to invest time in these things: JITs are becoming a good alternative, and may be they will be ready soon to take the CPython place as the "mainstream" implementation. Cesare
participants (22)
-
"Martin v. Löwis"
-
Alex Gaynor
-
Barry Warsaw
-
Benjamin Peterson
-
Cesare Di Mauro
-
Daniel Stutzbach
-
David Malcolm
-
Ethan Furman
-
Georg Brandl
-
Glyph Lefkowitz
-
Guido van Rossum
-
Lukas Lueg
-
Maciej Fijalkowski
-
Michael Foord
-
Ned Batchelder
-
Neil Schemenauer
-
Nick Coghlan
-
Reid Kleckner
-
skip@pobox.com
-
Steven D'Aprano
-
Terry Reedy
-
Victor Stinner