
[Changed subject *and* list]
2010/12/31 Maciej Fijalkowski <fijall@gmail.com>
How do you know that range is a builtin you're thinking about and not some other object?
On Fri, Dec 31, 2010 at 7:02 AM, Cesare Di Mauro <cesare.di.mauro@gmail.com> wrote:
By a special opcode which could do this work. ]:-)
That can't be the answer, because then the question would become "how does the compiler know it can use the special opcode". This particular issue (generating special opcodes for certain builtins) has actually been discussed many times before. Alas, given Python's extremely dynamic promises it is very hard to do it in a way that is *guaranteed* not to change the semantics. For example, I could have replaced builtins['range'] with something else; or I could have inserted a variable named 'range' into the module's __dict__. (Note that I am not talking about just creating a global variable named 'range' in the module; those the compiler could recognize. I am talking about interceptions that a compiler cannot see, assuming it compiles each module independently, i.e. without whole-program optimizations.)

Now, *in practice* such manipulations are rare (with the possible exception of people replacing open() with something providing hooks for e.g. a virtual filesystem) and there is probably some benefit to be had. (I expect that the biggest benefit might well be from replacing len() with an opcode.) I have in the past proposed to change the official semantics of the language subtly to allow such optimizations (i.e. recognizing builtins and replacing them with dedicated opcodes). There should also be a simple way to disable them; e.g. by setting "len = len" at the top of a module, one would be signalling that len() is not to be replaced by an opcode. But it remains messy and nobody has really gotten very far with implementing this. It is certainly not "low-hanging fruit" to do it properly.

I should also refer people interested in this subject to at least three PEPs that were written about this topic: PEP 266, PEP 267 and PEP 280. All three have been deferred, since nobody was bold enough to implement at least one of them well enough to be able to tell if it was even worth the trouble. I haven't read any of those in a long time, and they may well be outdated by current JIT technology. I just want to warn folks that it's not such a simple matter to replace "for i in range(....):" with a special opcode.

(FWIW, optimizing "x[i] = i" would be much simpler -- I don't really care about the argument that a debugger might interfere. But again, apart from the simplest cases, it requires a sophisticated parser to determine that it is really safe to do so.)

--
--Guido van Rossum (python.org/~guido)
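A minimal sketch of the open()-replacement case mentioned above, assuming a made-up vfs_open hook and a fictitious "vfs://" prefix -- this is the kind of cross-module interception the compiler cannot see:

import builtins
import io

_real_open = builtins.open

def vfs_open(path, *args, **kwargs):
    # Serve some paths from an in-memory store instead of the disk.
    if isinstance(path, str) and path.startswith("vfs://"):
        return io.StringIO("virtual contents of " + path)
    return _real_open(path, *args, **kwargs)

builtins.open = vfs_open  # every module's bare open() now resolves to the hook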

On 31 December 2010 18:49, Guido van Rossum <guido@python.org> wrote:
Now, *in practice* such manipulations are rare
Actually range is the one I've seen *most* overridden, not in order to replace functionality but because range is such a useful (or relevant) variable name in all sorts of circumstances...

Michael
--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html

On Fri, Dec 31, 2010 at 11:59 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
Actually range is the one I've seen *most* overridden, not in order to replace functionality but because range is such a useful (or relevant) variable name in all sorts of circumstances...
No, you're misunderstanding. I was not referring to overriding a name using Python's regular syntax for defining names. If you set a (global or local) variable named 'range', the compiler is perfectly capable of noticing. E.g.:

range = 42

def foo():
    for i in range(10):
        print(i)

While this will of course fail with a TypeError if you try to execute it, a (hypothetical) optimizing compiler would have no trouble noticing that the 'range' in the for-loop must refer to the global variable of that name, not to the builtin of the same name.

I was referring to an innocent module containing a use of the builtin range function, e.g.

# a.py
def f():
    for i in range(10):
        print(i)

which is imported by another module which manipulates a's globals, for example:

# b.py
import a
a.range = 42
a.f()

The compiler has no way to notice this when a.py is being compiled. Variants of "hiding" a mutation like this include:

a.__dict__['range'] = 42

or

import builtins
builtins.range = 42

and of course for more fun you can make it more dynamic (think obfuscated code contests).

--
--Guido van Rossum (python.org/~guido)

On Sat, Jan 1, 2011 at 7:51 AM, Guido van Rossum <guido@python.org> wrote:
and of course for more fun you can make it more dynamic (think obfuscated code contests).
Not to mention the champions of obfuscation for CPython: doing the same things from an extension module, or by using ctypes to invoke the C API (although such mechanisms are obviously outside the language definition itself, they're still technically legal for non-portable CPython code).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
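A hedged sketch of the ctypes variant Nick mentions -- in effect just "builtins.range = 42" routed through the C API, shown only to illustrate that such mutations sit entirely outside what the compiler can see:

import builtins
import ctypes

_real_range = builtins.range

# Same visible effect as "builtins.range = 42", but performed through the
# C-level PyObject_SetAttrString call instead of Python-level syntax.
ctypes.pythonapi.PyObject_SetAttrString(
    ctypes.py_object(builtins), b"range", ctypes.py_object(42))

assert range == 42            # the builtin is now shadowed for every module
builtins.range = _real_range  # undo the damage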

On Fri, Dec 31, 2010 at 4:43 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Not to mention the champions of obfuscation for CPython: doing the same things from an extension module, or by using ctypes to invoke the C API (although such mechanisms are obviously outside the language definition itself, they're still technically legal for non-portable CPython code)
Hm. I wouldn't even call such things "legal" -- rather accidents of the implementation. If someone depended on such an effect, and we changed things to make that no longer work, good luck arguing that we violated a compatibility promise.

--
--Guido van Rossum (python.org/~guido)

On Sat, Jan 1, 2011 at 12:35 PM, Benjamin Peterson <benjamin@python.org> wrote:
Guido van Rossum <guido@...> writes:
The compiler has no way to notice this when a.py is being compiled.
You could still optimize it if you insert a runtime "guard" before the range usage and see if it's been overridden.
Yeah, that was Cesare's idea. I think that's a great strategy for a JIT compiler, but not appropriate for bytecode (IMO).

--
--Guido van Rossum (python.org/~guido)
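A rough Python-level rendering of the guard idea (the actual proposal would do the check in bytecode; the function and its fast path here are illustrative only):

import builtins

def f(n):
    if range is builtins.range:   # guard: the name still means the builtin
        # Fast path: the compiler may assume the builtin's semantics here
        # (e.g. emit a specialized counting loop).
        return list(range(n))
    else:
        # Slow path: honour whatever 'range' has been rebound to.
        return list(range(n))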

Benjamin Peterson, 01.01.2011 21:35:
Guido van Rossum <guido@...> writes:
The compiler has no way to notice this when a.py is being compiled.
You could still optimize it if you insert a runtime "guard" before the range usage and see if it's been overridden.
The problem here is that you wouldn't save the lookup. So you'd still pay a high price to find out that the builtin has not been overridden.

There can be substantial savings for builtins that can be optimised away or replaced by a tighter/adapted implementation. We do that a lot in Cython, where builtins are (by default) considered static unless redefined inside of the module.

An important example is generator expressions like "any(genexpr)". If the function was known to be builtin at compile time, CPython could generate much simpler bytecode for these, dropping the need for a generator and its closure. But as long as you have to check for an override at each call, you end up with duplicated code (optimised and fall-back version) and an increased entry overhead that may well kill the savings.

Stefan
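To make the any(genexpr) example concrete, this is roughly the rewrite such an optimisation performs when any() is known to be the builtin (a sketch, not actual Cython output):

# What the source says:
#     found = any(x > 0 for x in seq)
#
# What the optimiser can emit instead, avoiding the generator object
# and its closure entirely:
def _any_positive(seq):
    for x in seq:
        if x > 0:
            return True
    return False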

On 31/12/2010 21:51, Guido van Rossum wrote:
No, you're misunderstanding. I was not referring to overriding a name using Python's regular syntax for defining names. If you set a (global or local) variable named 'range', the compiler is perfectly capable of noticing. E.g.:

range = 42

def foo():
    for i in range(10):
        print(i)
Right, in the same way the compiler notices local and global variable use and compiles different bytecode for lookups. It's just that accidentally overriding range is the source of my favourite "confusing Python error messages" story and I look for any opportunity to repeat it.

A few years ago I worked for a company where most of the (very talented) developers were new to Python. They called me over to explain what "UnboundLocalError" meant and why they were getting it in what looked (to them) like perfectly valid code. The code looked something like:

def something(start, stop):
    positions = range(start, stop)
    # more code here...
    range = process(positions)

All the best,

Michael Foord
--
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing
http://www.sqlite.org/different.html
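The compiler decision behind Michael's story is easy to see with the standard dis module (a small demonstration; the function names are made up):

import dis

def uses_builtin(start, stop):
    return range(start, stop)       # 'range' compiled as a global/builtin lookup

def shadows_builtin(start, stop):
    positions = range(start, stop)  # 'range' is local for the WHOLE function...
    range = list                    # ...because of this later assignment
    return positions

dis.dis(uses_builtin)     # 'range' is loaded with LOAD_GLOBAL
dis.dis(shadows_builtin)  # 'range' is loaded with LOAD_FAST -> UnboundLocalError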

On Mon, Jan 3, 2011 at 6:33 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
A few years ago I worked for a company where most of the (very talented) developers were new to Python. They called me over to explain what "UnboundLocalError" meant and why they were getting it in what looked (to them) like perfectly valid code. The code looked something like:
def something(start, stop):
    positions = range(start, stop)
    # more code here...
    range = process(positions)
Yeah, and the really annoying thing for us old-timers is that this used to work (in Python 1.0 or so :-). Once upon a time, looking up locals was as dynamic as looking up globals and builtins is today. Still, I think for optimizing builtins we can do slightly better.

--
--Guido van Rossum (python.org/~guido)

On Sat, Jan 1, 2011 at 4:49 AM, Guido van Rossum <guido@python.org> wrote:
(FWIW, optimizing "x[i] = i" would be much simpler -- I don't really care about the argument that a debugger might interfere. But again, apart from the simplest cases, it requires a sophisticated parser to determine that it is really safe to do so.)
Back on topic, we've certainly made much bigger bytecode changes that would appear differently in a debugger. Collapsing most of the with statement entry overhead into the single SETUP_WITH opcode is the biggest recent(-ish) example that comes to mind.

A more general peephole optimisation that picks up a repeated load operation in a sequence of load commands and replaces it with a single load and some stack rotations may be feasible, but I'm not entirely sure that would actually be an optimisation (especially for LOAD_FAST) - reordering the stack may be slower than the load operation.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
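For concreteness, a hedged sketch of the transformation Nick describes, applied to "x[i] = i" and written with CPython 3.x opcode names (the rewritten sequence is hypothetical; the peephole optimiser does not actually emit it):

# Current bytecode for "x[i] = i":
#     LOAD_FAST    i      (value)
#     LOAD_FAST    x      (container)
#     LOAD_FAST    i      (index -- the repeated load)
#     STORE_SUBSCR        (performs x[i] = i)
#
# Hypothetical rewrite with a single load plus stack shuffling:
#     LOAD_FAST    i
#     DUP_TOP             (duplicate instead of reloading)
#     LOAD_FAST    x
#     ROT_TWO             (restore value / container / index order)
#     STORE_SUBSCR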

Guido van Rossum wrote:
That can't be the answer, because then the question would become "how does the compiler know it can use the special opcode". This particular issue (generating special opcodes for certain builtins) has actually been discussed many times before. Alas, given Python's extremely dynamic promises it is very hard to do it in a way that is *guaranteed* not to change the semantics.
Just tossing ideas out here... pardon me if they've been discussed before, but I read the three PEPs you mentioned later (266, 267 and 280) and they didn't cover any of this.

I wonder whether we need to make that guarantee? Perhaps we should distinguish between "safe" optimizations, like constant folding which can't change behaviour, and "unsafe" optimizations which can go wrong under (presumably) rare circumstances. The compiler can continue to apply whatever safe optimizations it likes, but unsafe optimizations must be explicitly asked for by the user. If subtle or not-so-subtle bugs occur, well, Python does allow people to shoot themselves in the foot.

There's precedent for this. Both the -O and -OO optimization switches potentially change behaviour. -O *should* be safe if code only uses asserts for assertions, but many people (especially beginners) use assert for input checking. If their code breaks under -O they have nobody to blame but themselves. Might we not say that -OO will optimize access to builtins, and if things break, the solution is not to use -OO?
[...] There should also be a simple way to disable them, e.g. by setting "len = len" at the top of a module, one would be signalling that len() is not to be replaced by an opcode. But it remains messy and nobody has really gotten very far with implementing this. It is certainly not "low-hanging fruit" to do it properly.
Here's another thought... suppose (say) "builtin" became a reserved word. builtin.range (for example) would always refer to the built-in range, and could be optimized by the compiler. It wouldn't do much for the general case of wanting to optimize non-built-in globals, but this could be optimized safely:

def f():
    for i in builtin.range(10):
        builtin.print(i)

while this would keep the current semantics:

def f():
    for i in range(10):
        print(i)

--
Steven
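An aside on the -O precedent Steven mentions: the assert-for-input-checking hazard looks like this in practice (a minimal illustration):

def withdraw(balance, amount):
    # WRONG: input validation via assert -- it is silently stripped
    # under -O, so invalid input sails through in optimized runs.
    assert amount > 0, "amount must be positive"
    return balance - amount

# Under "python -O", withdraw(100, -50) returns 150 with no error.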

2011/1/1 Steven D'Aprano <steve@pearwood.info>
I wonder whether we need to make that guarantee? Perhaps we should distinguish between "safe" optimizations, like constant folding which can't change behaviour, and "unsafe" optimizations which can go wrong under (presumably) rare circumstances. The compiler can continue to apply whatever safe optimizations it likes, but unsafe optimizations must be explicitly asked for by the user. If subtle or not subtle bugs occur, well, Python does allow people to shoot themselves in the foot.
Do we consider local variable removing (due to internal optimizations) a safe or unsafe operation?

Do we consider local variable values "untouchable"? Think about a locals() call that returns a list for a variable; lists are mutable objects, so they can be changed by the caller, but the internally generated bytecode can work on a "private" (on stack) copy which doesn't "see" the changes made due to the locals() call.

Also, there's tracing to consider. When tracing is enabled, the "handler" cannot find some variables due to some optimizations. Another funny thing that can happen is that if I "group together" some assignment operations into a single "multiassignment" (it's another optimization I have been thinking about for a long time) and you are tracing it, only one tracing event will be generated instead of n.

Are such optimizations "legal" / "safe"? For me the answer is yes, because I think that they must be implementation-specific.
Here's another thought... suppose (say) "builtin" became a reserved word. builtin.range (for example) would always refer to the built-in range, and could be optimized by the compiler. [...]
I think that it's not needed. Optimizations must stay behind the scenes. We can speed up code which makes use of builtins without resorting to language changes. JITs already do this, and some approaches are possible even on non-JITed VMs. However, they require a longer parse / compile time, which can be undesirable.

Cesare

On Sat, Jan 1, 2011 at 1:52 AM, Cesare Di Mauro <cesare.di.mauro@gmail.com> wrote:
Do we consider local variable removing (due to internal optimizations) a safe or unsafe operation?
I would consider it safe unless the function locals() is called directly in the function -- always assuming the compiler does not see obvious other evidence (like a local being used by a nested function). Locals are "special" in many ways already.

There should be a way to disable this (globally) in case you want to step through the code with a debugger though -- IDEs like WingIDE and PyCharm make stepping through code very easy to set up, and variable inspection is a big part of the process of debugging this way. It's probably fine if such optimizations are only enabled by -O.

Still, I wonder if it isn't much better to try to do this using a JIT instead of messing with the bytecode. You'll find that the current compiler just really doesn't have the data structures needed to do these kinds of optimizations right.
Do we consider local variable values "untouchable"? Think about a locals() call that returns a list for a variable; lists are mutable objects, so they can be changed by the caller, but the internally generated bytecode can work on a "private" (on stack) copy which doesn't "see" the changes made due to the locals() call.
Are you sure? locals() makes only a shallow copy, so changes to the list's contents made via the list returned by locals() should be completely visible by the bytecode.
Also, there's the tracing to consider. When trace is enabled, the "handler" cannot find some variables due to some optimizations.
Tracing is a special case of debugging.
Another funny thing that can happen is that if I "group together" some assignment operations into a single, "multiassignment", one (it's another optimization I was thinking about from long time) and you are tracing it, only one tracing event will be generated instead of n. Are such optimizations "legal" / "safe"? For me the answer is yes, because I think that they must be implementation-specific.
I've traced through C code generated by gcc with an optimization setting. It can be a bit confusing to be jumping around in the optimized code, and it's definitely easier to trace through unoptimized code, but if you have the choice between tracing the (optimized) binary you have, or not tracing at all, I'll take what I can get. Still, when you're planning to trace/debug it's better to have a flag to disable it, and not using -O sounds like the right thing to me.

--
--Guido van Rossum (python.org/~guido)
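A small demonstration of the shallow-copy behaviour Guido describes (CPython: locals() in a function returns a snapshot dict whose values are the live objects):

def f():
    lst = [1, 2, 3]
    snapshot = locals()        # snapshot of the local namespace
    snapshot['lst'].append(4)  # same list object: the mutation IS visible
    snapshot['lst'] = []       # rebinding the dict entry: NOT visible
    print(lst)                 # prints [1, 2, 3, 4]

f()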

2011/1/1 Guido van Rossum <guido@python.org>
On Sat, Jan 1, 2011 at 1:52 AM, Cesare Di Mauro
Do we consider local variable removing (due to internal optimizations) a safe or unsafe operation?
I would consider it safe unless the function locals() is called directly in the function -- always assuming the compiler does not see obvious other evidence (like a local being used by a nested function).
The problem here is that locals is a builtin function, not a keyword, so the compiler must resort to something like the "code fork" that I showed before, if we want to keep the correct language semantic.
Still, I wonder if it isn't much better to try to do this using a JIT instead of messing with the bytecode.
Ditto for me, if a good (and not resource hungry) JIT will come.
You'll find that the current compiler just really doesn't have the data structures needed to do these kinds of optimizations right.
Right. Not now, but something can be made if and only if it makes sense. A JIT can make it non-sense, of course.
Do we consider local variable values "untouchable"? Think about a locals() call that returns a list for a variable; lists are mutable objects, so they can be changed by the caller, but the internally generated bytecode can work on a "private" (on stack) copy which doesn't "see" the changes made due to the locals() call.
Are you sure? locals() makes only a shallow copy, so changes to the list's contents made via the list returned by locals() should be completely visible by the bytecode.
--Guido van Rossum (python.org/~guido)
Nice to know. Reading the doc ( http://docs.python.org/library/functions.html#locals ) it was not clear to me. Thanks.

Cesare

On Sat, 01 Jan 2011 12:52:54 +1100 Steven D'Aprano <steve@pearwood.info> wrote:
Here's another thought... suppose (say) "builtin" became a reserved word. builtin.range (for example) would always refer to the built-in range, and could be optimized by the compiler. [...]
I had a similar question in a different context (non-Python). The point was to prevent core semantic changes in a "pedagogic" mode, such as the meaning of an operator on a builtin type, e.g. having Real.sum 'untouchable' so that 1.1+2.2 returns what is expected. Instead of protected keywords, my thought went toward read-only (immutable?) containers, where 'container' is a very loose notion including types and the scopes that hold them (and even individual objects that can be generated without a type).

Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com

On Fri, Dec 31, 2010 at 5:52 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Guido van Rossum wrote:
That can't be the answer, because then the question would become "how does the compiler know it can use the special opcode". This particular issue (generating special opcodes for certain builtins) has actually been discussed many times before. Alas, given Python's extremely dynamic promises it is very hard to do it in a way that is *guaranteed* not to change the semantics.
Just tossing ideas out here... pardon me if they've been discussed before, but I read the three PEPs you mentioned later (266, 267 and 280) and they didn't cover any of this.
I wonder whether we need to make that guarantee? Perhaps we should distinguish between "safe" optimizations, like constant folding which can't change behaviour,
(Though notice that our historic track record indicates that they are very dangerous -- we've introduced subtle bugs several times in "trivial" constant folding optimizations.)
and "unsafe" optimizations which can go wrong under (presumably) rare circumstances. The compiler can continue to apply whatever safe optimizations it likes, but unsafe optimizations must be explicitly asked for by the user. If subtle or not subtle bugs occur, well, Python does allow people to shoot themselves in the foot.
There's precedent for this. Both the -O and -OO optimization switches potentially change behaviour. -O *should* be safe if code only uses asserts for assertions, but many people (especially beginners) use assert for input checking. If their code breaks under -O they have nobody to blame but themselves. Might we not say that -OO will optimize access to builtins, and if things break, the solution is not to use -OO?
Maybe. But that means it will probably rarely be used -- realistically, who uses -O or -OO? I don't ever. Even so, there would have to be a way to turn the optimization off even under -OO for a particular module or function or code location, or for a particular builtin (again, open() comes to mind).
Here's another thought... suppose (say) "builtin" became a reserved word. builtin.range (for example) would always refer to the built-in range, and could be optimized by the compiler. It wouldn't do much for the general case of wanting to optimize non-built-in globals, but this could be optimized safely:

def f():
    for i in builtin.range(10):
        builtin.print(i)

while this would keep the current semantics:

def f():
    for i in range(10):
        print(i)
That defaults the wrong way. You want the optimization to work (if the compiler does not see explicit reasons to avoid it) except in rare cases where the compiler cannot know that you're dynamically modifying the environment (globals or builtins).

Also I would very much worry that people would start putting this in everywhere out of a mistaken defensive attitude. (Like what has happened to certain micro-optimization patterns, which are being overused, making code less readable.)

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum, 01.01.2011 17:50:
Maybe. But that means it will probably rarely be used -- realistically, who uses -O or -OO? I don't ever. Even so, there would have to be a way to turn the optimization off even under -OO for a particular module or function or code location, or for a particular builtin (again, open() comes to mind).
If this ever happens, -O and -OO will no longer be expressive enough (IMHO, -OO currently isn't anyway). There would be a need to support options like "-Ostatic-builtins" and the like. The problem then is how to keep users from applying a particular optimisation to a particular module. New settings in distutils could help to enable optimisations and maybe even to explicitly forbid optimisations, but life would certainly become more error-prone for distributors and users.

It's hard to keep track of the amount of bug reports and help requests we get from (mostly new) Cython users about a missing "-fno-strict-aliasing" when compiled modules don't work in Python 2. I expect the same to come up when users start to install Python modules with all sorts of great CPython optimisations. Test suites may well fail to catch the one bug that an optimisation triggers.
Here's another thought... suppose (say) "builtin" became a reserved word. [...]
That defaults the wrong way.
My impression exactly, so I'm -1. But the trade-off behind this is: complicating new code versus breaking old code (Python 3 classic).

Stefan

On Sat, Jan 1, 2011 at 11:38 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
If this ever happens, -O and -OO will no longer be expressive enough (IMHO, -OO currently isn't anyway). There would be a need to support options like "-Ostatic-builtins" and the like. [...]
If a flag wouldn't work, what about a pragma? Pragmas smell a bit unpythonic to me, but we did have a pragma for source encoding and unicode literals in Python 2, so it's not unprecedented. How much would it solve efficiency-wise if you could just write at the top of a particular module:

##ORIGINAL BUILTINS ONLY PLEASE

? Or is this the first step on the dark path to Perl 6?

-- Carl Johnson

If a flag wouldn't work, what about a pragma? [...] How much would it solve efficiency-wise if you could just write at the top of a particular module ##ORIGINAL BUILTINS ONLY PLEASE ?
Again, this would encourage people to put such junk in every module they write, so it would lose its value.

At this point in the thread I am tempted to propose an optimization moratorium, just to stop the flood of poorly-thought-through proposals. If you really want to make Python faster, don't waste your time in this thread. Go contribute to PyPy or Unladen Swallow. Or go fix the GIL, so we can use multiple cores.

--
--Guido van Rossum (python.org/~guido)

2010/12/31 Guido van Rossum <guido@python.org>
That can't be the answer, because then the question would become "how does the compiler know it can use the special opcode". This particular issue (generating special opcodes for certain builtins) has actually been discussed many times before. [...] I am talking about interceptions that a compiler cannot see, assuming it compiles each module independently, i.e. without whole-program optimizations.
Yes, I know, but the special opcode which I was talking about has a very different usage. The primary goal was to speed up for loops, generating specialized code when the proper range builtin is found at runtime, where it's convenient to have such optimized code.

As you stated, the compiler doesn't know if range is a builtin until runtime (at the precise moment of the for execution), so it'll generate two different code paths. The function's bytecode will look like this:

 0 SETUP_LOOP 62
 2 JUMP_IF_BUILTIN_OR_LOAD_GLOBAL 'range', 40
   # Usual, slow, code starts here
40 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
44 LOAD_FAST_MORE_TIMES x, 5     # Loads x 5 times on the stack
46 LOAD_CONSTS (4, 3, 2, 1, 0)   # Loads the tuple on the stack
48 STACK_ZIP 3, 5                # "zips" 3 sequences of 5 elements each on the stack
52 STORE_SUBSCR
54 STORE_SUBSCR
56 STORE_SUBSCR
58 STORE_SUBSCR
60 STORE_SUBSCR
62 POP_BLOCK
64 RETURN_CONST 'None'

It's just an example; the code can differ based on the compiler optimizations and the opcodes available in the virtual machine. The most important thing is that the semantics will be preserved (I never intended to drop them! ;)
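A hedged Python-level rendering of what the two paths amount to for "for i in range(5): x[i] = i" (the opcode is Cesare's hypothetical; this only spells out its intended effect):

import builtins

def f(x):
    if range is builtins.range:   # what JUMP_IF_BUILTIN_OR_LOAD_GLOBAL tests
        # Specialized path: loop unrolled, no iterator object created.
        x[0] = 0; x[1] = 1; x[2] = 2; x[3] = 3; x[4] = 4
    else:
        # Generic path: honour whatever 'range' now means.
        for i in range(5):
            x[i] = i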
I should also refer people interested in this subject to at least three PEPs that were written about this topic: PEP 266, PEP 267 and PEP 280. All three have been deferred, since nobody was bold enough to implement at least one of them well enough to be able to tell if it was even worth the trouble.
I read them, and they are interesting, but my case is quite different.
I haven't read either of those in a long time, and they may well be outdated by current JIT technology. I just want to warn folks that it's not such a simple matter to replace "for i in range(....):" with a special opcode.
Maybe trying to optimize the current non-JITed Python implementation is a dead end. JITs are evolving so much that all the things we have discussed here are already taken into account.

(FWIW, optimizing "x[i] = i" would be much simpler -- I don't really care about the argument that a debugger might interfere. But again, apart from the simplest cases, it requires a sophisticated parser to determine that it is really safe to do so.)
-- --Guido van Rossum (python.org/~guido)
It depends strictly on the goals we want to reach. A more advanced parser with a simple type-inference system can be implemented without requiring a complete parser rebuild, and can give good (albeit not optimal) results. At least list, dictionary, tuple, and set operations, which are very common in Python, can benefit; something for ints, doubles and complexes can be done as well. But looking at the JITs, it may all just be wasted time...

Cesare
participants (9)
- Benjamin Peterson
- Carl M. Johnson
- Cesare Di Mauro
- Guido van Rossum
- Michael Foord
- Nick Coghlan
- spir
- Stefan Behnel
- Steven D'Aprano