Mailman 3 Possible optimization for LOAD_FAST ? - Python-Dev

Possible optimization for LOAD_FAST ?

older
Omit Py_buffer struct from Stable...

Lukas Lueg

28 Dec 2010 28 Dec '10

10:08 p.m.

Consider the following code: def foobar(x): for i in range(5): x[i] = i The bytecode in python 2.7 is the following: 2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i) 3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack? There is no way that the reference of i might change in between the two lines. Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

Show replies by date

Benjamin Peterson

28 Dec 28 Dec

10:24 p.m.

2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>:

...

Consider the following code:

def foobar(x): for i in range(5): x[i] = i

The bytecode in python 2.7 is the following:

2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i)

3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE

Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack?

Yes. Would it be useful? Unlikely. -- Regards, Benjamin

Georg Brandl

11:30 p.m.

Am 28.12.2010 18:24, schrieb Benjamin Peterson:

...

2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>:

...
Consider the following code:

def foobar(x): for i in range(5): x[i] = i

The bytecode in python 2.7 is the following:

2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i)

3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE

Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack?

Yes. Would it be useful? Unlikely.

Is it tricky to get all the corner cases right? Very probably :) Georg

"Martin v. Löwis"

29 Dec 29 Dec

6:20 p.m.

Am 28.12.2010 18:08, schrieb Lukas Lueg:

...

Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

That's not true; a debugger may change the value of x. Regards, Martin

Cesare Di Mauro

31 Dec 31 Dec

12:17 p.m.

2010/12/29 "Martin v. Löwis" <martin@v.loewis.de>

...

Am 28.12.2010 18:08, schrieb Lukas Lueg:

...
Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

That's not true; a debugger may change the value of x.

Regards, Martin

OK, but is it mandatory? For example, in the above code, I can unroll the loop because I found that range is the usual built-in, 5 is a low-enough constant, and the body is made by a simple statement. Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) cannot even find something to change about it. And so on with other optimization examples that can be possible. Are they "legal" with Python? I think that we need to make it clear what happens in such cases. My idea is that it should be made implementation-specific. What happens with local variables and the generated code must depend on the specific compiler & virtual machine, in order to have a greater flexibility. IMHO the most important thing should be that, under normal conditions, the executed code have the expected behavior. Cesare

Maciej Fijalkowski

3 p.m.

...

OK, but is it mandatory? For example, in the above code, I can unroll the loop because I found that range is the usual built-in, 5 is a low-enough constant,

How do you know xrange is xrange and not something else? Cheers, fijal

Maciej Fijalkowski

3:01 p.m.

On Fri, Dec 31, 2010 at 12:00 PM, Maciej Fijalkowski <fijall@gmail.com> wrote:

...

...
OK, but is it mandatory? For example, in the above code, I can unroll the loop because I found that range is the usual built-in, 5 is a low-enough constant,

How do you know xrange is xrange and not something else?

Cheers, fijal

Err, misread. How do you know that range is a builtin you're thinking about and not some other object? Cheers, fijal

Cesare Di Mauro

8:02 p.m.

2010/12/31 Maciej Fijalkowski <fijall@gmail.com>

...

...
...
OK, but is it mandatory? For example, in the above code, I can unroll

On Fri, Dec 31, 2010 at 12:00 PM, Maciej Fijalkowski <fijall@gmail.com> wrote: the

...
...
loop because I found that range is the usual built-in, 5 is a low-enough constant,

How do you know xrange is xrange and not something else?

Cheers, fijal

Err, misread. How do you know that range is a builtin you're thinking about and not some other object?

Cheers, fijal

By a special opcode which could do this work. ]:-) Cesare

Ethan Furman

8:55 p.m.

Cesare Di Mauro wrote:

...

2010/12/29 "Martin v. Löwis" wrote:

...
Am 28.12.2010 18:08, schrieb Lukas Lueg:

...
Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

That's not true; a debugger may change the value of x.

Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) cannot even find something to change about it.

-1 Debugging is challenging enough as it is -- why would you want to make it even more difficult? ~Ethan~

skip＠pobox.com

9:36 p.m.

>> Another example. I can totally remove the variable i, just using the >> stack, so a debugger (or, in general, having the tracing enabled) >> cannot even find something to change about it. Ethan> -1 Ethan> Debugging is challenging enough as it is -- why would you want to Ethan> make it even more difficult? <snarky> I don't know. Maybe he wants his program to run faster. </snarky> If you use print statements for the bulk of your debugging (many people do), unrolling loops doesn't affect your debugging ability. Skip

Cesare Di Mauro

10:51 p.m.

2010/12/31 <skip@pobox.com>

...

...
...
Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) >> cannot even find something to change about it.

Ethan> -1

...

Ethan> Debugging is challenging enough as it is -- why would you want to Ethan> make it even more difficult?

<snarky> I don't know. Maybe he wants his program to run faster. </snarky>

:D "Aggressive" optimizations can be enabled with explicit options, in order to leave normal "debugger-prone" code.

...

If you use print statements for the bulk of your debugging (many people do), unrolling loops doesn't affect your debugging ability.

Skip

It's a common practice. Also IDEs helps a lot, and advanced interactive shells too (such as DreamPie). Cesare

Ned Batchelder

1 Jan 1 Jan

10:18 p.m.

...

2010/12/31 <skip@pobox.com <mailto:skip@pobox.com>>

>> Another example. I can totally remove the variable i, just using the >> stack, so a debugger (or, in general, having the tracing enabled) >> cannot even find something to change about it.

Ethan> -1

Ethan> Debugging is challenging enough as it is -- why would you want to Ethan> make it even more difficult?

<snarky> I don't know. Maybe he wants his program to run faster. </snarky>

:D

"Aggressive" optimizations can be enabled with explicit options, in order to leave normal "debugger-prone" code. I wish the Python compiler would adopt a strategy of being able to disable optimizations. I wrote a bug about a "leaky abstraction" optimization messing up coverage testing 2.5 years ago, and it was closed as won't fix: http://bugs.python.org/issue2506. The debate there

On 12/31/2010 12:51 PM, Cesare Di Mauro wrote: centered around, "but that line isn't executed, because it's been optimized away." It's common in sophisticated compilers (as in, any C compiler) to be able to choose whether you want optimizations for speed, or disabling optimizations for debugging and reasoning about the code. Python would benefit from the same choice. --Ned.

...

If you use print statements for the bulk of your debugging (many people do), unrolling loops doesn't affect your debugging ability.

Skip

It's a common practice. Also IDEs helps a lot, and advanced interactive shells too (such as DreamPie).

Cesare

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ned%40nedbatchelder.com

Cesare Di Mauro

3 Jan 3 Jan

11:45 a.m.

2011/1/1 Ned Batchelder <ned@nedbatchelder.com>

...

On 12/31/2010 12:51 PM, Cesare Di Mauro wrote:

"Aggressive" optimizations can be enabled with explicit options, in order to leave normal "debugger-prone" code.

I wish the Python compiler would adopt a strategy of being able to disable optimizations. I wrote a bug about a "leaky abstraction" optimization messing up coverage testing 2.5 years ago, and it was closed as won't fix: http://bugs.python.org/issue2506. The debate there centered around, "but that line isn't executed, because it's been optimized away." It's common in sophisticated compilers (as in, any C compiler) to be able to choose whether you want optimizations for speed, or disabling optimizations for debugging and reasoning about the code. Python would benefit from the same choice.

--Ned.

Command line parameters and/or environment variables are suitable for this, but they aren't immediate and, also, have global effect. I wish an explicit ("Explicit is better than implicit") and a finer control over optimizations, with a per-module usage: from __compiler__ import disable_peepholer, strict_syntax, static_builtins, globals_as_fasts Cesare

Cesare Di Mauro

31 Dec 31 Dec

10:42 p.m.

2010/12/31 Ethan Furman <ethan@stoneleaf.us>

...

Cesare Di Mauro wrote:

...
...
2010/12/29 "Martin v. Löwis" wrote:

...
...
Am 28.12.2010 18:08, schrieb Lukas Lueg:

...
...
Also, the load_fast in lne 22 to reference x could be taken out of the

...
...
loop as x will always point to the same object....

...
...
That's not true; a debugger may change the value of x.

Another example. I can totally remove the variable i, just using the stack, so a debugger (or, in general, having the tracing enabled) cannot even find something to change about it.

-1

Debugging is challenging enough as it is -- why would you want to make it even more difficult?

~Ethan~

With a good test suite you can forget debuggers. In more than 6 years of Python programming, I have used it only two times (to debug an ANTLR generated parser). Cesare

Victor Stinner

2 Jan 2 Jan

6:17 p.m.

Le mercredi 29 décembre 2010 à 14:20 +0100, "Martin v. Löwis" a écrit :

...

Am 28.12.2010 18:08, schrieb Lukas Lueg:

...
Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

That's not true; a debugger may change the value of x.

That's why Python has the following option: -O : optimize generated bytecode slightly; also PYTHONOPTIMIZE=x I regulary recompile programs with gcc -O0 -g to debug them. It is very difficult to debug (with gdb) a program compiled with gcc -O2: many variables are stored in registers, and gdb doesn't support that correctly. Victor

Ned Batchelder

3 Jan 3 Jan

6:25 a.m.

On 1/2/2011 8:17 AM, Victor Stinner wrote:

...

Le mercredi 29 décembre 2010 à 14:20 +0100, "Martin v. Löwis" a écrit :

...
Am 28.12.2010 18:08, schrieb Lukas Lueg:

...
Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object.... That's not true; a debugger may change the value of x. That's why Python has the following option:

-O : optimize generated bytecode slightly; also PYTHONOPTIMIZE=x

I regulary recompile programs with gcc -O0 -g to debug them. It is very difficult to debug (with gdb) a program compiled with gcc -O2: many variables are stored in registers, and gdb doesn't support that correctly.

Victor, you seem to be equating the gcc -O flag with the Python -O flag. They are described similarly, but can't be used the same way. In particular, there is no Python equivalent to gcc's -O0: there is no way to disable the Python peephole optimizer. --Ned.

...

Victor

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ned%40nedbatchelder.com

Cesare Di Mauro

31 Dec 31 Dec

12:02 p.m.

2010/12/28 Lukas Lueg <lukas.lueg@googlemail.com>

...

Consider the following code:

def foobar(x): for i in range(5): x[i] = i

The bytecode in python 2.7 is the following:

2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i)

3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE

Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack? There is no way that the reference of i might change in between the two lines. Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will always point to the same object....

Yes, you can, but you need: - a better AST evaluator (to mark symbols/variables with proper attributes); - a better optimizer (usually located on compile.c) which has a "global vision" (not limited to single instructions and/or single expressions). It's not that simple, and the results aren't guaranteed to be good. Also, consider that Python, as a dynamic-and-not-statically-compiled language need to find a good trade-off between compilation time and execution. Just to be clear, a C program is usually compiled once, then executed, so you can spend even *hours* to better optimize the final binary code. With a dynamic language, usually the code is compiled and the executed as needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem). Python stays in a "gray area", because modules are usually compiled once (when they are first used), and executed many times, but it isn't the only case. You cannot assume that optimization techniques used on other (static) languages can be used/ported in Python. Cesare

Alex Gaynor

3 Jan 3 Jan

6:50 a.m.

New subject: [Python-Dev] Possible optimization for LOAD_FAST ?

Cesare Di Mauro <cesare.di.mauro <at> gmail.com> writes:

...

2010/12/28 Lukas Lueg <lukas.lueg <at> googlemail.com>

Consider the following code: def foobar(x): for i in range(5): x[i] = i The bytecode in python 2.7 is the following: 2 0 SETUP_LOOP 30 (to 33) 3 LOAD_GLOBAL 0 (range) 6 LOAD_CONST 1 (5) 9 CALL_FUNCTION 1 12 GET_ITER >> 13 FOR_ITER 16 (to 32) 16 STORE_FAST 1 (i) 3 19 LOAD_FAST 1 (i) 22 LOAD_FAST 0 (x) 25 LOAD_FAST 1 (i) 28 STORE_SUBSCR 29 JUMP_ABSOLUTE 13 >> 32 POP_BLOCK >> 33 LOAD_CONST 0 (None) 36 RETURN_VALUE Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load and put the reference twice on the stack? There is no way that the reference of i might change in between the two lines. Also, the load_fast in lne 22 to reference x could be taken out of the loop as x will

always point to the same object....

...

Yes, you can, but you need: - a better AST evaluator (to mark symbols/variables with proper attributes); - a better optimizer (usually located on compile.c) which has a "global

vision" (not limited to single instructions and/or single expressions).

...

It's not that simple, and the results aren't guaranteed to be good.

Also, consider that Python, as a dynamic-and-not-statically-compiled language

need to find a good trade-off between compilation time and execution.

...

Just to be clear, a C program is usually compiled once, then executed, so you

can spend even *hours* to better optimize the final binary code.

...

With a dynamic language, usually the code is compiled and the executed as

needed, in "realtime". So it isn't practical neither desirable having to wait too much time before execution begins (the "startup" problem).

...

Python stays in a "gray area", because modules are usually compiled once (when

they are first used), and executed many times, but it isn't the only case.

...

You cannot assume that optimization techniques used on other (static)

languages can be used/ported in Python.

...

Cesare

No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. Alex

Guido van Rossum

8:18 a.m.

On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:

...

No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.

True. My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization. But barring someone interested in implementing something based on this rule, the proposal has languished for many years. FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions). -- --Guido van Rossum (python.org/~guido)

Terry Reedy

10:36 a.m.

On 1/2/2011 10:18 PM, Guido van Rossum wrote:

...

My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

I believe this amounts to saying 1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today. 2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals. It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name. For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass. So it could not work interactively. Should batch mode main modules work the same as when imported? Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overriden, also get resolved when the function is compiled. -- Terry Jan Reedy

Nick Coghlan

12:28 p.m.

On Mon, Jan 3, 2011 at 3:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

I believe this amounts to saying

1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today.

2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals.

It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name.

For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass. So it could not work interactively. Should batch mode main modules work the same as when imported?

Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overriden, also get resolved when the function is compiled.

This could potentially be handled by having the "exec" mode in compile() assume it can see all the global assignments (and hence assume builtin names refer to the builtins), while "single" would assume this was not the case (and hence skip the optimisation). It may also need an additional parameter to tell the compiler which names are known to be visible in the current locals and globals (e.g. to allow exec() to do the right thing) This kind of issue is the reason Guido pointed out the idea really needs someone else to pick it up and run with it - not just to figure out the implementation details, but also to ferret out the full implications for the language semantics and backwards compatibility. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Guido van Rossum

8:58 p.m.

On Sun, Jan 2, 2011 at 9:36 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

On 1/2/2011 10:18 PM, Guido van Rossum wrote:

...
My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

I believe this amounts to saying

1) Python code executes in three scopes (rather than two): global builtin, modular (misleadingly call global), and local. This much is a possible viewpoint today.

In fact it is the specification today.

...

2) A name that is not an assignment target anywhere -- and that matches a builtin name -- is treated as a builtin. This is the new part, and it amounts to a rule for entire modules that is much like the current rule for separating local and global names within a function. The difference from the global/local rule would be that unassigned non-builtin names would be left to runtime resolution in globals.

It would seem that this new rule would simplify the lookup of module ('global') names since if xxx in not in globals, there is no need to look in builtins. This is assuming that following 'len=len' with 'del len' cannot 'unmodularize' the name.

Actually I would leave the lookup mechanism for names that don't get special treatment the same -- the only difference would be for builtins in contexts where the compiler can generate better code (typically involving a new opcode) based on all the conditions being met.

...

For the rule to work 'retroactively' within a module as it does within functions would require a similar preliminary pass.

We actually already do such a pass.

...

So it could not work interactively.

That's fine. We could also disable it automatically in when eval() or exec() is the source of the code.

...

Should batch mode main modules work the same as when imported?

Yes.

...

Interactive mode could work as it does at present or with slight modification, which would be that builtin names within functions, if not yet overridden, also get resolved when the function is compiled.

Interactive mode would just work as it does today. I would also make a rule saying that 'open' is not treated this way. It is the only one where I can think of legitimate reasons for changing the semantics dynamically in a way that is not detectable by the compiler, assuming it only sees the source code for one module at a time. Some things that could be optimized this way: len(x), isinstance(x, (int, float)), range(10), issubclass(x, str), bool(x), int(x), hash(x), etc... in general, the less the function does the better a target for this optimization it is. One more thing: to avoid heisenbugs, I propose that, for any particular builtin, if this optimization is used anywhere in a module, it is should be used everywhere in that module (except in scopes where the name has a different meaning). This means that we can tell users about it and they can observe it without too much of a worry that a slight change to their program might disable it. (I've seen this with optimizations in gcc, and it makes performance work tricky.) Still, it's all academic until someone implements some of the optimizations. (There rest of the work is all in the docs and in the users' minds.) -- --Guido van Rossum (python.org/~guido)

Glyph Lefkowitz

7:12 p.m.

On Jan 2, 2011, at 10:18 PM, Guido van Rossum wrote:

...

On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:

...
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.

True.

My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

But barring someone interested in implementing something based on this rule, the proposal has languished for many years.

Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful. If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.

Guido van Rossum

9:01 p.m.

On Mon, Jan 3, 2011 at 6:12 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:

...

Wouldn't this optimization break things like mocking out 'open' for testing via 'module.open = fakeopen'? I confess I haven't ever wanted to change 'len' but that one seems pretty useful.

I am explicitly excluding open from this optimization, for that very reason.

...

If CPython wants such optimizations, it should do what PyPy and its ilk do, which is to notice the assignment, but recompile code in that module to disable the fast path at runtime, preserving the existing semantics.

In general I am against duplicating bytecode -- it can blow up too much. (It is an entirely appropriate technique for JIT compilers -- but my point here is that bytecode is different.) Recompiling a module is not a trivial change -- for example, either code objects would have to become mutable, or we'd have to track down all the code objects and replace them. Neither sounds attractive to me. -- --Guido van Rossum (python.org/~guido)

David Malcolm

10:52 p.m.

On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:

...

On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:

...
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.

True.

My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

But barring someone interested in implementing something based on this rule, the proposal has languished for many years.

Is there a PEP for this?

...

FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions).

I've been attempting another way in: http://bugs.python.org/issue10399 using a new "JUMP_IF_SPECIALIZABLE" opcode. This compares what a value is against a compile-time prediction, branching to an optimized implementation if the guess was correct. I use this to implement function-call inlining within the generated bytecode. Caveat-of-doom: That code's very much a work-in-progress at this stage, though: sometimes it doesn't segfault :) and the way that I track the predicted values is taking some other liberties with semantics (see that URL and the dmalcolm-ast-optimization-branch in SVN). (There's probably at least 2 PEPs in the above idea, though have yet to write my first PEP) Hope this is helpful Dave

Guido van Rossum

4 Jan 4 Jan

6:02 a.m.

On Mon, Jan 3, 2011 at 9:52 AM, David Malcolm <dmalcolm@redhat.com> wrote:

...

On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:

...
On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor <alex.gaynor@gmail.com> wrote:

...
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.

True.

My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

But barring someone interested in implementing something based on this rule, the proposal has languished for many years.

Is there a PEP for this?

Not that I know of, otherwise I'd have mentioned it. :-) It would be useful if someone wrote it up, since the idea comes back in one form or another regularly.

...

...
FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions).

I've been attempting another way in: http://bugs.python.org/issue10399 using a new "JUMP_IF_SPECIALIZABLE" opcode. This compares what a value is against a compile-time prediction, branching to an optimized implementation if the guess was correct. I use this to implement function-call inlining within the generated bytecode.

Yeah, that's what everybody proposes to keep the language semantics unchanged. But I claim that an easier solution is to say to hell with those semantics, let's change them to make the implementation simpler. That's from the Zen of Python: "If the implementation is easy to explain, it may be a good idea." I guess few people can seriously propose to change Python's semantics, that's why *I* am proposing it. :-) Note that the semantics of locals (e.g. UnboundLocalError) were also changed specifically to allow a significant optimization -- again by me.

...

Caveat-of-doom: That code's very much a work-in-progress at this stage, though: sometimes it doesn't segfault :) and the way that I track the predicted values is taking some other liberties with semantics (see that URL and the dmalcolm-ast-optimization-branch in SVN).

(There's probably at least 2 PEPs in the above idea, though have yet to write my first PEP)

If you want to write up a PEP for the semantics change I am proposing, everything you need is in this thread. -- --Guido van Rossum (python.org/~guido)

Michael Foord

3:49 p.m.

On 04/01/2011 01:02, Guido van Rossum wrote:

...

On Mon, Jan 3, 2011 at 9:52 AM, David Malcolm<dmalcolm@redhat.com> wrote:

...
On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:

...
On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor<alex.gaynor@gmail.com> wrote:

...
No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong. True.

My proposed way out of this conundrum has been to change the language semantics slightly so that global names which (a) coincide with a builtin, and (b) have no explicit assignment to them in the current module, would be fair game for such optimizations, with the understanding that the presence of e.g. "len = len" anywhere in the module (even in dead code!) would be sufficient to disable the optimization.

But barring someone interested in implementing something based on this rule, the proposal has languished for many years. Is there a PEP for this? Not that I know of, otherwise I'd have mentioned it. :-)

It would be useful if someone wrote it up, since the idea comes back in one form or another regularly.

...
...
FWIW, this is reminiscent of Fortran's rules for "intrinsics" (its name for builtins), which have a similar optimization behavior (except there the potential overrides that the compiler doesn't need to take into account are load-time definitions). I've been attempting another way in: http://bugs.python.org/issue10399 using a new "JUMP_IF_SPECIALIZABLE" opcode. This compares what a value is against a compile-time prediction, branching to an optimized implementation if the guess was correct. I use this to implement function-call inlining within the generated bytecode. Yeah, that's what everybody proposes to keep the language semantics unchanged. But I claim that an easier solution is to say to hell with those semantics, let's change them to make the implementation simpler. That's from the Zen of Python: "If the implementation is easy to explain, it may be a good idea." I guess few people can seriously propose to change Python's semantics, that's why *I* am proposing it. :-) Note that the semantics of locals (e.g. UnboundLocalError) were also changed specifically to allow a significant optimization -- again by me.

I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests. Michael

...

...
Caveat-of-doom: That code's very much a work-in-progress at this stage, though: sometimes it doesn't segfault :) and the way that I track the predicted values is taking some other liberties with semantics (see that URL and the dmalcolm-ast-optimization-branch in SVN).

(There's probably at least 2 PEPs in the above idea, though have yet to write my first PEP) If you want to write up a PEP for the semantics change I am proposing, everything you need is in this thread.

-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

Guido van Rossum

8:58 p.m.

On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...

I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this? -- --Guido van Rossum (python.org/~guido)

Nick Coghlan

9:20 p.m.

On Wed, Jan 5, 2011 at 1:58 AM, Guido van Rossum <guido@python.org> wrote:

...

On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this?

I've seen it done to write tests for simple CLI behaviour by mocking input() and print() (replacing sys.stdin and sys.stdout instead is far more common, but replacing the functions works too). If compile() accepted a blacklist of builtins that it wasn't allowed to optimise, then that should deal with the core of the problem as far as testing goes. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Alex Gaynor

9:21 p.m.

On Tue, Jan 4, 2011 at 10:20 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On Wed, Jan 5, 2011 at 1:58 AM, Guido van Rossum <guido@python.org> wrote:

...
On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this?

I've seen it done to write tests for simple CLI behaviour by mocking input() and print() (replacing sys.stdin and sys.stdout instead is far more common, but replacing the functions works too). If compile() accepted a blacklist of builtins that it wasn't allowed to optimise, then that should deal with the core of the problem as far as testing goes.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Ugh, I can't be the only one who finds these special cases to be a little nasty? Special cases aren't special enough to break the rules. Alex -- "I disapprove of what you say, but I will defend to the death your right to say it." -- Evelyn Beatrice Hall (summarizing Voltaire) "The people's good is the highest law." -- Cicero "Code can always be simpler than you think, but never as simple as you want" -- Me

Reid Kleckner

9:49 p.m.

On Tue, Jan 4, 2011 at 8:21 AM, Alex Gaynor <alex.gaynor@gmail.com> wrote:

...

Ugh, I can't be the only one who finds these special cases to be a little nasty? Special cases aren't special enough to break the rules. Alex

+1, I don't think nailing down a few builtins is that helpful for optimizing Python. Anyone attempting to seriously optimize Python is going to need to use more general techniques that apply to non-builtins as well. In unladen swallow (I'm sure something similar is done in PyPy) we have some infrastructure for watching dictionaries for changes, and in particular we tend to watch the builtins and module dictionaries. Reid

Barry Warsaw

9:54 p.m.

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:

...

Ugh, I can't be the only one who finds these special cases to be a little nasty?

Special cases aren't special enough to break the rules.

Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit. Outside of testing, I do agree that most of the builtins could be pretty safely optimized (even open()). There needs to be a way to stop all optimizations for testing purposes. Perhaps a sys variable, plus command line option and/or environment variable? -Barry

Michael Foord

9:57 p.m.

On 04/01/2011 16:54, Barry Warsaw wrote:

...

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:

...
Ugh, I can't be the only one who finds these special cases to be a little nasty?

Special cases aren't special enough to break the rules. Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit.

Outside of testing, I do agree that most of the builtins could be pretty safely optimized (even open()). There needs to be a way to stop all optimizations for testing purposes. Perhaps a sys variable, plus command line option and/or environment variable?

Although testing in an environment deliberately different from production is a recipe for hard to diagnose bugs. Michael

...

-Barry

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...

Lukas Lueg

10:33 p.m.

Doesnt this all boil down to being able to monitor PyDict for changes to it's key-space? The keys are immutable anyway so the instances of PyDict could manage a opaque value (in fact, a counter) that changes every time a new value is written to any key. Once we get a reference out of the dict, we can can do very fast lookups by passing the key, the reference we know from the last lookup and our last state. The lookup returns a new reference and the new state. If the dict has not changed, the state doesnt change and the reference is simply taken from the passed value passed to the lookup. That way the code remains the same no matter if the dict has changed or not. 2011/1/4 Michael Foord <fuzzyman@voidspace.org.uk>:

...

On 04/01/2011 16:54, Barry Warsaw wrote:

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:

Ugh, I can't be the only one who finds these special cases to be a little nasty?

Special cases aren't special enough to break the rules.

Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit.

Outside of testing, I do agree that most of the builtins could be pretty safely optimized (even open()). There needs to be a way to stop all optimizations for testing purposes. Perhaps a sys variable, plus command line option and/or environment variable?

Although testing in an environment deliberately different from production is a recipe for hard to diagnose bugs.

Michael

-Barry

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.u...

-- http://www.voidspace.org.uk/

May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/lukas.lueg%40gmail.com

Daniel Stutzbach

5 Jan 5 Jan

3:15 a.m.

On Tue, Jan 4, 2011 at 9:33 AM, Lukas Lueg <lukas.lueg@googlemail.com>wrote:

...

The keys are immutable anyway so the instances of PyDict could manage a opaque value (in fact, a counter) that changes every time a new value is written to any key. Once we get a reference out of the dict, we can can do very fast lookups by passing the key, the reference we know from the last lookup and our last state. The lookup returns a new reference and the new state. If the dict has not changed, the state doesnt change and the reference is simply taken from the passed value passed to the lookup. That way the code remains the same no matter if the dict has changed or not.

I have had similar ideas in the past but have never found time to explore them. The same mechanism could also be used to speed up attribute access on objects. -- Daniel Stutzbach

Guido van Rossum

3:30 a.m.

On Tue, Jan 4, 2011 at 2:15 PM, Daniel Stutzbach <stutzbach@google.com> wrote:

...

On Tue, Jan 4, 2011 at 9:33 AM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:

...
The keys are immutable anyway so the instances of PyDict could manage a opaque value (in fact, a counter) that changes every time a new value is written to any key. Once we get a reference out of the dict, we can can do very fast lookups by passing the key, the reference we know from the last lookup and our last state. The lookup returns a new reference and the new state. If the dict has not changed, the state doesnt change and the reference is simply taken from the passed value passed to the lookup. That way the code remains the same no matter if the dict has changed or not.

I have had similar ideas in the past but have never found time to explore them. The same mechanism could also be used to speed up attribute access on objects.

Check out the various approaches in PEP 266, PEP 267, and PEP 280. -- --Guido van Rossum (python.org/~guido)

Neil Schemenauer

4 Jan 4 Jan

10:51 p.m.

Barry Warsaw <barry@python.org> wrote:

...

On Jan 04, 2011, at 10:21 AM, Alex Gaynor wrote:

...
Ugh, I can't be the only one who finds these special cases to be a little nasty?

Special cases aren't special enough to break the rules.

Yeah, I agree. Still it would be interesting to see what kind of performance improvement this would result in. That seems to be the only way to decide whether the cost is worth the benefit.

Yuck from me as well. I would guess that attribute lookups would be just as significant as global variable lookups (depending on coding style, of course). In contrast, the local variable semantic change provided a big speed increase for a minor language complexity cost. Neil

Steven D'Aprano

5 Jan 5 Jan

2:50 a.m.

Guido van Rossum wrote:

...

On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this?

I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved. I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one. -- Steven

Lukas Lueg

3:13 a.m.

I very much like the fact that python has *very* little black magic revealed to the user. Strong -1 on optimizing picked builtins in a picked way. 2011/1/4 Steven D'Aprano <steve@pearwood.info>:

...

Guido van Rossum wrote:

...
On Tue, Jan 4, 2011 at 2:49 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...
I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

Hm, I already suggested to make an exception for open, (and one should be added for __import__) but if this is done for other builtins that is indeed a problem. Can you point to example code doing this?

I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved.

I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one.

-- Steven _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/lukas.lueg%40gmail.com

Guido van Rossum

4:30 a.m.

On Tue, Jan 4, 2011 at 2:13 PM, Lukas Lueg <lukas.lueg@googlemail.com> wrote:

...

I very much like the fact that python has *very* little black magic revealed to the user. Strong -1 on optimizing picked builtins in a picked way.

That's easy for you to say. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum

3:15 a.m.

On Tue, Jan 4, 2011 at 1:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved.

But why couldn't you edit the source code?

...

I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one.

As long as you have a definition or import of sum at the top of (or really anywhere in) the module, that will still work. It's only if you were to do things like import builtins builtins.len = ... (whether inside your package or elsewhere) that things would stop working with the proposed optimization. -- --Guido van Rossum (python.org/~guido)

Steven D'Aprano

4:13 a.m.

Guido van Rossum wrote:

...

On Tue, Jan 4, 2011 at 1:50 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...
I've been known to monkey-patch builtins in the interactive interpreter and in test code. One example that comes to mind is that I had some over-complicated recursive while loop (!), and I wanted to work out the Big Oh behaviour so I knew exactly how horrible it was. Working it out from first principles was too hard, so I cheated: I knew each iteration called len() exactly once, so I monkey-patched len() to count how many times it was called. Problem solved.

But why couldn't you edit the source code?

Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len. Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise.

...

...
I also have a statistics package that has its own version of sum, and I rely on calls to sum() from within the package picking up my version rather than the builtin one.

As long as you have a definition or import of sum at the top of (or really anywhere in) the module, that will still work. It's only if you were to do things like

import builtins builtins.len = ...

(whether inside your package or elsewhere) that things would stop working with the proposed optimization.

Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely? Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking. The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing: mymodule.len = mylen But I haven't done much work with mocks, so I'm just guessing. -- Steven

Guido van Rossum

4:29 a.m.

On Tue, Jan 4, 2011 at 3:13 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Guido van Rossum wrote:

...
But why couldn't you edit the source code?

Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len.

The interactive interpreter will always be excluded from this kind of optimization (well, in my proposal anyway).

...

Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise.

There are two versions of the "assign to global named 'len'" idiom. One is benign: if the assignment occurs in the source code of the module (i.e., where the compiler can see it when it is compiling the module) the optimization will be disabled in that module. The second is not: if a module-global named 'len' is set in a module from outside that module, the compiler cannot see that assignment when it considers the optimization, and it may generate optimized code that will not take the global by that name into account (it would use an opcode that computes the length of an object directly). The third way to mess with the optimization is messing with builtins.len. This one is also outside what the compiler can see. [...]

...

Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely?

Monkey-patching intentionally has a bad name -- there's always a code smell. (And it looks like one, too. :-)

...

Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking.

I think the folks referring to monkey-patching builtins in unittests were referring to this. But they might also be referring to the second option above.

...

The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing:

mymodule.len = mylen

But I haven't done much work with mocks, so I'm just guessing.

Same here. But it would fail (i.e. not be picked up by the optimization) either way. -- --Guido van Rossum (python.org/~guido)

Michael Foord

4:32 a.m.

On 04/01/2011 23:29, Guido van Rossum wrote:

...

On Tue, Jan 4, 2011 at 3:13 PM, Steven D'Aprano<steve@pearwood.info> wrote:

...
Guido van Rossum wrote:

...
But why couldn't you edit the source code? Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len. The interactive interpreter will always be excluded from this kind of optimization (well, in my proposal anyway).

...
Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise. There are two versions of the "assign to global named 'len'" idiom.

One is benign: if the assignment occurs in the source code of the module (i.e., where the compiler can see it when it is compiling the module) the optimization will be disabled in that module.

The second is not: if a module-global named 'len' is set in a module from outside that module, the compiler cannot see that assignment when it considers the optimization, and it may generate optimized code that will not take the global by that name into account (it would use an opcode that computes the length of an object directly).

The third way to mess with the optimization is messing with builtins.len. This one is also outside what the compiler can see.

[...]

...
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely? Monkey-patching intentionally has a bad name -- there's always a code smell. (And it looks like one, too. :-)

...
Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking. I think the folks referring to monkey-patching builtins in unittests were referring to this. But they might also be referring to the second option above.

I prefer monkey patching builtins (where I do such a thing) in the namespace where they are used. I know it is *common* to monkeypatch __builtins__.open (python 2) however. I don't recall monkey patching anything other than open and raw_input myself but I *bet* there are people doing it for reasons they see as legitimate in tests. :-) Michael

...

...
The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing:

mymodule.len = mylen

But I haven't done much work with mocks, so I'm just guessing. Same here. But it would fail (i.e. not be picked up by the optimization) either way.

Michael Foord

4:36 a.m.

On 04/01/2011 23:29, Guido van Rossum wrote:

...

On Tue, Jan 4, 2011 at 3:13 PM, Steven D'Aprano<steve@pearwood.info> wrote:

...
Guido van Rossum wrote:

...
But why couldn't you edit the source code? Because there was no source code -- I was experimenting in the interactive interpreter. I could have just re-created the function by using the readline history, but it was just easier to redefine len. The interactive interpreter will always be excluded from this kind of optimization (well, in my proposal anyway).

...
Oh... it's just occurred to me that you were asking for use-cases for assigning to builtins.len directly, rather than just to len. No, I've never done that -- sorry for the noise. There are two versions of the "assign to global named 'len'" idiom.

One is benign: if the assignment occurs in the source code of the module (i.e., where the compiler can see it when it is compiling the module) the optimization will be disabled in that module.

The second is not: if a module-global named 'len' is set in a module from outside that module, the compiler cannot see that assignment when it considers the optimization, and it may generate optimized code that will not take the global by that name into account (it would use an opcode that computes the length of an object directly).

The third way to mess with the optimization is messing with builtins.len. This one is also outside what the compiler can see.

[...]

...
Ha, well, that's the sort of thing that gives monkey-patching a bad name, surely? Monkey-patching intentionally has a bad name -- there's always a code smell. (And it looks like one, too. :-)

...
Is there a use-case for globally replacing builtins for all modules, everywhere? I suppose that's what you're asking. I think the folks referring to monkey-patching builtins in unittests were referring to this. But they might also be referring to the second option above.

The only examples I could find from a quick search (using the patch decorator from my mock module which is reasonably well used) patch __builtins__.open and raw_input. https://www.google.com/codesearch?hl=en&lr=&q=%22patch%28%27__builtin__.%22+lang%3Apython+case%3Ayes :-) Michael Foord

...

...
The only example I can think of might be the use of mocks for testing purposes, but even there I'd prefer to inject the mock into the module I was testing:

mymodule.len = mylen

But I haven't done much work with mocks, so I'm just guessing. Same here. But it would fail (i.e. not be picked up by the optimization) either way.

Guido van Rossum

4:39 a.m.

On Tue, Jan 4, 2011 at 3:36 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...

The only examples I could find from a quick search (using the patch decorator from my mock module which is reasonably well used) patch __builtins__.open and raw_input.

https://www.google.com/codesearch?hl=en&lr=&q=%22patch%28%27__builtin__.%22+lang%3Apython+case%3Ayes

So, that significantly weakens the argument that this optimization will break unit tests, since I am happy to promise never to optimize these builtins, and any other builtins intended for I/O. Surely it will break *somebody's* code. That hasn't stopped us with other changes. The crux is whether it breaks significant amounts of code or code that would be really hard to write in another way. -- --Guido van Rossum (python.org/~guido)

Terry Reedy

1:56 p.m.

On 1/4/2011 6:39 PM, Guido van Rossum wrote:

...

So, that significantly weakens the argument that this optimization will break unit tests, since I am happy to promise never to optimize these builtins, and any other builtins intended for I/O.

This is one comprehensible rule rather than a list of exceptions, so easier to remember. It has two rationales: such often need to be over-riden for testing, possibly in hidden ways; such are inherently 'slow' so optimizing dict lookup away hardly makes sense. -- Terry Jan Reedy

Maciej Fijalkowski

6:27 p.m.

How about not changing semantics and still making this optimization possible? PyPy already has CALL_LIKELY_BUILTIN which checks whether builtins has been altered (by keeping a flag on the module dictionary) and if not, loads a specific builtin on top of value stack. From my current experience, I would make a bet that someone is altering pretty much every builtin for some dark reasons. One that comes to mind is to test something using external library which is not playing along too well. That however only lets one avoid dictionary lookups, it doesn't give potential for other optimizations (which in my opinion are limited until we hit something dynamic like an instance, but let's ignore it). How about creating two copies of bytecode (that's not arbitrary number, just 2) and a way to go from more optimized to less optimized in case *any* of promises is invalidated? That gives an ability to save semantics, while allowing optimizations. That said, I think CPython should stay as a simple VM and refrain from doing things that are much easier in the presence of a JIT (and give a lot more speedups), but who am I to judge. Cheers, fijal

Guido van Rossum

11:35 p.m.

On Wed, Jan 5, 2011 at 5:27 AM, Maciej Fijalkowski <fijall@gmail.com> wrote:

...

How about not changing semantics and still making this optimization possible?

PyPy already has CALL_LIKELY_BUILTIN which checks whether builtins has been altered (by keeping a flag on the module dictionary) and if not, loads a specific builtin on top of value stack.

I can only repeat what I said before. That's what everybody proposes, and if you have the infrastructure, it's a fine solution. But to me, those semantics aren't sacred, and I want to at least explore an alternative. Putting a hook on two dicts (the module globals and builtins.__dict__) is a lot of work in CPython, and has the risk of slowing everything down (just a tad, but still -- AFAIK dicts currently are not hookable). Checking whether there's a global named 'len' is much simpler in the current CPython compiler. -- --Guido van Rossum (python.org/~guido)

skip＠pobox.com

5:10 a.m.

New subject: Steroidal builtins (was: Possible optimization for LOAD_FAST ?)

Steven> I've been known to monkey-patch builtins in the interactive Steven> interpreter and in test code. Me too. I use a slightly beefed up dir() funcion which identifies modules within a package which haven't been imported yet. Handy for quick-n-dirty introspection. >>> import email >>> dir(email) ['Charset', 'Encoders', 'Errors', 'FeedParser', 'Generator', 'Header', 'Iterators', 'LazyImporter', 'MIMEAudio', 'MIMEBase', 'MIMEImage', 'MIMEMessage', 'MIMEMultipart', 'MIMENonMultipart', 'MIMEText', 'Message', 'Parser', 'Utils', '[_parseaddr]', '[base64mime]', '[charset]', '[encoders]', '[errors]', '[feedparser]', '[generator]', '[header]', '[iterators]', '[message]', '[parser]', '[quoprimime]', '[test/]', '[utils]', '_LOWERNAMES', '_MIMENAMES', '__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__', '__version__', '_name', 'base64MIME', 'email', 'importer', 'message_from_file', 'message_from_string', 'mime', 'quopriMIME', 'sys'] Those names with [...] bracketing the names are submodules or subpackages of the email package which haven't been imported yet. Skip

Nick Coghlan

4 Jan 4 Jan

9:03 p.m.

On Tue, Jan 4, 2011 at 8:49 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:

...

I think someone else pointed this out, but replacing builtins externally to a module is actually common for testing. In particular replacing the open function, but also other builtins, is often done temporarily to replace it with a mock. It seems like this optimisation would break those tests.

print() and input() come to mind. However, so long as appropriate tools are provided to recompile a module with this optimisation disabled for certain builtins (with some builtins such as open(), print() and input() blacklisted by default) then that issue should be manageable. I've extracted the full list of 68 builtin functions from the table in the docs below. I've placed asterisks next to the ones I think we would *want* to be eligible for optimisation. Aside from the 3 mentioned above, we could fairly easily omit ones which are used primarily at the interactive prompt (i.e. dir(), help()), as well as __import__() (since that lookup is handled specially anyway). Cheers, Nick. __import__() abs() * all() * any() * ascii() * bin() * bool() * bytearray() * bytes() * callable() * chr() * classmethod() * compile() * complex() * delattr() * dict() * dir() divmod() * enumerate() * eval() * exec() * filter() * float() * format() * frozenset() * getattr() * globals() * hasattr() * hash() * help() hex() * id() * input() int() * isinstance() * issubclass() * iter() * len() * list() * locals() * map() * max() * memoryview() * min() * next() * object() * oct() * open() ord() * pow() * print() property() * range() * repr() * reversed() * round() * set() * setattr() * slice() * sorted() * staticmethod() * str() * sum() * super() * tuple() * type() * vars() * zip() * -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Cesare Di Mauro

3 Jan 3 Jan

11:52 a.m.

2011/1/3 Alex Gaynor <alex.gaynor@gmail.com>

...

No, it's singularly impossible to prove that any global load will be any given value at compile time. Any optimization based on this premise is wrong.

Alex

That's your opinion, but I have very different ideas. Of course we can't leave the problem only on the compiler shoulders, but I think that can be ways to threat builtins as "static" variables, and globals like local (fast) variables too, taking into account changes on the builtins' and modules dictionaries. But it doesn't make sense to invest time in these things: JITs are becoming a good alternative, and may be they will be ready soon to take the CPython place as the "mainstream" implementation. Cesare

5069

Age (days ago)

5077

Last active (days ago)

List overview

Download

51 comments

22 participants

participants (22)

"Martin v. Löwis"
Alex Gaynor
Barry Warsaw
Benjamin Peterson
Cesare Di Mauro
Daniel Stutzbach
David Malcolm
Ethan Furman
Georg Brandl
Glyph Lefkowitz
Guido van Rossum
Lukas Lueg
Maciej Fijalkowski
Michael Foord
Ned Batchelder
Neil Schemenauer
Nick Coghlan
Reid Kleckner
skip＠pobox.com
Steven D'Aprano
Terry Reedy
Victor Stinner

Possible optimization for LOAD_FAST ?

tags

participants (22)