Release of astoptimizer 0.3
Hi,

Here is some progress on my astoptimizer project. If you are interested in the optimizer, run it on your own project or help me to implement more optimizations.

http://pypi.python.org/pypi/astoptimizer
https://bitbucket.org/haypo/astoptimizer

---

The latest version (0.3) works on Python 2.6-3.3 and implements the following optimizations:

* Call builtin functions if arguments are constants. Examples:

 - len("abc") => 3
 - ord("A") => 65

* Call methods of builtin types if the object and arguments are constants. Examples:

 - u"h\xe9ho".encode("utf-8") => b"h\xc3\xa9ho"
 - "python2.7".startswith("python") => True
 - (32).bit_length() => 6
 - float.fromhex("0x1.8p+0") => 1.5

* Call functions of the math and string modules for functions without side effects. Examples:

 - math.log(32) / math.log(2) => 5.0
 - string.atoi("5") => 5

* Format strings for str % args and print(arg1, arg2, ...) if the arguments are constants and the format string is valid. Examples:

 - "x=%s" % 5 => "x=5"
 - print(1.5) => print("1.5")

* Simplify expressions. Examples:

 - not(x in y) => x not in y
 - 4 and 5 and x and 6 => x and 6

* Loops: replace range() with xrange() on Python 2, and lists with tuples. Examples:

 - for x in range(n): ... => for x in xrange(n): ...
 - for x in [1, 2, 3]: ... => for x in (1, 2, 3): ...

* Evaluate unary and binary operators, subscripts and comparisons if all arguments are constants. Examples:

 - 1 + 2 * 3 => 7
 - not True => False
 - "abc" * 3 => "abcabcabc"
 - "abcdef"[:3] => "abc"
 - (2, 7, 3)[1] => 7
 - frozenset("ab") | frozenset("bc") => frozenset("abc")
 - None is None => True
 - "2" in "python2.7" => True
 - "def f(): return 2 if 4 < 5 else 3" => "def f(): return 2"

* Remove dead code. Examples:

 - def f(): return 1; return 2 => def f(): return 1
 - if DEBUG: print("debug") => pass, with DEBUG declared as False
 - while 0: ... => pass

---

Unsafe optimizations are disabled by default. Optimizations can be enabled using a Config class with "features" like "builtin_funcs" (builtin functions like len()) or "pythonbin" (the optimized code will be executed by the same Python binary executable).

astoptimizer.patch_compile() can be used to hook the optimizer into the compile() builtin function. On Python 3.3, it is enough to use the optimizer on imports (thanks to importlib). On older versions, the compileall module can be used to compile a whole project using the optimizer.

I didn't start to benchmark anything yet; I focused on fixing bugs (not generating invalid code). I will start benchmarks when the "variables" feature (ex: "x=1; print(x)" => "x=1; print(1)") works. There is experimental support for variables, but it is still too aggressive and generates invalid code in some cases (see the TODO file).

I plan to implement other optimizations like loop unrolling or converting a loop to a list comprehension; see the TODO file.

Don't hesitate to propose more optimizations if you have some ideas ;-)
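Quick sketch of the intended usage (see the README for the exact API; the constructor signature below is a guess built from the description above):

    # Hypothetical usage sketch; only Config, the feature names and
    # patch_compile() come from the description above.
    from astoptimizer import Config, patch_compile

    config = Config("builtin_funcs", "pythonbin")  # opt in to unsafe features
    patch_compile(config)  # hook the optimizer into the compile() builtin

    # From now on, compiled code goes through the optimizer:
    code = compile("print(len('abc'))", "<test>", "exec")
    exec(code)  # the len('abc') call may have been folded to the constant 3

Victor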
On 11.09.2012 12:41, Victor Stinner wrote:
Hi,
Here is some progress on my astoptimizer project. If you are interested in the optimizer, run it on your own project or help me to implement more optimizations.
Wow, that's an amazing list of optimizations. Keep up the good work!

Christian
On Tue, Sep 11, 2012 at 12:41 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
Here is some progress on my astoptimizer project. If you are interested in the optimizer, run it on your own project or help me to implement more optimizations.
http://pypi.python.org/pypi/astoptimizer https://bitbucket.org/haypo/astoptimizer
---
The latest version (0.3) works on Python 2.6-3.3 and implements the following optimizations:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does it preserve python semantics? What if you change the len builtin?
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does it preserve python semantics? What if you change the len builtin?
This optimization is disabled by default (in version 0.3), because builtin functions may be shadowed. Examples: "len=ord; print(len('A'))" or exec(compile("print(len('A'))", "test", "exec"), {'len': ord}).

If you know that one specific builtin function is shadowed (ex: print), you can modify the configuration to enable optimizations on all builtin functions except that one.

Do you know of projects where builtin functions are shadowed?

The idea is that the author of the application knows the application (... and all modules used by the application) and is able to specify explicitly what is "constant". You can, for example, declare some variables as constants using Config.add_constant("module.name", value). In the same manner, you can declare "pure" functions (functions without side effects).
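For example (a sketch; add_constant is the method named above, the pure-function registration is a hypothetical spelling):

    from astoptimizer import Config

    config = Config("builtin_funcs")
    # Promise the optimizer that this module-level name never changes:
    config.add_constant("myapp.settings.DEBUG", False)
    # Hypothetical method name; the real API for declaring a pure
    # (side-effect free) function may differ:
    # config.add_pure_function("myapp.text.slugify")

Victor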
On 11/09/2012 13:06, Victor Stinner wrote:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does it preserve python semantics? What if you change the len builtin?
This optimization is disabled by default (in version 0.3), because builtin functions may be shadowed. Examples: "len=ord; print(len('A'))" or exec(compile("print(len('A'))", "test", "exec"), {'len': ord}).

If you know that one specific builtin function is shadowed (ex: print), you can modify the configuration to enable optimizations on all builtin functions except that one.

Do you know of projects where builtin functions are shadowed?

The idea is that the author of the application knows the application (... and all modules used by the application) and is able to specify explicitly what is "constant". You can, for example, declare some variables as constants using Config.add_constant("module.name", value). In the same manner, you can declare "pure" functions (functions without side effects).
Do you know what the cost would be of, say, replacing:

    len("abc")

with:

    3 if len is __builtins__.len else len("abc")

if possible, where the lookup of __builtins__.len has been done early, such as at compile time?
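Something like this (sketch; note that __builtins__ is the builtins module in __main__ but a plain dict in imported modules, so portable code would go through the __builtin__/builtins module instead):

    import __builtin__  # 'builtins' on Python 3

    def size():
        # guarded constant folding of len("abc")
        return 3 if len is __builtin__.len else len("abc")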
On Tue, Sep 11, 2012 at 10:06 AM, MRAB <python@mrabarnett.plus.com> wrote:
On 11/09/2012 13:06, Victor Stinner wrote:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does it preserve python semantics? What if you change the len builtin?
This optimization is disabled by default (in version 0.3), because builtin functions may be shadowed. Examples: "len=ord; print(len('A'))" or exec(compile("print(len('A'))", "test", "exec"), {'len': ord}).

If you know that one specific builtin function is shadowed (ex: print), you can modify the configuration to enable optimizations on all builtin functions except that one.

Do you know of projects where builtin functions are shadowed?

The idea is that the author of the application knows the application (... and all modules used by the application) and is able to specify explicitly what is "constant". You can, for example, declare some variables as constants using Config.add_constant("module.name", value). In the same manner, you can declare "pure" functions (functions without side effects).
Do you know what the cost would be of, say, replacing:
len("abc")
with:
3 if len is __builtins__.len else len("abc")
if possible, where the lookup of __builtins__.len has been done early, such as at compile time?
Doing the lookup at compile time would defeat the whole purpose (the dynamic shadowing of len could happen at any time). Also there are two forms of shadowing: patching the globals of the module where the call occurs, or patching the builtins.

Doing the lookup at run time would also defeat the purpose, because most of the cost of calling len() *is* the lookup. Actually two lookups -- the LOAD_GLOBAL opcode is used, which first looks in the globals (this will normally always fail) and then in the builtins.
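The double lookup is visible with dis (illustrative):

    import dis

    def f(s):
        return len(s)

    dis.dis(f)
    # The listing shows a LOAD_GLOBAL for 'len' before the call: the module
    # globals are searched first (normally a miss), then the builtins.

FWIW, I expect that there are few places where len(<constant string>) is actually used. However I expect it to be quite common for ord(), where the same reasoning applies.

-- 
--Guido van Rossum (python.org/~guido)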
2012/9/11 Guido van Rossum <guido@python.org>:
FWIW, I expect that there are few places where len(<constant string>) is actually used.
I found one relevant example in the stdlib:

    if path.endswith('.dylib'):
        yield path[:-len('.dylib')] + suffix + '.dylib'
    else:
        yield path + suffix

Cool, I'm not the only developer too lazy to copy/paste len('suffix') into a Python interpreter and then copy/paste the result into my code :-)

"if text.endswith('suffix'): ... text[:-len('suffix')] ..." is a common pattern in my code.
However I expect it to be quite common for ord(), where the same reasoning applies.
Projects using the same code base for Python 2 and Python 3 contain a lot of inefficient code. For example, using the six library, a simple Unicode literal string becomes a function call: u('unicode').

I expect that astoptimizer will be able to remove (or at least reduce!) the overhead of the six library and all checks on the Python version ("if PYTHON3: ... else: ...").
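The pattern looks roughly like this (a sketch of the idea; six's real u() differs in details):

    import sys

    PYTHON3 = sys.version_info[0] == 3

    if PYTHON3:
        def u(s):
            return s
    else:
        def u(s):
            return unicode(s, "unicode_escape")

    text = u('unicode')  # a function call at runtime just to get a literal

An optimizer that knows PYTHON3 is constant and u() is pure could replace the call with the literal itself at compile time.

Victor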
On Tue, Sep 11, 2012 at 5:40 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2012/9/11 Guido van Rossum <guido@python.org>:
FWIW, I expect that there are few places where len(<constant string>) is actually used.
I found one relevant example in the stdlib:
    if path.endswith('.dylib'):
        yield path[:-len('.dylib')] + suffix + '.dylib'
    else:
        yield path + suffix
Cool, I'm not the only developer too lazy to copy/paste len('suffix') into a Python interpreter and then copy/paste the result into my code :-)
"if text.endswith('suffix'): ... text[:-len('suffix')] ..." is a common pattern in my code.
Ok, that's a good use case.
However I expect it to be quite common for ord(), where the same reasoning applies.
Projects using the same code base for Python 2 and Python 3 contain a lot of inefficient code. For example, using the six library, a simple Unicode literal string becomes a function call: u('unicode').
But are you able to do enough static analysis to feel comfortable that this is the right u() function? IIRC you said earlier that you're not even capable of recognizing "len = ord; print(len('a'))" -- if that is really true, I'm very worried about your optimizer's capacity for breaking code. I'm not talking about "behind-their-back" changes to __builtins__ or patching of the module globals. I'm talking about detecting straightforward definitions that override the identifiers you are replacing.
I expect that astoptimizer will be able to remove (or at least reduce!) the overhead of the six library and all checks on the Python version ("if PYTHON3: ... else: ...").
Hm. Wouldn't it be just as easy to run a source-to-source translator to remove six artefacts instead of an ast optimizer? Surely some convention could be adopted that is easy to use, and the tool to do the translation could be a lot simpler than an ast optimizer.

Sorry for being skeptical, but I'm not excited about advertising this as a general optimization tool unless you can make it a lot safer.

-- 
--Guido van Rossum (python.org/~guido)
Projects using the same code base for Python 2 and Python 3 contain a lot of inefficient code. For example, using the six library, a simple Unicode literal string becomes a function call: u('unicode').
But are you able to do enough static analysis to feel comfortable that this is the right u() function? IIRC you said earlier that you're not even capable of recognizing "len = ord; print(len('a'))" -- if that is really true, I'm very worried about your optimizer's capacity for breaking code. I'm not talking about "behind-their-back" changes to __builtins__ or patching of the module globals. I'm talking about detecting straightforward definitions that override the identifiers you are replacing.
astoptimizer is still experimental, but I prefer to release early and release often, because I already get interesting feedback.

"from math import pow as len" is already supported in version 0.3, and "len=ord" was fixed in version 0.3.1. astoptimizer is supposed to handle any instruction that sets variables (if I forgot to handle a specific instruction, it's a bug). My initial goal was to optimize "x=1; return x", but then I realized that I must take care of all variables, because they may shadow builtin functions or constants.

The following AST statements create a "new" namespace (inheriting some properties from the parent):

- Module
- FunctionDef
- ClassDef

The following AST statements set variables or have an impact on the scope:

- Assign, AugAssign, Del
- Global, Nonlocal
- Import, ImportFrom
- With
- arguments (of FunctionDef)
- comprehension (of ListComp or GeneratorExp)

There is experimental support for assignments. If an unsupported assignment is found (ex: ((x, y), z) = x_y_z), all optimizations on names (ex: len("abc") or math.e) are disabled. For example, "from re import *" disables optimizations (but "from math import *" is supported).
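For illustration, the kind of analysis involved looks like this (a simplified sketch, not the actual astoptimizer code):

    import ast

    class ShadowDetector(ast.NodeVisitor):
        """Collect names bound in a module, e.g. shadowed builtins."""
        def __init__(self):
            self.bound_names = set()

        def visit_Assign(self, node):
            for target in node.targets:
                for child in ast.walk(target):
                    if isinstance(child, ast.Name):
                        self.bound_names.add(child.id)
            self.generic_visit(node)

        def visit_ImportFrom(self, node):
            for alias in node.names:
                self.bound_names.add(alias.asname or alias.name)
            self.generic_visit(node)

    tree = ast.parse("from math import pow as len\nprint(len('abc'))")
    detector = ShadowDetector()
    detector.visit(tree)
    print('len' in detector.bound_names)  # True: len("abc") must not be folded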
I expect that astoptimizer will be able to remove (or at least reduce!) the overhead of the six library and all checks on the Python version ("if PYTHON3: ... else: ...").
Hm. Wouldn't it be just as easy to run a source-to-source translator to remove six artefacts instead of an ast optimizer?
You mean something like 2to3? I understood that the six module is written for developers who prefer to use the same code base for Python 2 and Python 3. With Python 3.3, if astoptimizer hooks the compile() builtin, the optimization is enabled transparently when importing a module (during .py => .pyc compiling). There is no need to explicitly "compile" an application.
Surely some convention could be adopted that is easy to use, and the tool to do the translation could be a lot simpler than an ast optimizer.
I like AST because I don't need to write my own parser.
Sorry for being skeptical, but I'm not excited about advertising this as a general optimization tool unless you can make it a lot safer.
It gets safer with every release. I will use the version number 1.0 when the optimizer is completely safe :-) (So it's only 31% safe yet!)

Victor
On Tue, Sep 11, 2012 at 8:41 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
This is fine in an external project, but should never be added to the standard library. The barrier to semantic changes that break monkeypatching should be high.

Yes, this is frustrating, as it eliminates a great many interesting static optimisations that are *probably* OK. That's one of the reasons why PyPy uses tracing - it can perform these optimisations *and* still include the appropriate dynamic checks.

However, the double barrier of third party module + off by default is a suitable activation barrier for ensuring people know that what they're doing is producing bytecode that doesn't behave like standard Python any more (e.g. tests won't be able to shadow builtins or optimised module references). Optimisations that break the language semantics are heading towards the same territory as the byteplay and withhacks modules (albeit not as evil internally).
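For example, a mock-style test like this (illustrative) silently stops testing what it claims once the call is folded at compile time:

    calls = []

    def fake_len(obj):
        calls.append(obj)
        return 0

    len = fake_len        # shadow the builtin for the test
    result = len("abc")   # standard Python: 0, and calls == ["abc"];
                          # optimised bytecode: 3, and calls stays empty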
* Call methods of builtin types if the object and arguments are constants. Examples:
- u"h\\xe9ho".encode("utf-8") => b"h\\xc3\\xa9ho" - "python2.7".startswith("python") => True - (32).bit_length() => 6 - float.fromhex("0x1.8p+0") => 1.5
That last one isn't constant, it's a name lookup. Very cool optimisations for literals, though.
* Call functions of the math and string modules for functions without side effects. Examples:
- math.log(32) / math.log(2) => 5.0
- string.atoi("5") => 5
Same comment applies here as for the builtin optimisation: fine in an external project, not in the standard library (even if it's off by default - merely having it there is still an official endorsement of deliberately breaking the dynamic lookup semantics of our own language).
* Format strings for str%args and print(arg1, arg2, ...) if arguments are constants and the format string is valid. Examples:
- "x=%s" % 5 => "x=5" - print(1.5) => print("1.5")
The print example runs afoul of the general rule above: not in the standard library, because you're changing the values seen by a mocked version of print().
* Simplify expressions. Examples:
- not(x in y) => x not in y
This (and the "is") equivalent should be OK
- 4 and 5 and x and 6 => x and 6
So long as this is just constant folding, that should be fine, too.
* Loop: replace range() with xrange() on Python 2, and list with tuple. Examples:
- for x in range(n): ... => for x in xrange(n): ...
- for x in [1, 2, 3]: ... => for x in (1, 2, 3): ...
Name lookup optimisations again: not in the standard library.
* Evaluate unary and binary operators, subscripts and comparisons if all arguments are constants. Examples:
- 1 + 2 * 3 => 7
- not True => False
- "abc" * 3 => "abcabcabc"
- "abcdef"[:3] => "abc"
- (2, 7, 3)[1] => 7
- frozenset("ab") | frozenset("bc") => frozenset("abc")
- None is None => True
- "2" in "python2.7" => True
- "def f(): return 2 if 4 < 5 else 3" => "def f(): return 2"
Yep, literals are good.
* Remove dead code. Examples:
- def f(): return 1; return 2 => def f(): return 1
- if DEBUG: print("debug") => pass with DEBUG declared as False
- while 0: ... => pass
Dangerous:

    def f(): return 1; yield
    if DEBUG: yield
    while 0: yield
>>> def f():
...     if 0:
...         global x
...     return x
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in f
NameError: global name 'x' is not defined
Unsafe optimizations are disabled by default. Optimizations can be enabled using a Config class with "features" like "builtin_funcs" (builtin functions like len()) or "pythonbin" (the optimized code will be executed by the same Python binary executable).
astoptimizer.patch_compile() can be used to hook the optimizer into the compile() builtin function. On Python 3.3, it is enough to use the optimizer on imports (thanks to importlib). On older versions, the compileall module can be used to compile a whole project using the optimizer.
I didn't start to benchmark anything yet; I focused on fixing bugs (not generating invalid code). I will start benchmarks when the "variables" feature (ex: "x=1; print(x)" => "x=1; print(1)") works. There is experimental support for variables, but it is still too aggressive and generates invalid code in some cases (see the TODO file).
I plan to implement other optimizations like loop unrolling or converting a loop to a list comprehension; see the TODO file.
Don't hesitate to propose more optimizations if you have some ideas ;-)
Mainly just a request to be *very*, *very* clear that the unsafe optimisations will produce bytecode that *does not behave like Python* with respect to name lookup semantics, thus mock-based testing that relies on name shadowing will not work correctly, and neither will direct monkeypatching.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 11.09.2012 14:57:
On Tue, Sep 11, 2012 at 8:41 PM, Victor Stinner wrote:
* Loop: replace range() with xrange() on Python 2, and list with tuple. Examples:
- for x in range(n): ... => for x in xrange(n): ...
- for x in [1, 2, 3]: ... => for x in (1, 2, 3): ...
Name lookup optimisations again: not in the standard library.
I assume you meant the "range" part, not the second example (which will end up caching a constant tuple).
* Evaluate unary and binary operators, subscripts and comparisons if all arguments are constants. Examples:
- 1 + 2 * 3 => 7
- not True => False
- "abc" * 3 => "abcabcabc"
- "abcdef"[:3] => "abc"
- (2, 7, 3)[1] => 7
- frozenset("ab") | frozenset("bc") => frozenset("abc")
That's a name lookup, too.
- None is None => True
- "2" in "python2.7" => True
- "def f(): return 2 if 4 < 5 else 3" => "def f(): return 2"
Yep, literals are good.
Except that evaluating something like '"abc" * constant' can eat up all memory. Imagine this code:

    import sys

    KILL_MEMORY = sys.argv[1] == 'NEVER PASS THIS VALUE'

    def test():
        if KILL_MEMORY:
            return 'abc' * 10000000000000000000000000000
        else:
            return 'abc'

Stefan
- frozenset("ab") | frozenset("bc") => frozenset("abc")
That's a name lookup, too.
Yes, and it is only optimized if the "builtin_funcs" feature is enabled explicitly.
Except that evaluating something like '"abc" * constant' can eat up all memory. Imagine this code:
KILL_MEMORY = sys.argv[1] == 'NEVER PASS THIS VALUE'
def test():
    if KILL_MEMORY:
        return 'abc' * 10000000000000000000000000000
    else:
        return 'abc'
The optimizer always checks the result of an optimization against constraints. For example, an optimization is cancelled if the result is not immutable. Integers must be in the range [-2^128; 2^128-1] by default, strings are limited to 4096 bytes/characters, and tuples are limited to 20 items.

The compiler may waste some seconds evaluating an expression and then dropping the result, but I don't really care right now: I prefer to spend minutes compiling if the application is faster at runtime. astoptimizer tries to avoid expensive operations when it knows that the result will be too big. For your specific example, str * int is *not* evaluated if the result would be longer than the limit. Similar sanity checks may be added later for pow(int, int) and math.factorial(int), for example.
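Roughly (an illustration of the kind of check, not the actual code):

    MAX_STR_LEN = 4096

    def fold_str_mul(s, n):
        """Fold s * n only when the result is known to stay small."""
        if len(s) * n > MAX_STR_LEN:
            return None  # keep the original expression in the AST
        return s * n

    print(fold_str_mul("abc", 3))          # 'abcabcabc': folded
    print(fold_str_mul("abc", 10 ** 28))   # None: left for runtime

Victor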
2012/9/11 Nick Coghlan <ncoghlan@gmail.com>:
This is fine in an external project, but should never be added to the standard library. The barrier to semantic changes that break monkeypatching should be high.
The version 0.3 has a known bug: "len=chr; print(len('A'))" is optimized, whereas it should not be. It is now fixed in the new version 0.3.1.
* Call methods of builtin types if the object and arguments are constants. Examples:
- u"h\\xe9ho".encode("utf-8") => b"h\\xc3\\xa9ho" - "python2.7".startswith("python") => True - (32).bit_length() => 6 - float.fromhex("0x1.8p+0") => 1.5
That last one isn't constant, it's a name lookup.
Well, yes, but in CPython it is not possible to modify the float.fromhex attribute, nor unicode.encode.
- print(1.5) => print("1.5")
The print example runs afoul of the general rule above: not in the standard library, because you're changing the values seen by a mocked version of print()
Ah yes, you found a bug. I forgot to disable this optimization by default (when the builtin_funcs feature is disabled). Fixed in 0.3.1.
- not(x in y) => x not in y
This (and the "is") equivalent should be OK
Note: not(x is y) is also implemented.
* Loop: replace range() with xrange() on Python 2, and list with tuple. Examples:
- for x in range(n): ... => for x in xrange(n): ...
- for x in [1, 2, 3]: ... => for x in (1, 2, 3): ...
Name lookup optimisations again: not in the standard library.
Correct, same issue as print(): forgotten in version 0.3, and also fixed in 0.3.1.
def f(): return 1; yield
if DEBUG: yield
while 0: yield
I'm running the test suite of Python 2.7 and 3.3 with my optimizer. "if 0: yield" is a known gotcha (there is a test for it) and it is handled correctly by astoptimizer. By the way, "return 3" is not removed in a generator because it must raise a SyntaxError.
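For example (illustrative):

    def f():
        if 0:
            yield  # dead code, but its mere presence makes f a generator
        return

    print(f())  # a generator object, not None: "if 0: yield" must stay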
>>> def f():
...     if 0:
...         global x
...     return x
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in f
NameError: global name 'x' is not defined
Oh, nice catch. But it is possible to detect the usage of global and nonlocal, and not remove code if they are present. Fixed in 0.3.1.

Victor
On 11/09/2012 22:46, Victor Stinner wrote:
2012/9/11 Nick Coghlan <ncoghlan@gmail.com>:
This is fine in an external project, but should never be added to the standard library. The barrier to semantic changes that break monkeypatching should be high.
The version 0.3 has a known bug: "len=chr; print(len('A'))" is optimized, whereas it should not be. It is now fixed in the new version 0.3.1.
* Call methods of builtin types if the object and arguments are constants. Examples:
- u"h\\xe9ho".encode("utf-8") => b"h\\xc3\\xa9ho" - "python2.7".startswith("python") => True - (32).bit_length() => 6 - float.fromhex("0x1.8p+0") => 1.5
That last one isn't constant, it's a name lookup.
Well, yes, but in CPython it is not possible to modify the float.fromhex attribute, nor unicode.encode.
It's possible to shadow 'float':
float = "foo" float 'foo'
On Tue, Sep 11, 2012 at 2:57 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Tue, Sep 11, 2012 at 8:41 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
This is fine in an external project, but should never be added to the standard library. The barrier to semantic changes that break monkeypatching should be high.
Yes, this is frustrating as it eliminates a great many interesting static optimisations that are *probably* OK. That's one of the reasons why PyPy uses tracing - it can perform these optimisations *and* still include the appropriate dynamic checks.
However, the double barrier of third party module + off by default is a suitable activation barrier for ensuring people know that what they're doing is producing bytecode that doesn't behave like standard Python any more (e.g. tests won't be able to shadow builtins or optimised module references). Optimisations that break the language semantics are heading towards the same territory as the byteplay and withhacks modules (albeit not as evil internally).
The third (and most important) barrier is that constant folding len("abc") is essentially useless. You can do some optimizations that are sound, at probably a great deal of complexity, if you track all the places where constants were folded and change them if you shadow a builtin. This approach is done in PyPy, for example, in a more systematic way (by invalidating the assembler).

Anyway, since this is, in its current shape, clearly not designed to preserve the semantics of Python at all, is the discussion of this package on-topic for python-dev? More so than, say, discussing Cython or Numba or any other kind-of-python-but-not-quite project?

Cheers,
fijal
On 11.09.12 13:41, Victor Stinner wrote:
Here is some progress on my astoptimizer project. If you are interested in the optimizer, run it on your own project or help me to implement more optimizations.
http://pypi.python.org/pypi/astoptimizer https://bitbucket.org/haypo/astoptimizer
It's very interesting work.
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does this mean the transformation print("A") => None, with the output produced at compile time?
* Loop: replace range() with xrange() on Python 2,
range(0, 10**10, 10**9)
* Evaluate unary and binary operators, subscripts and comparisons if all arguments are constants. Examples:
- 1 + 2 * 3 => 7
Does this mean 1/0 raises an exception at compile time?
* Remove dead code. Examples:
- def f(): return 1; return 2 => def f(): return 1
- if DEBUG: print("debug") => pass with DEBUG declared as False
- while 0: ... => pass
As Nick said, it could change the semantics if there is a local variable assignment or a yield in the removed code.
Unsafe optimizations are disabled by default. Optimizations can be enabled using a Config class with "features" like "builtin_funcs" (builtin functions like len()) or "pythonbin" (the optimized code will be executed by the same Python binary executable).
It would be good if the optimizer checked whether the built-in names are overridden in the module or function, and disabled this dangerous optimization in such cases.
I plan to implement other optimizations like loop unrolling or converting a loop to a list comprehension; see the TODO file.
Don't hesitate to propose more optimizations if you have some ideas ;-)
set([1, 2, 3]) => {1, 2, 3}
set([x for ...]) => {x for ...}
dict([(k, v) for ...]) => {k: v for ...}
dict((k, v) for ...) => {k: v for ...}
''.join([s for ...]) => ''.join(s for ...)
a.extend([s for ...]) => a.extend(s for ...)
(f(x) for x in a) => map(f, a)
(x.y for x in a) => map(operator.attrgetter('y'), a)
(x[0] for x in a) => map(operator.itemgetter(0), a)
(2 * x for x in a) => map((2).__mul__, a)
(x in b for x in a) => map(b.__contains__, a)
map(lambda x: x.strip(), a) => (x.strip() for x in a)
x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}
x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}
a = []; for ...: a.append(x) => a = [x for ...]
a = set(); for ...: a.add(x) => a = {x for ...}
a = {}; for ...: a[k] = v => a = {k: v for ...}
for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
x = x + 1 => x += 1
x = x + ' ' => x += ' '
x = x + [y] => x.append(y)
x = x + [y, z] => x.extend([y, z])
'x=%s' % repr(x) => 'x=%a' % (x,)
'x=%s' % x + s => 'x=%s%s' % (x, s)
x = x + ', [%s]' % y => x = '%s, [%s]' % (x, y)
range(0, x) => range(x)
for i in range(len(a)): x = a[i] ... => for i, x in enumerate(a): ...
i = 0; for x in a: i += 1 ... => for i, x in enumerate(a, 1): ...
i = 1; for x in a: ... i += 1 => for i, x in enumerate(a, 1): ...
s = 0; for ...: if ...: s += 1 => s = sum(1 for ... if ...)
while True: s = f.readline(); if not s: break; ... => for s in f: ...
def f(x): ... len() ... => def f(x, __len = len): ... __len() ...

Not all such transformations are always safe (and I know code in the stdlib where they are not).
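One example: "x = x + [y]" rebinds x, while x.append(y) mutates it in place, and any alias observes the difference (illustration):

    x = [1, 2]
    alias = x
    x = x + [3]      # builds a new list; alias still sees [1, 2]
    print(alias)     # [1, 2]

    x = [1, 2]
    alias = x
    x.append(3)      # mutates in place; alias sees [1, 2, 3]
    print(alias)     # [1, 2, 3]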
Serhiy Storchaka, 11.09.2012 20:48:
set([1, 2, 3]) => {1, 2, 3}
set([x for ...]) => {x for ...}
dict([(k, v) for ...]) => {k: v for ...}
dict((k, v) for ...) => {k: v for ...}
''.join([s for ...]) => ''.join(s for ...)
a.extend([s for ...]) => a.extend(s for ...)
(f(x) for x in a) => map(f, a)
(x.y for x in a) => map(operator.attrgetter('y'), a)
(x[0] for x in a) => map(operator.itemgetter(0), a)
(2 * x for x in a) => map((2).__mul__, a)
(x in b for x in a) => map(b.__contains__, a)
map(lambda x: x.strip(), a) => (x.strip() for x in a)
x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}
x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}
a = []; for ...: a.append(x) => a = [x for ...]
a = set(); for ...: a.add(x) => a = {x for ...}
a = {}; for ...: a[k] = v => a = {k: v for ...}
for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
x = x + 1 => x += 1
x = x + ' ' => x += ' '
x = x + [y] => x.append(y)
x = x + [y, z] => x.extend([y, z])
'x=%s' % repr(x) => 'x=%a' % (x,)
'x=%s' % x + s => 'x=%s%s' % (x, s)
x = x + ', [%s]' % y => x = '%s, [%s]' % (x, y)
range(0, x) => range(x)
for i in range(len(a)): x = a[i] ... => for i, x in enumerate(a): ...
i = 0; for x in a: i += 1 ... => for i, x in enumerate(a, 1): ...
i = 1; for x in a: ... i += 1 => for i, x in enumerate(a, 1): ...
s = 0; for ...: if ...: s += 1 => s = sum(1 for ... if ...)
while True: s = f.readline(); if not s: break; ... => for s in f: ...
def f(x): ... len() ... => def f(x, __len = len): ... __len() ...
Not all such transformations are always safe (and I know code in the stdlib where they are not).
Actually, many of them are not, and some of them are even plain wrong or may at least turn out not to be optimisations.

Stefan
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3 - ord("A") => 65
Does this mean the transformation print("A") => None, with the output produced at compile time?
Only a subset of builtin functions are called at compile time, and not with any constant value. See astoptimizer/config_builtin_funcs.py. print() has other optimizations, but it is not called at compile time.
* Loop: replace range() with xrange() on Python 2,
range(0, 10**10, 10**9)
Oh. I didn't know this difference between range and xrange. I should add more checks on arguments.
* Evaluate unary and binary operators, subscripts and comparisons if all arguments are constants. Examples:
- 1 + 2 * 3 => 7
Does this mean 1/0 raises exception at compile time?
Such traps are handled carefully. There are many checks to decide if an operator can be optimized or not. Example: "\U0010ffff"[0] is not optimized by default.
* Remove dead code. Examples:
- def f(): return 1; return 2 => def f(): return 1
- if DEBUG: print("debug") => pass with DEBUG declared as False
- while 0: ... => pass
As Nick said, it could change the semantics if there is a local variable assignment or a yield in the removed code.
yield is handled correctly, but local variables are a new problem. I should find something to detect local variables. I already have code to handle variables, but it is not fully enabled because there are known bugs.
It would be good if the optimizer checked whether the built-in names are overridden in the module or function, and disabled this dangerous optimization in such cases.
Simple cases like "len=chr" or "from math import floor as int" are now handled correctly in the new version 0.3.1.
set([1, 2, 3]) => {1, 2, 3}
I don't see how to create a literal set or frozenset in the AST. set([...]) may be converted to set((...)) at least. It would be nice to have a new ast.Lit() in Python 3.4: the idea was already proposed (with a patch, 0_ast.patch) in issue #11549.
set([x for ...]) => {x for ...}
dict([(k, v) for ...]) => {k: v for ...}
dict((k, v) for ...) => {k: v for ...}
''.join([s for ...]) => ''.join(s for ...)
a.extend([s for ...]) => a.extend(s for ...)
These optimizations look correct.
(f(x) for x in a) => map(f, a)
(x.y for x in a) => map(operator.attrgetter('y'), a)
(x[0] for x in a) => map(operator.itemgetter(0), a)
(2 * x for x in a) => map((2).__mul__, a)
(x in b for x in a) => map(b.__contains__, a)
map(lambda x: x.strip(), a) => (x.strip() for x in a)
Is it faster? :-)
x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}
A list can contain non-hashable objects, whereas a set can not.
x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}
You need to know the type of x. Depending on the type, x.__eq__ and x.__contains__ may be completely different.
a = []; for ...: a.append(x) => a = [x for ...]
a = set(); for ...: a.add(x) => a = {x for ...}
a = {}; for ...: a[k] = v => a = {k: v for ...}
Yeah, the first example is already in the TODO list :-) But it's not trivial. I prefer to start with unrolling loops :-)
for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
f.write lookup cannot be optimized.
x = x + 1 => x += 1
x = x + ' ' => x += ' '
I don't know if these optimizations are safe.
x = x + [y] => x.append(y)
x = x + [y, z] => x.extend([y, z])
It behaves differently if x is not a list but, for example, a str.
'x=%s' % repr(x) => 'x=%a' % (x,)
I don't understand this one.
'x=%s' % x + s => 'x=%s%s' % (x, s)
x = x + ', [%s]' % y => x = '%s, [%s]' % (x, y)
Doesn't work if the type of s is not str.
range(0, x) => range(x)
Is it faster?
for i in range(len(a)): x = a[i] ... => for i, x in enumerate(a): ...
i = 0; for x in a: i += 1 ... => for i, x in enumerate(a, 1): ...
i = 1; for x in a: ... i += 1 => for i, x in enumerate(a, 1): ...
s = 0; for ...: if ...: s += 1 => s = sum(1 for ... if ...)
These optimizations look interesting.
while True: s = f.readline(); if not s: break; ... => for s in f: ...
Too many assumptions about the type of f.
def f(x): ... len() ... => def f(x, __len = len): ... __len() ...
Ah yes, I read somewhere that locals are faster than builtins. But this specific example is wrong: f(1, 2) must fail. It is possible to write something like:

    def create_f():
        _len = len  # capture the builtin in the enclosing scope
        def f(x):
            ... _len ...
        return f

    f = create_f(); del create_f

Victor
On 12.09.12 00:47, Victor Stinner wrote:
set([x for ...]) => {x for ...}
dict([(k, v) for ...]) => {k: v for ...}
dict((k, v) for ...) => {k: v for ...}
''.join([s for ...]) => ''.join(s for ...)
a.extend([s for ...]) => a.extend(s for ...)
These optimizations look correct.
Actually, a generator can be slower than a list comprehension, especially on Python 2. I think this is an opportunity to optimize the work with generators.
(f(x) for x in a) => map(f, a)
(x.y for x in a) => map(operator.attrgetter('y'), a)
(x[0] for x in a) => map(operator.itemgetter(0), a)
(2 * x for x in a) => map((2).__mul__, a)
(x in b for x in a) => map(b.__contains__, a)
map(lambda x: x.strip(), a) => (x.strip() for x in a)
Is it faster? :-)
Yes, significantly so for large sequences. But this transformation is not safe in the general case. For short sequences a regression is possible (the cost of the "map" name lookup and function call).
x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}
A list can contain non-hashable objects, whereas a set can not.
Agreed, it is applicable if x is provably a str. At least the list can be replaced by a tuple.
x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}
You need to know the type of x. Depending on the type, x.__eq__ and x.__contains__ may be completly different.
Then use => x in ('i', 'em', 'cite'), and go further only if x is obviously of the appropriate type.
for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
f.write lookup cannot be optimized.
Yes, it is a dangerous transformation, and it is difficult to prove its safety. But name lookup is one of the main brakes on Python's performance.
x = x + 1 => x += 1
x = x + ' ' => x += ' '
I don't know if these optimizations are safe.
It is safe if x is provably a number or string: if x is a local variable, initialized with a number/string and modified only with numbers/strings. Counters and string accumulators are commonly used.
'x=%s' % repr(x) => 'x=%a' % (x,)
I don't understand this one.
Sorry, it should be => 'x=%r' % (x,). And for more arguments: 'x[' + repr(k) + ']=' + repr(v) + ';' => 'x[%r]=%r;' % (k, v). Same for str and ascii. It is not safe (repr can be shadowed).
'x=%s' % x + s => 'x=%s%s' % (x, s)
x = x + ', [%s]' % y => x = '%s, [%s]' % (x, y)
Doesn't work if the type of s is not str.
Yes, this is partially applicable. In many cases, s is a literal or the newly formatted string.
range(0, x) => range(x)
Is it faster?
Slightly.
while True: s = f.readline(); if not s: break; ... => for s in f: ...
Too many assumptions about the type of f.
I personally would prefer a 2to3-like "modernizer" (as a separate utility and as plugins for IDEs) which would find such patterns and offer to replace them with a more modern, readable (and possibly more efficient) variant. The decision on the applicability of a transformation in a particular case remains with the human. Only simple transformations which would deteriorate readability, and optimizations which cannot be expressed in the source code, would remain for the automatic optimizer.
2012/9/12 Serhiy Storchaka <storchaka@gmail.com>:
set([x for ...]) => {x for ...}
dict([(k, v) for ...]) => {k: v for ...}
dict((k, v) for ...) => {k: v for ...}
''.join([s for ...]) => ''.join(s for ...)
a.extend([s for ...]) => a.extend(s for ...)
These optimizations look correct.
Actually, a generator can be slower than a list comprehension, especially on Python 2. I think this is an opportunity to optimize the work with generators.
I checked with timeit, and yes: generators are slower :-/ I will revert this "optimization".
(f(x) for x in a) => map(f, a)
(x.y for x in a) => map(operator.attrgetter('y'), a)
(x[0] for x in a) => map(operator.itemgetter(0), a)
(2 * x for x in a) => map((2).__mul__, a)
(x in b for x in a) => map(b.__contains__, a)
map(lambda x: x.strip(), a) => (x.strip() for x in a)
Is it faster? :-)
Benchmark using iterable=tuple(range(n)) and f=str (see the attached script).

Python version: 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3]
CPU model: Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
Platform: Linux-3.2.0-30-generic-pae-i686-with-Ubuntu-12.04-precise
Bits: int=32, long=32, long long=64, pointer=32

[ 3 items ]
679 ns: list comprehension
1.08 us (+59%): itertools.imap
1.42 us (+109%): generator

[ 10 items ]
1.6 us: itertools.imap
1.64 us: list comprehension
2.26 us (+41%): generator

[ 1000 items ]
112 us: itertools.imap
144 us (+29%): list comprehension
156 us (+40%): generator

[ 1000000 items ]
142 ms: itertools.imap
183 ms (+29%): generator
186 ms (+31%): list comprehension

---

Python version: 3.2.3 (default, May 3 2012, 15:54:42) [GCC 4.6.3]
CPU model: Intel(R) Core(TM) i5 CPU 661 @ 3.33GHz
Platform: Linux-3.2.0-30-generic-pae-i686-with-Ubuntu-12.04-precise
Bits: int=32, long=32, long long=64, pointer=32

[ 3 items ]
1.04 us: list comprehension
1.21 us (+17%): map
1.51 us (+45%): generator

[ 10 items ]
2.02 us: map
2.29 us (+13%): list comprehension
2.68 us (+33%): generator

[ 1000 items ]
132 us: map
166 us (+25%): list comprehension
183 us (+38%): generator

[ 1000000 items ]
182 ms: map
229 ms (+26%): generator
251 ms (+38%): list comprehension
Yes, significantly so for large sequences. But this transformation is not safe in the general case. For short sequences a regression is possible (the cost of the "map" name lookup and function call).
So except for very small datasets, map (itertools.imap) is always faster. A list comprehension cannot be replaced with map() because map() doesn't set the iteration variable to the last item of the iterable.
x in ['i', 'em', 'cite'] => x in {'i', 'em', 'cite'}

Agreed, it is applicable if x is provably a str. At least the list can be replaced by a tuple.

(...)

x == 'i' or x == 'em' or x == 'cite' => x in {'i', 'em', 'cite'}

(...)

x = x + 1 => x += 1
x = x + ' ' => x += ' '
Well, type inference would permit more optimizations. It is not implemented yet.
for ...: f.write(...) => __fwrite = f.write; for ...: __fwrite(...)
f.write lookup cannot be optimized.
Yes, it is a dangerous transformation, and it is difficult to prove its safety. But name lookup is one of the main brakes on Python's performance.
Oh sorry, I didn't read the example correctly. Yeah, such an optimization is common, and it would help to have an option to enable it. Using type inference, it may be possible to optimize some cases safely (ex: if you know that f is a file).
Sorry, it should be 'x=%s' % repr(x) => 'x=%r' % (x,)
Ah ok, why not.
range(0, x) => range(x)
Is it faster?
Slightly.
timeit gives me exactly the same timing.
I personally would prefer a 2to3-like "modernizer" (as a separate utility and as plugins for IDEs) which would find such patterns and offer to replace them with a more modern, readable (and possibly more efficient) variant. The decision on the applicability of a transformation in a particular case remains with the human. Only simple transformations which would deteriorate readability, and optimizations which cannot be expressed in the source code, would remain for the automatic optimizer.
If the optimizer sees an interesting optimization but cannot decide if one option is better than another, it may write out advice for the developer. The developer can review the advice and decide which option is best.

Some patterns are faster or slower depending on the Python version :-/ The optimizer may be able to decide which pattern is best for the running Python version.

Victor
On 9/12/2012 3:36 AM, Serhiy Storchaka wrote:
I personally would prefer a 2to3-like "modernizer" (as a separate utility and as plugins for IDEs) which would find such patterns and offer to replace them with a more modern, readable (and possibly more efficient) variant. The decision on the applicability of a transformation in a particular case remains with the human.
IDLE has a plug-in mechanism, though I am not familiar with it yet. It also has a built-in parser of some sort. It is used, for instance, to determine the function expression that precedes '(' in order to get the function object for a tool tip.
Only simple transformations which would deteriorate readability, and optimizations which cannot be expressed in the source code, would remain for the automatic optimizer.
I had just made the same observation: some of the proposed optimizations are really source transformations, and others are only AST or even lower-level changes.

We also need to differentiate changes which are pretty much guaranteed to be faster (at least with current releases) from those that might be faster only with a particular hardware, OS, and Python version.

-- 
Terry Jan Reedy
On 12.09.12 00:47, Victor Stinner wrote:
x = x + [y] => x.append(y)
x = x + [y, z] => x.extend([y, z])
It behaves differently if x is not a list but, for example, a str.
Actually it is even worse. The transformation is applicable only if x has no aliases. Pseudocode:

    if type(x) is list and refcount(x) == 1:
        list.append(x, y)
    else:
        x = x + [y]

This optimization can be done only in the interpreter; otherwise the overhead costs would be too high.
Hello,

I am a longtime reader of this list and this is the first time I dare to speak up. Apologies in advance for any noise, silly comments and not posting to python-ideas ;).

On 11.09.2012 12:41, Victor Stinner wrote:
I plan to implement other optimizations like unrolling loop or convert a loop to a list comprehension, see the TODO file.
Don't hesitate to propose more optimizations if you have some ideas ;-)
Well, how about implementing guards like in PyPy? They are not optimizations by themselves but should enable a whole lot of them. I am not good at explaining, so I've written this short example (Python 2.7.3). I hope it makes clear what I mean.

Thank you for reading,
larudwer

#
# Example of implementing a guard
# by freezing the global and local dictionaries.
# It is not an optimization by itself
# but should open the possibility for them.
#

class WorldChangedException(Exception):
    pass

class WorldShadowedException(Exception):
    pass

class frozendict(dict):
    def __setitem__(self, key, value):
        if key in globals():
            if self == locals():
                raise WorldShadowedException(key)
            else:
                raise WorldChangedException(key)

def put(string):
    print string

g = frozendict({'put': put, 'foo': 2})
l = frozendict({'bar': 'work fast'})

#
# This is just a poor example.
# astoptimizer should be able to generate
# much better code.
#
f = compile("""
put("trying fast path")
put(bar)
put = 27
""", "<astoptimizer>", "exec")

def test():
    global put  # workaround for UnboundLocalError
    put("trying slow path")
    put = 27

#
# Guard: since the world is frozen, nothing can change.
# If the fast path succeeds we take the result, else
# we use the slow path.
#
try:
    exec(f, g, l)
except WorldChangedException, e:
    print "WorldChangedException Caught", e
    test()
except WorldShadowedException, e:
    print "WorldShadowedException Caught", e
    test()
I am a longtime reader of this list and this is the first time I dare to speak up. Apologies in advance for any noise, silly comments and not posting to python-ideas ;).
Welcome!
Well, how about implementing guards like in PyPy?
Guards would make it possible to generate specialized functions without a JIT. Dummy example:

    def func(arg):
        arg *= 1
        arg += 0
        return arg * 16

Can be optimized to:

    def func(arg):
        if type(arg) is int:
            # fast path
            return arg << 4
        else:
            # slow path
            arg *= 1
            arg += 0
            return arg * 16

I prefer a check before executing the code, rather than catching an exception and re-executing the same code twice. Specializing all functions would waste a lot of memory, so it should only be applied in some very specific cases... It can also be slower if the guard check is slower than the function body! But I bet that guards would help to enable more aggressive optimizations, or at least make some optimizations safe.

Dave Malcolm wrote a patch modifying eval.c to support specialized functions. See http://bugs.python.org/issue10399

I don't know yet what the best approach for CPython is.

--

For the specific case of builtin functions and types, I made two changes in Python 3.3:

* the type of the builtins mapping can now be any mapping (Python <= 3.2 requires the dict type; dict subtypes are disallowed)
* the "new" types.MappingProxyType type, to create a read-only proxy for a mapping (see also the rejected PEP 416)

We may combine these two changes to use a read-only mapping for the builtins. It would at least help to ensure that an application does not monkeypatch builtin functions/types.

Victor
participants (10):
- Christian Heimes
- Guido van Rossum
- larudwer
- Maciej Fijalkowski
- MRAB
- Nick Coghlan
- Serhiy Storchaka
- Stefan Behnel
- Terry Reedy
- Victor Stinner