On Tue, Sep 11, 2012 at 8:41 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
* Call builtin functions if arguments are constants. Examples:
- len("abc") => 3
- ord("A") => 65
This is fine in an external project, but should never be added to the standard library. The barrier to semantic changes that break monkeypatching should be high. Yes, this is frustrating as it eliminates a great many interesting static optimisations that are *probably* OK. That's one of the reasons why PyPy uses tracing - it can perform these optimisations *and* still include the appropriate dynamic checks. However, the double barrier of third party module + off by default is a suitable activation barrier for ensuring people know that what they're doing is producing bytecode that doesn't behave like standard Python any more (e.g. tests won't be able to shadow builtins or optimised module references). Optimisations that break the language semantics are heading towards the same territory as the byteplay and withhacks modules (albeit not as evil internally).
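To make the monkeypatching concern concrete, here is a minimal sketch (not astoptimizer output; folded_ord() is a hand-written stand-in for what a folding optimiser would effectively compile). It assumes Python 3, where the builtins module is importable under that name.

```python
import builtins

def dynamic_ord():
    # Standard semantics: "ord" is looked up at call time
    # (module globals first, then builtins).
    return ord("A")

def folded_ord():
    # Hand-written equivalent of the code after folding ord("A") => 65.
    return 65

real_ord = builtins.ord
builtins.ord = lambda ch: -1          # stand-in for a test double
try:
    dynamic_result = dynamic_ord()    # sees the replacement
    folded_result = folded_ord()      # silently ignores it
finally:
    builtins.ord = real_ord           # always restore the real builtin
```

With the replacement installed, the unoptimised function returns the test double's value while the folded one keeps returning 65, which is exactly the behavioural difference a test relying on shadowing would trip over.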
* Call methods of builtin types if the object and arguments are constants. Examples:
- u"h\xe9ho".encode("utf-8") => b"h\xc3\xa9ho"
- "python2.7".startswith("python") => True
- (32).bit_length() => 6
- float.fromhex("0x1.8p+0") => 1.5
That last one isn't constant, it's a name lookup. Very cool optimisations for literals, though.
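A rough way to tell the two cases apart is to ask which names an expression has to resolve at run time. The helper below is a hypothetical illustration (free_names is not part of astoptimizer) and assumes Python 3.

```python
import ast

def free_names(source):
    """Return the names an expression must look up at run time."""
    tree = ast.parse(source, mode="eval")
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            names.add(node.id)
    return names

# A method call on a literal needs no name lookup at all...
literal_only = free_names('(32).bit_length()')
# ...whereas this one depends on whatever "float" is bound to.
needs_lookup = free_names('float.fromhex("0x1.8p+0")')
```

An expression with an empty result only involves literals and their methods; anything else depends on a binding that a module or test could legitimately replace.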
* Call functions of math and string modules for functions without side effects. Examples:
- math.log(32) / math.log(2) => 5.0
- string.atoi("5") => 5
Same comment applies here as for the builtin optimisation: fine in an external project, not in the standard library (even if it's off by default - merely having it there is still an official endorsement of deliberately breaking the dynamic lookup semantics of our own language).
* Format strings for str%args and print(arg1, arg2, ...) if arguments are constants and the format string is valid. Examples:
- "x=%s" % 5 => "x=5"
- print(1.5) => print("1.5")
The print example runs afoul of the general rule above: not in the standard library, because you're changing the values seen by a mocked version of print().
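A small sketch of what a mocked print() would observe, assuming Python 3 with unittest.mock available; report_optimised() is a hand-written stand-in for the rewritten code, not real optimiser output.

```python
from unittest import mock

def report():
    print(1.5)

def report_optimised():
    print("1.5")    # what the proposed rewrite would produce

with mock.patch("builtins.print") as fake_print:
    report()
    report_optimised()

# Each recorded call is an (args, kwargs) pair; take the first positional arg.
original_arg = fake_print.call_args_list[0][0][0]
rewritten_arg = fake_print.call_args_list[1][0][0]
```

The text written to stdout would be identical in both cases, but the mock receives a float in one and a str in the other, so an assertion such as fake_print.assert_called_with(1.5) would pass for the original and fail for the optimised version.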
* Simplify expressions. Examples:
- not(x in y) => x not in y
This (and the "is" equivalent) should be OK.
- 4 and 5 and x and 6 => x and 6
So long as this is just constant folding, that should be fine, too.
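As a sanity check on why these two rewrites look safe, the sketch below compares the original and simplified forms by hand. The class and function names are made up for illustration, and the sample values are not an exhaustive proof.

```python
class CountingContainer:
    """Container that counts how often the 'in' operator consults it."""
    def __init__(self, items):
        self.items = set(items)
        self.lookups = 0

    def __contains__(self, item):
        self.lookups += 1
        return item in self.items

def original_membership(x, y):
    return not (x in y)

def simplified_membership(x, y):
    return x not in y

def original_chain(x):
    return 4 and 5 and x and 6

def simplified_chain(x):
    # The leading truthy constants can go; x must stay because its
    # truthiness is unknown, and 6 must stay because it is the result.
    return x and 6

box = CountingContainer("ab")
membership_agrees = True
for ch in "abz":
    if original_membership(ch, box) != simplified_membership(ch, box):
        membership_agrees = False

chain_agrees = True
for sample in [0, 1, "", "text", None, [], [0]]:
    before, after = original_chain(sample), simplified_chain(sample)
    if before != after or type(before) is not type(after):
        chain_agrees = False
```

Both forms call __contains__ exactly once per test and evaluate x exactly once, so no name lookup or side effect is added or dropped.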
* Loop: replace range() with xrange() on Python 2, and list with tuple. Examples:
- for x in range(n): ... => for x in xrange(n): ...
- for x in [1, 2, 3]: ... => for x in (1, 2, 3): ...
Name lookup optimisations again: not in the standard library.
* Evaluate unary and binary operators, subscript and comparison if all arguments are constants. Examples:
- 1 + 2 * 3 => 7
- not True => False
- "abc" * 3 => "abcabcabc"
- "abcdef"[:3] => "abc"
- (2, 7, 3)[1] => 7
- frozenset("ab") | frozenset("bc") => frozenset("abc")
- None is None => True
- "2" in "python2.7" => True
- "def f(): return 2 if 4 < 5 else 3" => "def f(): return 2"
Yep, literals are good.
* Remove dead code. Examples:
- def f(): return 1; return 2 => def f(): return 1
- if DEBUG: print("debug") => pass (with DEBUG declared as False)
- while 0: ... => pass
Dangerous.

def f(): return 1; yield
if DEBUG: yield
while 0: yield

>>> def f():
...     if 0:
...         global x
...     return x
...
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in f
NameError: global name 'x' is not defined
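The reason these cases are dangerous is that yield and global affect how the whole function is compiled, even when the statement itself can never run. A short sketch, assuming CPython 3; the global example is a variation that adds an assignment so the difference shows up in the code object, and the function names are made up.

```python
import inspect

def with_dead_yield():
    return 1
    yield              # unreachable, but still makes this a generator function

def without_dead_yield():
    return 1           # what naive dead-code removal would leave behind

is_gen_before = inspect.isgeneratorfunction(with_dead_yield)
is_gen_after = inspect.isgeneratorfunction(without_dead_yield)

def with_dead_global():
    if 0:
        global marker  # never executed, yet it changes the scope of "marker"
    marker = "set by function"

def without_dead_global():
    marker = "set by function"   # now an ordinary local variable

# Global names are recorded in co_names, locals in co_varnames.
global_before = "marker" in with_dead_global.__code__.co_names
local_after = "marker" in without_dead_global.__code__.co_varnames
```

Calling with_dead_yield() returns a generator object rather than 1, so removing the unreachable yield changes the function's return type; removing the unreachable global silently turns a module-level assignment into a local one.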
Unsafe optimizations are disabled by default. Optimizations can be enabled using a Config class with "features" like "builtin_funcs" (builtin functions like len()) or "pythonbin" (optimized code will be executed by the same Python binary executable).
astoptimizer.patch_compile() can be used to hook the optimizer into the compile() builtin function. On Python 3.3, it is enough to use the optimizer on imports (thanks to importlib). On older versions, the compileall module can be used to compile a whole project using the optimizer.
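For readers unfamiliar with the general technique, the following is a minimal sketch of an AST-level constant folder placed in front of compile(). It is not astoptimizer's implementation or API: FoldNumbers and compile_optimised are hypothetical names, only +, - and * on int/float literals are handled, and it assumes Python 3.8+ (where literals parse to ast.Constant).

```python
import ast
import operator

# Hypothetical, deliberately tiny set of foldable operators.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def _is_number(node):
    # Exclude bool, str, etc.: only plain int and float literals are folded.
    return isinstance(node, ast.Constant) and type(node.value) in (int, float)

class FoldNumbers(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)              # fold the operands first
        op = _FOLDABLE.get(type(node.op))
        if op is not None and _is_number(node.left) and _is_number(node.right):
            value = op(node.left.value, node.right.value)
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def compile_optimised(source, filename="<string>", mode="exec"):
    """Parse, fold literal arithmetic, then hand the tree to compile()."""
    tree = ast.parse(source, filename, mode)
    tree = ast.fix_missing_locations(FoldNumbers().visit(tree))
    return compile(tree, filename, mode)

folded_tree = FoldNumbers().visit(ast.parse("1 + 2 * 3", mode="eval"))
result = eval(compile_optimised("1 + 2 * 3", mode="eval"))
```

Because this sketch only ever touches literals, it stays within the "literals are good" category; anything that resolves a name would need the opt-in treatment discussed above.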
I haven't started benchmarking anything yet; I have focused on fixing bugs (not generating invalid code). I will start benchmarks once the "variables" feature (e.g. "x=1; print(x)" => "x=1; print(1)") works. There is experimental support for variables, but it is still too aggressive and generates invalid code in some cases (see the TODO file).
I plan to implement other optimizations, such as loop unrolling or converting a loop to a list comprehension; see the TODO file.
Don't hesitate to propose more optimizations if you have some ideas ;-)
Mainly just a request to be *very*, *very* clear that the unsafe optimisations will produce bytecode that *does not behave like Python* with respect to name lookup semantics, thus mock-based testing that relies on name shadowing will not work correctly, and neither will direct monkeypatching.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia