Third milestone of FAT Python
Hi, I implemented 3 new optimizations in FAT Python: loop unrolling, constant folding and copy builtin functions to constants. In the previous thread, Terry Reedy asked me if the test suite is complete enough to ensure that FAT Python doesn't break the Python semantic. I can now say that the second milestone didn't optimize enough code to see bugs, new optimizations helped me to find a *lot* of bugs which are now all fixed. The full Python test suite pass with all optimizations enabled. Only two tests are skipped in FAT Python: test_dynamic and mock tests of test_unittest. test_dynamic checks that it's possible to replace builtin functions in a function and then use the replaced builtins from the same function. FAT Python currently doesn't support the specific case of this test. The mock tests of test_unittest does something similar, I'm more concerned by these failures. This email is an updated version on my previous blog article: https://haypo.github.io/fat-python-status-nov26-2015.html Since I wrote this blog article, I implemented the constant folding optimization and I fixed the two major bugs mentioned in the article (line number and exec(code, dict)). Documentation ============= I combined the documentation of (my) various optimizations projects into a single documentation: http://faster-cpython.readthedocs.org/ The FAT Python project has its own page: http://faster-cpython.readthedocs.org/fat_python.html Constant folding ================ This optimization propagates constant values of variables. Example: def func() x = 1 y = x return y Constant folding: def func() x = 1 y = 1 return 1 This optimization alone is not really exciting. It will more used later when the optimizer will implement peephole optimizations (ex: a+b) and remove dead code. For example, it will be possible to remove code specific to a platform (ex: 'if sys.platform.startswith("freebsd"): ...'). Later, removal of unused local variables will be implemented to simplify the code even more. The previous example will be simplified to: def func(): return 1 Loop unrolling optimization =========================== The optimization generates assignement statements (for the loop index variable) and duplicates the loop body to reduce the cost of loops. Example: def func(): for i in range(2): print(i) Loop unrolled: def func(): i = 0 print(i) i = 1 print(i) If the iterator uses the builtin range function, two guards are required on builtins and globals namespaces. The optimization also handles tuple iterator (ex: "for i in (1, 2, 3): ..."). No guard is needed in this case (the code is always optimized). Loop unrolling combines well with constant folding. The previous example is simplified to: def func(): i = 0 print(0) i = 1 print(1) Again, with a future removal of unused local variables optimization, the previous example will be simplified to: def func(): print(0) print(1) Copy builtins to constants optimization ======================================= This optimization is currently disabled by default. (Well, in practice, it's enabled by the site module to test it and detect code which doesn't work with it.) The LOAD_GLOBAL instruction is used to load a builtin function. The instruction requires two dictionary lookup: one in the globals namespace (which almost always fail) and then in the builtins namespace. It's rare to replace builtins, so the idea here is to replace the dynamic LOAD_GLOBAL instruction with a static LOAD_CONST instruction which loads the function from a C array, a fast O(1) access. It is not possible to inject a builtin function during the compilation. Python code objects are serialized by the marshal module which only support simple types like integers, strings and tuples, not functions. The trick is to modify the constants at runtime when the module is loaded. I added a new patch_constants() method to functions. Example: def log(message): print(message) This function is specialized to: def log(message): 'LOAD_GLOBAL print'(message) log.patch_constants({'LOAD_GLOBAL print': print}) The specialized bytecode uses two guards on builtins and globals namespaces to disable the optimization if the builtin function is replaced. This optimization doesn't support the case when builtins are modified while the function is executed. I think that it will be safer to disable the optimization by default. Later, we can enhance the optimization to enable it if the function cannot modify builtins and if it only calls funcions which cannot modify builtins. I bet that the optimization will be safe with these additional checks. Changes on builtin guards ========================= When a guard is used on a builtin function, the specialization of the function is now ignored if the builtin was replaced or if a function with the name already exists in the globals namespace. At the end of the Python finalization (after the site module is imported), the fat module keeps a private copy of builtins. When a builtin guard is used, the current builtin function is simply compared to the old copy old builtins. The assumption here is that builtin functions are not replaced during Python initialization. By the way, I started to document FAT PYthon limitations and effects on the Python semantic: http://faster-cpython.readthedocs.org/fat_python.html#limitations-and-python... Lot of enhancements of the AST optimizer ======================================== New optimizations helped to find bugs in the AST optimizer. Many fixes and various enhancements were done in the AST optimizer. The optimizer was optimized, copy.deepcopy() is no more used to duplicate a full tree. The new NodeTransformer class only duplicates modified nodes. The optimizer understands now much better Python namespaces (globals, locals, non locals, etc.). It is now able to optimize a function without guards: it's used to unroll a loop using a tuple as iterator. Versionned dictionary ===================== In the previous milestone of FAT Python, the versionned dictionary was a new type inherited from the builtin dict type which added a __version__ read-only (global "version" of dictionary, incremented at each change), a getversion(key) method (version of a one key) and it added support for weak references. I done my best to make FAT Python changes optional to leave CPython completly unchanged to not hit performances when the FAT mode is not used. But I had two major technical issues. The first one is that using a different structure for dictionary entries would make the dict code more complex and maybe even slower (which is not acceptable). The second one is that it was no more possible to call exec(code, globals, locals) in FAT mode where globals or locals were a dict. The code needed to be modified using something like: globals = fat.verdict() if __fat__ or {} It required to import the fat module and modify all code calling exec(). I removed the fat.verdict type and added the __version__ property to the builtin dict type. It's incremented at each change. The getversion() method and the support for weak reference is removed. Python has already special code to handle reference cycles of dictionaries, there is no need to support weak references. Guards now use strong references to namespaces. Victor
On 2015-12-04 12:49, Victor Stinner wrote: [snip]
Constant folding ================
This optimization propagates constant values of variables. Example:
def func() x = 1 y = x return y
Constant folding:
def func() x = 1 y = 1 return 1
[snip] I don't think that's constant folding, but constant _propagation_. Constant folding is when, say, "1 + 2" replaced by "2".
On 12/4/2015 11:39 AM, MRAB wrote:
On 2015-12-04 19:22, Isaac Morland wrote:
On Fri, 4 Dec 2015, MRAB wrote:
Constant folding is when, say, "1 + 2" replaced by "2".
Isn't that called backspacing? ;-)
Oops! I meant "1 + 1", of course. Or "3". Either would work. :-)
Oh, you must surely have meant '1 and 2' Looking-for-truth-in-all-the-wrong-places-ly y'rs, Emile
2015-12-04 20:16 GMT+01:00 MRAB <python@mrabarnett.plus.com>:
On 2015-12-04 12:49, Victor Stinner wrote: [snip]
I don't think that's constant folding, but constant _propagation_.
Constant folding is when, say, "1 + 2" replaced by "2".
Oh, you're right. I update the documentation. To avoid confusion, I just implemented constant folding as well :-D https://faster-cpython.readthedocs.org/fat_python.html#constant-folding Victor
On Dec 04 2015, Victor Stinner <victor.stinner@gmail.com> wrote:
Hi,
I implemented 3 new optimizations in FAT Python: loop unrolling, constant folding and copy builtin functions to constants. In the previous thread, Terry Reedy asked me if the test suite is complete enough to ensure that FAT Python doesn't break the Python semantic. [...]
I just wanted to say that I think this is truly great! Thanks working on this! Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
participants (5)
-
Emile van Sebille
-
Isaac Morland
-
MRAB
-
Nikolaus Rath
-
Victor Stinner