Hi,

Here is the second PEP, part of a series of 3 PEPs to add an API to
implement a static Python optimizer specializing functions with guards.

HTML version:
https://faster-cpython.readthedocs.org/pep_specialize.html

PEP: xxx
Title: Specialized functions with guards
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add an API to add specialized functions with guards to functions, to
support static optimizers respecting the Python semantics.


Rationale
=========

Python is hard to optimize because almost everything is mutable: builtin
functions, function code, global variables, local variables, ... can be
modified at runtime. Implementing optimizations that respect the Python
semantics requires detecting when "something changes"; we will call
these checks "guards".

This PEP proposes to add a ``specialize()`` method to functions to add
specialized functions with guards. When the function is called, the
specialized function is used if nothing changed; otherwise the original
bytecode is used.

Writing an optimizer is out of the scope of this PEP.


Example
=======

Using bytecode
--------------

Replace ``chr(65)`` with ``"A"``::

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    func.specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
    del fast_func

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0

The hypothetical ``myoptimizer.GuardBuiltins("chr")`` is a guard on the
builtin ``chr()`` function and the ``chr`` name in the global namespace.
The guard fails if the builtin function is replaced or if a ``chr`` name
is defined in the global namespace.

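To make the guard described above concrete, here is a minimal pure-Python sketch of what such a guard could check. It is only an illustration: the PEP leaves ``myoptimizer`` hypothetical, and the class layout, the ``global_ns`` parameter and the argument-less ``check()`` method below are assumptions, not the proposed C API. Returning -1 on any change mirrors the example output, where the specialization is removed once ``chr`` is replaced.

```python
import builtins


class GuardBuiltins:
    """Hypothetical pure-Python guard watching a single builtin name."""

    def __init__(self, name, global_ns):
        self.name = name
        # Assumption: the guard is handed the globals dict of the function.
        self.global_ns = global_ns
        # Remember the builtin object seen when the guard was created.
        self.expected = getattr(builtins, name)

    def check(self):
        # 1: the guard holds, the specialized code may be used.
        # -1: the guard will always fail, the specialization must be dropped.
        if self.name in self.global_ns:
            return -1  # the name is now shadowed by a global variable
        if getattr(builtins, self.name, None) is not self.expected:
            return -1  # the builtin was replaced or deleted
        return 1


ns = {}  # stands in for the function's global namespace
guard = GuardBuiltins("chr", ns)
first = guard.check()

original_chr = builtins.chr
builtins.chr = lambda obj: "mock"
try:
    second = guard.check()
finally:
    builtins.chr = original_chr  # restore the real builtin

ns["chr"] = lambda obj: "shadow"
third = guard.check()

print(first, second, third)  # prints: 1 -1 -1
```

A real guard would presumably avoid the dictionary lookups on every call; this sketch only shows the two conditions named in the text.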
The first call directly returns the string ``"A"``. The second call
removes the specialized function because the builtin ``chr()`` function
was replaced, and executes the original bytecode.

On a microbenchmark, calling the specialized function takes 88 ns,
whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.


Using builtin function
----------------------

Replace a slow Python function calling ``chr(obj)`` with a direct call
to the builtin ``chr()`` function::

    import myoptimizer

    def func(arg):
        return chr(arg)

    func.specialize(chr, [myoptimizer.GuardBuiltins("chr")])

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0

The first call calls the builtin ``chr()`` function directly (without
creating a Python frame). The second call removes the specialized
function because the builtin ``chr()`` function was replaced, and
executes the original bytecode.

On a microbenchmark, calling the specialized function takes 95 ns,
whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast.
Calling ``chr(65)`` directly takes 76 ns.

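The gap between the Python wrapper and the direct builtin call can be observed on an unpatched interpreter with the standard ``timeit`` module. The sketch below is not the benchmark used for the numbers above; the ``best_ns()`` helper and its loop counts are assumptions chosen for illustration, and the absolute timings depend on the machine and the Python version.

```python
import timeit


def func(arg):
    # Plain Python wrapper: every call creates a Python frame.
    return chr(arg)


def best_ns(stmt, number=200_000, repeat=3):
    # Hypothetical helper: best-of-`repeat` time per call, in nanoseconds.
    timings = timeit.repeat(stmt, globals={"func": func},
                            number=number, repeat=repeat)
    return min(timings) / number * 1e9


wrapper_ns = best_ns("func(65)")
direct_ns = best_ns("chr(65)")
print("func(65): %.0f ns, chr(65): %.0f ns" % (wrapper_ns, direct_ns))
```

The difference between the two figures approximates the cost of the extra Python frame, which is the overhead a specialization to the builtin would remove.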

Python Function Call
====================

Pseudo-code to call a Python function having specialized functions with
guards::

    def call_func(func, *args, **kwargs):
        # by default, call the regular bytecode
        code = func.__code__.co_code
        specialized = func.get_specialized()
        nspecialized = len(specialized)

        index = 0
        while index < nspecialized:
            guard = specialized[index].guard
            # pass arguments, some guards need them
            check = guard(args, kwargs)
            if check == 1:
                # guard succeeded: we can use the specialized function
                code = specialized[index].code
                break
            elif check == -1:
                # guard will always fail: remove the specialized function
                del specialized[index]
                # the list is now one item shorter
                nspecialized -= 1
            elif check == 0:
                # guard failed temporarily
                index += 1

        # code can be a code object or any callable object
        execute_code(code, args, kwargs)


Changes
=======

* Add two new methods to functions:

  - ``specialize(code, guards: list)``: add a specialized
    function with guards. ``code`` is a code object (ex:
    ``func2.__code__``) or any callable object (ex: ``len``).
    The specialization can be ignored if a guard already fails.
  - ``get_specialized()``: get the list of specialized functions with
    guards

* Base ``Guard`` type which can be used as parent type to implement
  guards. It requires implementing a ``check()`` function, with an
  optional ``first_check()`` function. API:

  * ``int check(PyObject *guard, PyObject **stack)``: return 1 on
    success, 0 if the guard failed temporarily, -1 if the guard will
    always fail
  * ``int first_check(PyObject *guard, PyObject *func)``: return 0 on
    success, -1 if the guard will always fail

Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best
of 3 runs):

* Original Python: 79 ns
* Patched Python: 79 ns

According to this microbenchmark, the changes have no overhead on calling
a Python function without specialization.


Behaviour
=========

When a function code is replaced (``func.__code__ = new_code``), all
specialized functions are removed.

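The call protocol and the code-replacement rule can be emulated in pure Python to experiment with them on an unpatched interpreter. The sketch below is an assumption-laden illustration, not the proposed implementation: the ``Specializable`` wrapper class, its ``replace_code()`` method and the two sample guards are hypothetical, specialized code must be a callable (code objects are not handled), and a specialization is used only if all of its guards return 1.

```python
class Specializable:
    """Hypothetical wrapper emulating the proposed call protocol."""

    def __init__(self, func):
        self._func = func
        self._specialized = []  # list of (callable, guards) pairs

    def specialize(self, code, guards):
        # Sketch only: `code` must be a callable, not a code object.
        self._specialized.append((code, list(guards)))

    def get_specialized(self):
        return self._specialized

    def replace_code(self, new_func):
        # Stands in for "func.__code__ = new_code": drop all specializations.
        self._func = new_func
        self._specialized.clear()

    def __call__(self, *args, **kwargs):
        target = self._func  # by default, run the original function
        index = 0
        while index < len(self._specialized):
            code, guards = self._specialized[index]
            results = [guard(args, kwargs) for guard in guards]
            if -1 in results:
                # a guard will always fail: remove the specialization
                del self._specialized[index]
            elif 0 in results:
                # a guard failed temporarily: try the next specialization
                index += 1
            else:
                # all guards succeeded: use the specialized callable
                target = code
                break
        return target(*args, **kwargs)


def int_arg_guard(args, kwargs):
    # Temporary failure (0) when the single argument is not an int.
    ok = len(args) == 1 and not kwargs and isinstance(args[0], int)
    return 1 if ok else 0


state = {"valid": True}


def state_guard(args, kwargs):
    # Permanent failure (-1) once the watched state is invalidated.
    return 1 if state["valid"] else -1


def slow(x):
    return "slow:%s" % x


def fast(x):
    return "fast:%s" % x


func = Specializable(slow)
func.specialize(fast, [int_arg_guard, state_guard])

r1 = func(1)                      # all guards pass
r2 = func("a")                    # temporary failure, specialization kept
n2 = len(func.get_specialized())
state["valid"] = False
r3 = func(1)                      # permanent failure, specialization removed
n3 = len(func.get_specialized())
print(r1, r2, n2, r3, n3)  # prints: fast:1 slow:a 1 slow:1 0
```

Using ``len(self._specialized)`` in the loop condition keeps the bound correct after a removal, at the cost of recomputing it on each iteration.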
When a function is serialized (by ``marshal`` or ``pickle`` for
example), specialized functions and guards are ignored (not serialized).


Copyright
=========

This document has been placed in the public domain.

-- 
Victor