RFC: PEP: Specialized functions with guards

Hi,

Here is the second PEP, part of a series of 3 PEPs to add an API to implement a static Python optimizer specializing functions with guards.

HTML version:
https://faster-cpython.readthedocs.org/pep_specialize.html

PEP: xxx
Title: Specialized functions with guards
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner@gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 4-January-2016
Python-Version: 3.6


Abstract
========

Add an API to add specialized functions with guards to functions, to support static optimizers respecting the Python semantics.


Rationale
=========

Python is hard to optimize because almost everything is mutable: builtin functions, function code, global variables, local variables, ... can be modified at runtime. Implementing optimizations that respect the Python semantics requires detecting when "something changes"; we will call these checks "guards".

This PEP proposes to add a ``specialize()`` method to functions to add a specialized function with guards. When the function is called, the specialized function is used if nothing changed; otherwise the original bytecode is used.

Writing an optimizer is out of the scope of this PEP.


Example
=======

Using bytecode
--------------

Replace ``chr(65)`` with ``"A"``::

    import myoptimizer

    def func():
        return chr(65)

    def fast_func():
        return "A"

    func.specialize(fast_func.__code__, [myoptimizer.GuardBuiltins("chr")])
    del fast_func

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(): %s" % func())
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(): A
    #specialized: 1

    func(): mock
    #specialized: 0

The hypothetical ``myoptimizer.GuardBuiltins("chr")`` is a guard on the builtin ``chr()`` function and the ``chr`` name in the global namespace. The guard fails if the builtin function is replaced or if a ``chr`` name is defined in the global namespace.

The first call directly returns the string ``"A"``. The second call removes the specialized function because the builtin ``chr()`` function was replaced, and executes the original bytecode.

On a microbenchmark, calling the specialized function takes 88 ns, whereas the original bytecode takes 145 ns (+57 ns): 1.6 times as fast.

Using builtin function
----------------------

Replace a slow Python function calling ``chr(obj)`` with a direct call to the builtin ``chr()`` function::

    import myoptimizer

    def func(arg):
        return chr(arg)

    func.specialize(chr, [myoptimizer.GuardBuiltins("chr")])

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))
    print()

    import builtins
    builtins.chr = lambda obj: "mock"

    print("func(65): %s" % func(65))
    print("#specialized: %s" % len(func.get_specialized()))

Output::

    func(65): A
    #specialized: 1

    func(65): mock
    #specialized: 0

The first call directly calls the builtin ``chr()`` function (without creating a Python frame). The second call removes the specialized function because the builtin ``chr()`` function was replaced, and executes the original bytecode.

On a microbenchmark, calling the specialized function takes 95 ns, whereas the original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling ``chr(65)`` directly takes 76 ns.

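The guard used in the two examples above is left hypothetical by the PEP. As a rough illustration only, here is a minimal pure-Python sketch of what such a guard could check. The class name, its constructor arguments and the ``module_globals`` dict are assumptions made for this sketch; the real guards are meant to be implemented in C.

```python
import builtins


class GuardBuiltins:
    """Hypothetical guard on one builtin name (illustration only)."""

    def __init__(self, name, func_globals):
        self.name = name
        self.func_globals = func_globals
        # Remember the builtin object seen when the guard was created.
        self.expected = getattr(builtins, name)

    def check(self):
        # The name is shadowed in the function's global namespace.
        if self.name in self.func_globals:
            return -1
        # The builtin itself was replaced.
        if getattr(builtins, self.name, None) is not self.expected:
            return -1
        # Nothing changed: the specialized code may be used.
        return 1


# Stand-in for a function's global namespace (an assumption of this sketch).
module_globals = {}
guard = GuardBuiltins("chr", module_globals)
before = guard.check()

original_chr = builtins.chr
builtins.chr = lambda obj: "mock"
try:
    after_replace = guard.check()
finally:
    builtins.chr = original_chr  # always restore the real builtin

module_globals["chr"] = lambda obj: "shadow"
after_shadow = guard.check()

print(before, after_replace, after_shadow)
```

In the PEP a result of -1 means the specialization is dropped for good, so a real guard would not be consulted again after such a failure.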
Python Function Call
====================

Pseudo-code to call a Python function having specialized functions with guards::

    def call_func(func, *args, **kwargs):
        # by default, call the regular bytecode
        code = func.__code__.co_code
        specialized = func.get_specialized()
        nspecialized = len(specialized)

        index = 0
        while index < nspecialized:
            guard = specialized[index].guard
            # pass arguments, some guards need them
            check = guard(args, kwargs)
            if check == 1:
                # guard succeeded: we can use the specialized function
                code = specialized[index].code
                break
            elif check == -1:
                # guard will always fail: remove the specialized function
                del specialized[index]
                # the list is now one item shorter
                nspecialized -= 1
            elif check == 0:
                # guard failed temporarily
                index += 1

        # code can be a code object or any callable object
        execute_code(code, args, kwargs)


Changes
=======

* Add two new methods to functions:

  - ``specialize(code, guards: list)``: add a specialized function with guards. ``code`` is a code object (ex: ``func2.__code__``) or any callable object (ex: ``len``). The specialization can be ignored if a guard already fails.
  - ``get_specialized()``: get the list of specialized functions with guards

* Base ``Guard`` type which can be used as parent type to implement guards. It requires implementing a ``check()`` function, with an optional ``first_check()`` function. API:

  * ``int check(PyObject *guard, PyObject **stack)``: return 1 on success, 0 if the guard failed temporarily, -1 if the guard will always fail
  * ``int first_check(PyObject *guard, PyObject *func)``: return 0 on success, -1 if the guard will always fail

Microbenchmark on ``python3.6 -m timeit -s 'def f(): pass' 'f()'`` (best of 3 runs):

* Original Python: 79 ns
* Patched Python: 79 ns

According to this microbenchmark, the changes have no overhead on calling a Python function without specialization.


Behaviour
=========

When a function code is replaced (``func.__code__ = new_code``), all specialized functions are removed.

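The dispatch pseudo-code and the three guard results (1, 0, -1) described above can be modelled in plain Python. This is only a sketch under stated assumptions: the ``Specializable`` wrapper class and the guard callables taking ``(args, kwargs)`` are hypothetical stand-ins, since the PEP attaches this state to real function objects and implements guards in C.

```python
class Specializable:
    """Hypothetical pure-Python model of a function with specializations."""

    def __init__(self, original):
        self.original = original
        self._specialized = []  # list of (callable, guards) pairs

    def specialize(self, fast, guards):
        self._specialized.append((fast, list(guards)))

    def get_specialized(self):
        return list(self._specialized)

    def __call__(self, *args, **kwargs):
        index = 0
        while index < len(self._specialized):
            fast, guards = self._specialized[index]
            results = [guard(args, kwargs) for guard in guards]
            if -1 in results:
                # A guard will always fail: drop this specialization.
                del self._specialized[index]
            elif all(result == 1 for result in results):
                # Every guard succeeded: use the specialized callable.
                return fast(*args, **kwargs)
            else:
                # Temporary failure: keep the entry, try the next one.
                index += 1
        # No usable specialization: run the original function.
        return self.original(*args, **kwargs)


state = {"valid": True}
calls = []  # records which arguments reached the specialized callable


def guard_state(args, kwargs):
    return 1 if state["valid"] else -1


def guard_int(args, kwargs):
    return 1 if args and isinstance(args[0], int) else 0


def double(x):
    return x + x


def fast_double(x):
    calls.append(x)
    return x << 1


func = Specializable(double)
func.specialize(fast_double, [guard_state, guard_int])

r1 = func(21)      # guards succeed: fast_double is used
r2 = func("ab")    # guard_int fails temporarily: double is used
kept = len(func.get_specialized())
state["valid"] = False
r3 = func(21)      # guard_state fails for good: the entry is removed
left = len(func.get_specialized())
print(r1, r2, kept, r3, left)
```

The temporary failure keeps the specialization for later calls, while the permanent failure removes it, which is the distinction the 0 and -1 return values encode.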
When a function is serialized (by ``marshal`` or ``pickle`` for example), specialized functions and guards are ignored (not serialized).


Copyright
=========

This document has been placed in the public domain.

--
Victor

On Sat, Jan 9, 2016 at 8:31 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
When a function is serialized (by ``marshal`` or ``pickle`` for example), specialized functions and guards are ignored (not serialized).
Does this mean that any code imported from a .pyc file cannot take advantage of these kinds of optimizations? ChrisA

2016-01-09 2:42 GMT+01:00 Chris Angelico <rosuav@gmail.com>:
Ah yes, this sentence is confusing. It should not mention marshal; that's wrong. A .pyc file doesn't contain functions... It only contains code objects. Functions are only created at runtime. Specialized functions are also added at runtime.

Victor
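The distinction between code objects and functions can be checked with the standard library alone. The short sketch below shows that ``marshal`` accepts a code object but rejects a function object, and that a function is rebuilt at runtime from the unmarshalled code; the variable names are only illustrative.

```python
import marshal
import types


def func():
    return chr(65)


# A code object can be serialized, which is what a .pyc file stores.
data = marshal.dumps(func.__code__)
code = marshal.loads(data)

# A function object is not a marshallable type.
try:
    marshal.dumps(func)
    function_marshalled = True
except ValueError:
    function_marshalled = False

# The function object is created at runtime from the code object.
rebuilt = types.FunctionType(code, globals())
result = rebuilt()
print(function_marshalled, result)
```

Any specialization would have to be attached to ``rebuilt`` after this point, since nothing about it exists in the serialized data.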

PEP: xxx Title: Specialized functions with guards
FYI I published the PEP at python.org and it got the number 510: "PEP 0510 -- Specialized functions with guards" https://www.python.org/dev/peps/pep-0510/ Victor

I discussed this PEP on the #pypy IRC channel. I will try to summarize the discussion with comments on the PEP directly. 2016-01-08 22:31 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
Add an API to add specialized functions with guards to functions, to support static optimizers respecting the Python semantic.
"respecting the Python semantics" is not 100% exact. In fact, my FAT Python makes subtle changes to the "Python semantics". For example, loop unrolling can completely remove the call to the range() function. If a debugger steps through the code instruction by instruction, the output is different on an unrolled loop, since the range() call was removed and the loop body is copied. I should maybe elaborate on this point in the rationale and explain that a compromise must be found between the funny "in Python, everything is mutable" and performance. But remember that the whole thing (FAT Python, specialization, etc.) is developed outside CPython and is fully optional.
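To make the loop unrolling remark concrete, here is a small hand-written comparison. The ``unrolled()`` function is only an assumption about what an optimizer might emit, not actual FAT Python output; the sketch shows that the result is the same while the ``range`` lookup disappears from the code object.

```python
def loop():
    result = []
    for i in range(3):
        result.append(i)
    return result


def unrolled():
    # Hand-written stand-in for an unrolled loop: the body is copied
    # once per iteration and range() is never called.
    result = []
    i = 0
    result.append(i)
    i = 1
    result.append(i)
    i = 2
    result.append(i)
    return result


same_result = loop() == unrolled()
uses_range = (
    "range" in loop.__code__.co_names,
    "range" in unrolled.__code__.co_names,
)
print(same_result, uses_range)
```

A tool that observes the executed instructions, such as a debugger or a tracer, would therefore see different steps for the two versions even though callers get the same list.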
This method doesn't make sense at all in PyPy. The method is specific to CPython since it relies on guards which have a pure C API (see below). The PEP must be more explicit about that. IMHO it's perfectly fine for PyPy to make this method a no-op (the method does exactly nothing). That's already the case if a guard "always" fails in first_check().
- ``get_specialized()``: get the list of specialized functions with guards
Again, it doesn't make sense for PyPy. Since this method is only used for unit tests, it can be converted to a function and put somewhere else, maybe in the _testcapi module. It's not a good idea to rely on this method in an application, it's really an implementation detail.
* Base ``Guard`` type
In fact, exposing the type at the C level is enough. There is no need to expose it at the Python level, since the type has neither methods nor data, and it's not possible to use it in Python. We might expose it in a different module, again, maybe in _testcapi for unit tests.
I forgot the "int na" and "int nk" parameters to support keyword arguments. Note to self: I should add support for raising an exception.
* ``int first_check(PyObject *guard, PyObject *func)``: return 0 on success, -1 if the guard will always fail
Note to self: I should rename the method to "init()" and support raising an exception.
Moreover, the PEP must be clear about func.__code__ content: func.specialize() must *not* modify func.__code__. It should be a complete black box.

Victor

On 12 January 2016 at 08:44, Victor Stinner <victor.stinner@gmail.com> wrote:
Perhaps the specialisation call should also move to being a pure C API, only exposed through _testcapi for testing purposes? That would move both this and the dict versioning PEP into the same territory as the dynamic memory allocator PEP: low level C plumbing that enables interesting CPython specific extensions (like tracemalloc, in the dynamic allocator case) without committing other implementations to emulating features that aren't useful to them in any way. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi, 2016-01-12 1:47 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:
I really like your idea :-) It solves many issues and technically it's trivial to only add a C API and then expose it somewhere else at the Python level (for example in my "fat" module, or as you said in _testcapi for testing purposes).

Instead of adding func.specialize() and func.get_specialized() at the Python level, we can add *public* functions to the Python C API (excluded from the stable ABI):

/* Add a specialized function with guards. Result:
 * - return 1 on success
 * - return 0 if the specialization has been ignored
 * - raise an exception and return -1 on error */
PyAPI_FUNC(int) PyFunction_Specialize(PyObject *func, PyObject *func2, PyObject *guards);

/* Get the list of specialized functions as a list of
 * (func, guards) where func is a callable or code object and guards
 * is a list of PyFuncGuard (or subtypes) objects.
 * Raise an exception and return NULL on error. */
PyAPI_FUNC(PyObject*) PyFunction_GetSpecialized(PyObject *func);

/* Get the specialized function of a function. stack is an array of PyObject*
 * objects: indexed arguments followed by (key, value) objects of keyword
 * arguments. na is the number of indexed arguments, nk is the number of
 * keyword arguments. stack contains na + nk * 2 objects.
 *
 * Return a callable or a code object on success.
 * Raise an exception and return NULL on error. */
PyAPI_FUNC(PyObject*) PyFunction_GetSpecializedFunc(PyObject *func, PyObject **stack, int na, int nk);

Again, other Python implementations which don't want to implement function specializations can implement these functions as no-ops (it's fine with the API):

* PyFunction_Specialize() just returns 0
* PyFunction_GetSpecialized() creates an empty list
* PyFunction_GetSpecializedFunc() returns the code object of the function (which is not something new)

Or not implement these functions at all, since it doesn't make sense for them.

--

First, I tried hard to avoid the need for a module to specialize functions. My first API added a specialize() method to functions which took a list of dictionaries to describe guards. The problem with this API is that it exposes the implementation details and makes it hard to extend guards (implement new guards). Now the AST optimizer injects "import fat" to optimize code when needed.

Hey, it's difficult to design a simple and obvious API!

Victor
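The stack layout described for PyFunction_GetSpecializedFunc() (indexed arguments followed by key/value pairs, for a total of na + nk * 2 objects) can be pictured with a short Python model. The helper names ``build_stack`` and ``split_stack`` are hypothetical and exist only for this sketch; the real stack is a C array of ``PyObject*``.

```python
def build_stack(args, kwargs):
    """Flatten a call into (stack, na, nk): positional arguments first,
    then each keyword argument as a key object followed by its value."""
    stack = list(args)
    for key, value in kwargs.items():
        stack.append(key)
        stack.append(value)
    return stack, len(args), len(kwargs)


def split_stack(stack, na, nk):
    """Recover (args, kwargs) from the flat layout."""
    if len(stack) != na + nk * 2:
        raise ValueError("stack must contain na + nk * 2 objects")
    args = tuple(stack[:na])
    pairs = stack[na:]
    kwargs = {pairs[i]: pairs[i + 1] for i in range(0, nk * 2, 2)}
    return args, kwargs


stack, na, nk = build_stack((1, 2), {"sep": "-", "end": ""})
args, kwargs = split_stack(stack, na, nk)
print(stack, na, nk)
```

A guard that only cares about the type of the first positional argument would look at ``stack[0]`` when ``na`` is at least 1, without building any tuple or dict.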

On Tue, 12 Jan 2016 at 01:59 Victor Stinner <victor.stinner@gmail.com> wrote:
This is somewhat similar to the JIT API we have been considering through our Pyjion work:

* PyJit_Init()
* PyJit_RegisterCodeObject()
* PyJit_CompileCodeObject()

If both ideas gain traction we may want to talk about whether there is some way to consolidate the APIs so we don't end up with a ton of different ways to optimize code objects.

Thank you for the comments on the first version of PEP 510. I changed it to only change the C API; there are no more changes to the Python API. I just posted the second version of the PEP to python-dev. Please move the discussion there. If you want to review other PEPs on python-ideas, I'm going to post a first version of my AST transformer PEP (PEP 511), stay tuned :-D (yeah, I love working on 3 PEPs at the same time!)

Victor

participants (5): Brett Cannon, Chris Angelico, Nick Coghlan, Victor Stinner, Yury Selivanov