[Python-Dev] frame evaluation API PEP

Mon Jun 20 15:52:23 EDT 2016

On Sun, 19 Jun 2016 at 21:01 Mark Shannon <mark at hotpy.org> wrote:

>
>
> On 19/06/16 18:29, Brett Cannon wrote:
> >
> >
> > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum <guido at python.org> wrote:
> >
> >     Hi Brett,
> >
> >     I've got a few questions about the specific design. Probably you
> >     know the answers, it would be nice to have them in the PEP.
> >
> >
> > Once you're happy with my answers I'll update the PEP.
> >
> >
> >     First, why not have a global hook? What does a hook per interpreter
> >     give you? Would even finer granularity buy anything?
> >
> >
> > We initially considered a per-code object hook, but we figured it was
> > unnecessary to have that level of control, especially since people like
> > Numba have gotten away with not needing it for this long (although I
> > suspect that's because they are a decorator so they can just return an
> > object that overrides __call__()). We didn't think that a global one was
> > appropriate as different workloads may call for different
> > JITs/debuggers/etc. and there is no guarantee that you are executing
> > every interpreter with the same workload. Plus we figured people might
> > simply import their JIT of choice and as a side-effect set the hook, and
> > since imports are a per-interpreter thing that seemed to suggest the
> > granularity of interpreters.
> >
> > IOW it seemed to be more in line with sys.settrace() than some global
> > thing for the process.
> >
> >
> >     Next, I'm a bit (but no more than a bit) concerned about the extra 8
> >     bytes per code object, especially since for most people this is just
> >     waste (assuming most people won't be using Pyjion or Numba). Could
> >     it be a compile-time feature (requiring recompilation of CPython but
> >     not extensions)?
> >
> >
> > Probably. It does water down potential usage thanks to needing a special
> > build. If the decision is "special build or not", I would simply pull
> > out this part of the proposal as I wouldn't want to add a flag that
> > influences what is or is not possible for an interpreter.
> >
> >     Could you figure out some other way to store per-code-object data?
> >     It seems you considered this but decided that the co_extra field was
> >     simpler and faster; I'm basically pushing a little harder on this.
> >     Of course most of the PEP would disappear without this feature; the
> >     extra interpreter field is fine.
> >
> >
> > Dino and I thought of two potential alternatives, neither of which we
> > have taken the time to implement and benchmark. One is to simply have a
> > hash table of memory addresses to JIT data that is kept on the JIT side
> > of things. Obviously it would be nice to avoid the overhead of a hash
> > table lookup on every function call. This also doesn't help minimize
> > memory when the code object gets GC'ed.
>
> Hash lookups aren't that slow.

There's "slow" and there's "slower".

> If you combine it with the custom flags
> suggested by MRAB, then you would only suffer the lookup penalty when
> actually entering the special interpreter.
>

You will actually always need the lookup in the JIT case to increment the
execution count if you're not always JIT-ing immediately. That means MRAB's
flag won't necessarily be that useful in the JIT case (it could be in the
debugging case, though, if you're really aiming for the fastest debugger
possible).

> You can use a weakref callback to ensure things get GC'd properly.
>

Yes, that was already the plan if we lost co_extra.
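
To make that fallback concrete, here is a rough Python sketch of the
side-table approach (a real JIT would do this in C). The names SideTable
and JitData are made up for illustration and are not part of the PEP or
Pyjion; the only thing assumed about CPython is that code objects can be
weakly referenced.

```python
import weakref


class JitData:
    """Hypothetical per-code-object record, standing in for co_extra."""

    def __init__(self):
        self.exec_count = 0


class SideTable:
    """Maps id(code) to JitData and forgets entries when the code object dies."""

    def __init__(self):
        self._data = {}
        self._refs = {}

    def lookup(self, code):
        key = id(code)
        entry = self._data.get(key)
        if entry is None:
            entry = self._data[key] = JitData()
            # The weakref must be kept alive for its callback to fire.
            self._refs[key] = weakref.ref(
                code, lambda _ref, key=key: self._forget(key)
            )
        return entry

    def _forget(self, key):
        # Runs when the code object is collected, so a reused id() cannot
        # pick up stale data.
        self._data.pop(key, None)
        self._refs.pop(key, None)

    def __len__(self):
        return len(self._data)
```

The dictionary lookup in lookup() is the per-call cost being argued about
above; the weakref callback only takes care of the cleanup side.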

>
> Also, if there is a special extra field on code-object, then everyone
> will want to use it. How do you handle clashes?
>

As already explained in the PEP in
https://www.python.org/dev/peps/pep-0523/#expanding-pycodeobject, like
consenting adults. The expectation is that there will not be multiple users
of the object at the same time.
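
As a rough illustration of the type check the PEP asks users to do, here
is a small Python model. FakeCode, MyJitData and claim_extra are all
hypothetical names; the real co_extra is a C-level PyObject * field and is
not reachable from Python like this.

```python
class FakeCode:
    """Stand-in for a code object with the proposed field."""

    def __init__(self):
        self.co_extra = None  # models the NULL default from the PEP


class MyJitData:
    """Hypothetical data one project would store in co_extra."""

    def __init__(self):
        self.exec_count = 0


def claim_extra(code):
    """Return our data if the field is free or already ours, else None."""
    extra = code.co_extra
    if extra is None:
        extra = code.co_extra = MyJitData()
        return extra
    if type(extra) is MyJitData:
        return extra
    # Someone else set the field first; leave their data untouched.
    return None
```

A caller that gets None back would simply fall through to the default
frame evaluation for that code object.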

-Brett

>
> >
> > The other potential solution we came up with was to use weakrefs. I have
> > not looked into the details, but we were thinking that if we registered
> > the JIT data object as a weakref on the code object, couldn't we iterate
> > through the weakrefs attached to the code object to look for the JIT
> > data object, and then get the reference that way? It would let us avoid
> > a more expensive hash table lookup if we assume most code objects won't
> > have a weakref on it (assuming weakrefs are stored in a list), and it
> > gives us the proper cleanup semantics we want by getting the weakref
> > cleanup callback execution to make sure we decref the JIT data object
> > appropriately. But as I said, I have not looked into the feasibility of
> > this at all to know if I'm remembering the weakref implementation
> > details correctly.
> >
> >
> >     Finally, there are some error messages from pep2html.py:
> >     https://www.python.org/dev/peps/pep-0523/#copyright
> >
> >
> > All fixed in
> > https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe .
> >
> > -Brett
> >
> >
> >
> >     --Guido
> >
> >     On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon <brett at python.org> wrote:
> >
> >         I have taken PEP 523 for this:
> >         https://github.com/python/peps/blob/master/pep-0523.txt .
> >
> >         I'm waiting until Guido gets back from vacation, at which point
> >         I'll ask for a pronouncement or assignment of a BDFL delegate.
> >
> >         On Fri, 3 Jun 2016 at 14:37 Brett Cannon <brett at python.org> wrote:
> >
> >             For those of you who follow python-ideas or were at the
> >             PyCon US 2016 language summit, you have already seen/heard
> >             about this PEP. For those of you who don't fall into either
> >             of those categories, this PEP proposes a frame evaluation
> >             API for CPython. The motivating example of this work has
> >             been Pyjion, the experimental CPython JIT Dino Viehland and
> >             I have been working on in our spare time at Microsoft. The
> >             API also works for debugging, though, as already
> >             demonstrated by Google having added a very similar API
> >             internally for debugging purposes.
> >
> >             The PEP is pasted in below and also available in rendered
> >             form at
> >             https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I
> >             will assign myself a PEP # once discussion is finished as
> >             it's easier to work in git for this for the rich rendering
> >             of the in-progress PEP).
> >
> >             I should mention that the differences in the PEP since
> >             python-ideas and the language summit are the listed support
> >             from Google's use of a very similar API and the
> >             clarification that the co_extra field on code objects
> >             doesn't change their immutability (at least from the view
> >             of the PEP).
> >
> >             ----------
> >             PEP: NNN
> >             Title: Adding a frame evaluation API to CPython
> >             Version: $Revision$
> >             Last-Modified: $Date$
> >             Author: Brett Cannon <brett at python.org>,
> >                      Dino Viehland <dinov at microsoft.com>
> >             Status: Draft
> >             Type: Standards Track
> >             Content-Type: text/x-rst
> >             Created: 16-May-2016
> >             Post-History: 16-May-2016
> >                            03-Jun-2016
> >
> >
> >             Abstract
> >             ========
> >
> >             This PEP proposes to expand CPython's C API [#c-api]_ to
> >             allow for
> >             the specification of a per-interpreter function pointer to
> >             handle the
> >             evaluation of frames [#pyeval_evalframeex]_. This proposal also
> >             suggests adding a new field to code objects [#pycodeobject]_
> >             to store
> >             arbitrary data for use by the frame evaluation function.
> >
> >
> >             Rationale
> >             =========
> >
> >             One place where flexibility has been lacking in Python is in
> >             the direct
> >             execution of Python code. While CPython's C API [#c-api]_
> >             allows for
> >             constructing the data going into a frame object and then
> >             evaluating it
> >             via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control
> >             over the
> >             execution of Python code comes down to individual objects
> >             instead of a
> >             holistic control of execution at the frame level.
> >
> >             While wanting to have influence over frame evaluation may
> >             seem a bit
> >             too low-level, it does open the possibility for things such as a
> >             method-level JIT to be introduced into CPython without
> >             CPython itself
> >             having to provide one. By allowing external C code to
> >             control frame
> >             evaluation, a JIT can participate in the execution of Python
> >             code at
> >             the key point where evaluation occurs. This then allows for
> >             a JIT to
> >             conditionally recompile Python bytecode to machine code as
> >             desired
> >             while still allowing for executing regular CPython bytecode when
> >             running the JIT is not desired. This can be accomplished by
> >             allowing interpreters to specify what function to call to
> >             evaluate a frame. And by placing the API at the frame
> >             evaluation level it allows for a
> >             complete view of the execution environment of the code for
> >             the JIT.
> >
> >             This ability to specify a frame evaluation function also
> >             allows for
> >             other use-cases beyond just opening CPython up to a JIT. For
> >             instance,
> >             it would not be difficult to implement a tracing or
> >             profiling function
> >             at the call level with this API. While CPython does provide the
> >             ability to set a tracing or profiling function at the Python
> >             level, this would be able to match the data collection of the
> >             profiler and quite possibly be faster for tracing by simply
> >             skipping per-line tracing support.
> >
> >             It also opens up the possibility of debugging where the frame
> >             evaluation function only performs special debugging work when it
> >             detects it is about to execute a specific code object. In that
> >             instance the bytecode could be theoretically rewritten
> >             in-place to inject a breakpoint function call at the proper
> >             point for help in debugging while not having to do a
> >             heavy-handed approach as required by ``sys.settrace()``.
> >
> >             To help facilitate these use-cases, we are also proposing
> >             the adding
> >             of a "scratch space" on code objects via a new field. This
> >             will allow
> >             per-code object data to be stored with the code object
> >             itself for easy
> >             retrieval by the frame evaluation function as necessary. The
> >             field
> >             itself will simply be a ``PyObject *`` type so that any data
> >             stored in
> >             the field will participate in normal object memory management.
> >
> >
> >             Proposal
> >             ========
> >
> >             All proposed C API changes below will not be part of the
> >             stable ABI.
> >
> >
> >             Expanding ``PyCodeObject``
> >             --------------------------
> >
> >             One field is to be added to the ``PyCodeObject`` struct
> >             [#pycodeobject]_::
> >
> >                typedef struct {
> >                   ...
> >                   PyObject *co_extra;  /* "Scratch space" for the code object. */
> >                } PyCodeObject;
> >
> >             The ``co_extra`` will be ``NULL`` by default and will not be
> >             used by
> >             CPython itself. Third-party code is free to use the field as
> >             desired.
> >             Values stored in the field are expected to not be required
> >             in order
> >             for the code object to function, allowing the loss of the
> >             data of the
> >             field to be acceptable (this keeps the code object as
> >             immutable from
> >             a functionality point-of-view; this is slightly contentious
> >             and so is
> >             listed as an open issue in `Is co_extra needed?`_). The
> >             field will be
> >             freed like all other fields on ``PyCodeObject`` during
> >             deallocation
> >             using ``Py_XDECREF()``.
> >
> >             It is not recommended that multiple users attempt to use the
> >             ``co_extra`` simultaneously. While a dictionary could
> >             theoretically be
> >             set to the field and various users could use a key specific
> >             to the
> >             project, there is still the issue of key collisions as well as
> >             performance degradation from using a dictionary lookup on
> >             every frame
> >             evaluation. Users are expected to do a type check to make
> >             sure that
> >             the field has not been previously set by someone else.
> >
> >
> >             Expanding ``PyInterpreterState``
> >             --------------------------------
> >
> >             The entrypoint for the frame evaluation function is
> >             per-interpreter::
> >
> >                // Same type signature as PyEval_EvalFrameEx().
> >                typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int);
> >
> >                typedef struct {
> >                    ...
> >                    PyFrameEvalFunction eval_frame;
> >                } PyInterpreterState;
> >
> >             By default, the ``eval_frame`` field will be initialized to
> >             a function
> >             pointer that represents what ``PyEval_EvalFrameEx()``
> >             currently is
> >             (called ``PyEval_EvalFrameDefault()``, discussed later in
> >             this PEP).
> >             Third-party code may then set their own frame evaluation
> >             function
> >             instead to control the execution of Python code. A pointer
> >             comparison
> >             can be used to detect if the field is set to
> >             ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet.
> >
> >
> >             Changes to ``Python/ceval.c``
> >             -----------------------------
> >
> >             ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it
> >             currently stands
> >             will be renamed to ``PyEval_EvalFrameDefault()``. The new
> >             ``PyEval_EvalFrameEx()`` will then become::
> >
> >                  PyObject *
> >                  PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
> >                  {
> >                      PyThreadState *tstate = PyThreadState_GET();
> >                      return tstate->interp->eval_frame(frame, throwflag);
> >                  }
> >
> >             This allows third-party code to place themselves directly in
> >             the path
> >             of Python code execution while being backwards-compatible
> >             with code
> >             already using the pre-existing C API.
> >
> >
> >             Updating ``python-gdb.py``
> >             --------------------------
> >
> >             The generated ``python-gdb.py`` file used for Python support
> >             in GDB
> >             makes some hard-coded assumptions about
> >             ``PyEval_EvalFrameEx()``, e.g.
> >             the names of local variables. It will need to be updated to
> >             work with
> >             the proposed changes.
> >
> >
> >             Performance impact
> >             ==================
> >
> >             As this PEP is proposing an API to add pluggability, performance
> >             impact is considered only in the case where no third-party
> >             code has
> >             made any changes.
> >
> >             Several runs of pybench [#pybench]_ consistently showed no
> >             performance
> >             cost from the API change alone.
> >
> >             A run of the Python benchmark suite [#py-benchmarks]_ showed no
> >             measurable cost in performance.
> >
> >             In terms of memory impact, since there are typically not
> >             many CPython interpreters executing in a single process,
> >             the impact of ``co_extra`` being added to ``PyCodeObject``
> >             is the only worry. According to [#code-object-count]_, a
> >             run of the Python test suite results in about 72,395 code
> >             objects being created. On a 64-bit CPU that would result in
> >             579,160 bytes of extra memory being used (72,395 pointers
> >             at 8 bytes each) if all code objects were alive at once and
> >             had nothing set in their ``co_extra`` fields.
> >
> >
> >             Example Usage
> >             =============
> >
> >             A JIT for CPython
> >             -----------------
> >
> >             Pyjion
> >             ''''''
> >
> >             The Pyjion project [#pyjion]_ has used this proposed API to
> >             implement
> >             a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code
> >             object has its ``co_extra`` field set to a
> >             ``PyjionJittedCode`` object
> >             which stores four pieces of information:
> >
> >             1. Execution count
> >             2. A boolean representing whether a previous attempt to JIT
> >             failed
> >             3. A function pointer to a trampoline (which can be type
> >             tracing or not)
> >             4. A void pointer to any JIT-compiled machine code
> >
> >             The frame evaluation function has (roughly) the following
> >             algorithm::
> >
> >                  def eval_frame(frame, throw_flag):
> >                      pyjion_code = frame.code.co_extra
> >                      if not pyjion_code:
> >                          # First call: attach fresh bookkeeping data.
> >                          pyjion_code = frame.code.co_extra = PyjionJittedCode()
> >                      elif not pyjion_code.jit_failed:
> >                          if pyjion_code.jit_code:
> >                              # Already compiled: run the machine code.
> >                              return pyjion_code.eval(pyjion_code.jit_code, frame)
> >                          elif pyjion_code.exec_count > 20_000:
> >                              if jit_compile(frame):
> >                                  return pyjion_code.eval(pyjion_code.jit_code, frame)
> >                              else:
> >                                  pyjion_code.jit_failed = True
> >                      pyjion_code.exec_count += 1
> >                      return PyEval_EvalFrameDefault(frame, throw_flag)
> >
> >             The key point, though, is that all of this work and logic is
> >             separate
> >             from CPython and yet with the proposed API changes it is able to
> >             provide a JIT that is compliant with Python semantics (as of
> >             this
> >             writing, performance is almost equivalent to CPython without
> >             the new
> >             API). This means there's nothing technically preventing
> >             others from
> >             implementing their own JITs for CPython by utilizing the
> >             proposed API.
> >
> >
> >             Other JITs
> >             ''''''''''
> >
> >             It should be mentioned that the Pyston team was consulted on an
> >             earlier version of this PEP that was more JIT-specific and
> >             they were not interested in utilizing the changes proposed
> >             because they want control over memory layout and had no
> >             interest in directly supporting CPython itself. An informal
> >             discussion with a developer on the PyPy team led to a
> >             similar comment.
> >
> >             Numba [#numba]_, on the other hand, suggested that they would be
> >             interested in the proposed change in a post-1.0 future for
> >             themselves [#numba-interest]_.
> >
> >             The experimental Coconut JIT [#coconut]_ could have
> >             benefitted from
> >             this PEP. In private conversations with Coconut's creator we
> >             were told
> >             that our API was probably superior to the one they developed for
> >             Coconut to add JIT support to CPython.
> >
> >
> >             Debugging
> >             ---------
> >
> >             In conversations with the Python Tools for Visual Studio
> >             team (PTVS)
> >             [#ptvs]_, they thought they would find these API changes
> >             useful for
> >             implementing more performant debugging. As mentioned in the
> >             Rationale_
> >             section, this API would allow for switching on debugging
> >             functionality
> >             only in frames where it is needed. This could allow for either
> >             skipping information that ``sys.settrace()`` normally
> >             provides or even going as far as dynamically rewriting
> >             bytecode prior to execution to inject e.g. breakpoints in
> >             the bytecode.
> >
> >             It also turns out that Google has provided a very similar API
> >             internally for years. It has been used for performant debugging
> >             purposes.
> >
> >
> >             Implementation
> >             ==============
> >
> >             A set of patches implementing the proposed API is available
> >             through
> >             the Pyjion project [#pyjion]_. In its current form it has more
> >             changes to CPython than just this proposed API, but that is
> >             for ease
> >             of development instead of strict requirements to accomplish
> >             its goals.
> >
> >
> >             Open Issues
> >             ===========
> >
> >             Allow ``eval_frame`` to be ``NULL``
> >             -----------------------------------
> >
> >             Currently the frame evaluation function is expected to
> >             always be set.
> >             It could very easily simply default to ``NULL`` instead
> >             which would
> >             signal to use ``PyEval_EvalFrameDefault()``. The current
> >             proposal of
> >             not special-casing the field seemed the most
> >             straight-forward, but it
> >             does require that the field not accidentally be cleared,
> >             else a crash
> >             may occur.
> >
> >
> >             Is co_extra needed?
> >             -------------------
> >
> >             While discussing this PEP at PyCon US 2016, some core developers
> >             expressed their worry of the ``co_extra`` field making code
> >             objects
> >             mutable. The thinking seemed to be that having a field that was
> >             mutated after the creation of the code object made the
> >             object seem
> >             mutable, even though no other aspect of code objects changed.
> >
> >             The view of this PEP is that the ``co_extra`` field doesn't
> >             change the
> >             fact that code objects are immutable. The field is specified
> >             in this
> >             PEP as to not contain information required to make the code
> >             object
> >             usable, making it more of a caching field. It could be viewed as
> >             similar to the UTF-8 cache that string objects have internally;
> >             strings are still considered immutable even though they have
> >             a field
> >             that is conditionally set.
> >
> >             The field is also not strictly necessary. While the field
> >             greatly
> >             simplifies attaching extra information to code objects,
> >             other options
> >             such as keeping a mapping of code object memory addresses to
> >             what
> >             would have been kept in ``co_extra`` or perhaps using a weak
> >             reference
> >             of the data on the code object and then iterating through
> >             the weak
> >             references until the attached data is found are possible. But
> >             obviously
> >             all of these solutions are not as simple or performant as
> >             adding the
> >             ``co_extra`` field.
> >
> >
> >             Rejected Ideas
> >             ==============
> >
> >             A JIT-specific C API
> >             --------------------
> >
> >             Originally this PEP was going to propose a much larger API
> >             change
> >             which was more JIT-specific. After soliciting feedback from
> >             the Numba
> >             team [#numba]_, though, it became clear that the API was
> >             unnecessarily
> >             large. The realization was made that all that was truly
> >             needed was the
> >             opportunity to provide a trampoline function to handle
> >             execution of
> >             Python code that had been JIT-compiled and a way to attach that
> >             compiled machine code along with other critical data to the
> >             corresponding Python code object. Once it was shown that
> >             there was no
> >             loss in functionality or in performance while minimizing the API
> >             changes required, the proposal was changed to its current form.
> >
> >
> >             References
> >             ==========
> >
> >             .. [#pyjion] Pyjion project
> >                 (https://github.com/microsoft/pyjion)
> >
> >             .. [#c-api] CPython's C API
> >                 (https://docs.python.org/3/c-api/index.html)
> >
> >             .. [#pycodeobject] ``PyCodeObject``
> >                 (https://docs.python.org/3/c-api/code.html#c.PyCodeObject)
> >
> >             .. [#coreclr] .NET Core Runtime (CoreCLR)
> >                 (https://github.com/dotnet/coreclr)
> >
> >             .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()``
> >                 (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx)
> >
> >             .. [#numba] Numba
> >                 (http://numba.pydata.org/)
> >
> >             .. [#numba-interest]  numba-users mailing list:
> >                 "Would the C API for a JIT entrypoint being proposed by
> >                 Pyjion help out Numba?"
> >                 (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g)
> >
> >             .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop
> >                 (https://mail.python.org/pipermail/python-dev/2016-February/143025.html)
> >
> >             .. [#py-benchmarks] Python benchmark suite
> >                 (https://hg.python.org/benchmarks)
> >
> >             .. [#pyston] Pyston
> >                 (http://pyston.org)
> >
> >             .. [#pypy] PyPy
> >                 (http://pypy.org/)
> >
> >             .. [#ptvs] Python Tools for Visual Studio
> >                 (http://microsoft.github.io/PTVS/)
> >
> >             .. [#coconut] Coconut
> >                 (https://github.com/davidmalcolm/coconut)
> >
> >
> >             Copyright
> >             =========
> >
> >             This document has been placed in the public domain.
> >
> >
> >
> >             ..
> >                 Local Variables:
> >                 mode: indented-text
> >                 indent-tabs-mode: nil
> >                 sentence-end-double-space: t
> >                 fill-column: 70
> >                 coding: utf-8
> >                 End:
> >
> >
> >         _______________________________________________
> >         Python-Dev mailing list
> >         Python-Dev at python.org
> >         https://mail.python.org/mailman/listinfo/python-dev
> >         Unsubscribe:
> >         https://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
> >
> >
> >
> >     --
> >     --Guido van Rossum (python.org/~guido)
> >
> >
> >
> >