[Python-Dev] frame evaluation API PEP
Brett Cannon
brett at python.org
Mon Jun 20 15:52:23 EDT 2016
On Sun, 19 Jun 2016 at 21:01 Mark Shannon <mark at hotpy.org> wrote:
>
>
> On 19/06/16 18:29, Brett Cannon wrote:
> >
> >
> > On Sat, 18 Jun 2016 at 21:49 Guido van Rossum <guido at python.org> wrote:
> >
> > Hi Brett,
> >
> > I've got a few questions about the specific design. Probably you
> > know the answers, it would be nice to have them in the PEP.
> >
> >
> > Once you're happy with my answers I'll update the PEP.
> >
> >
> > First, why not have a global hook? What does a hook per interpreter
> > give you? Would even finer granularity buy anything?
> >
> >
> > We initially considered a per-code object hook, but we figured it was
> > unnecessary to have that level of control, especially since people like
> > Numba have gotten away with not needing it for this long (although I
> > suspect that's because they are a decorator so they can just return an
> > object that overrides __call__()). We didn't think that a global one was
> > appropriate as different workloads may call for different
> > JITs/debuggers/etc. and there is no guarantee that you are executing
> > every interpreter with the same workload. Plus we figured people might
> > simply import their JIT of choice and as a side-effect set the hook, and
> > since imports are a per-interpreter thing that seemed to suggest the
> > granularity of interpreters.
> >
> > IOW it seemed to be more in line with sys.settrace() than some global
> > thing for the process.
> >
> >
> > Next, I'm a bit (but no more than a bit) concerned about the extra 8
> > bytes per code object, especially since for most people this is just
> > waste (assuming most people won't be using Pyjion or Numba). Could
> > it be a compile-time feature (requiring recompilation of CPython but
> > not extensions)?
> >
> >
> > Probably. It does water down potential usage thanks to needing a special
> > build. If the decision is "special build or not", I would simply pull
> > out this part of the proposal as I wouldn't want to add a flag that
> > influences what is or is not possible for an interpreter.
> >
> > Could you figure out some other way to store per-code-object data?
> > It seems you considered this but decided that the co_extra field was
> > simpler and faster; I'm basically pushing a little harder on this.
> > Of course most of the PEP would disappear without this feature; the
> > extra interpreter field is fine.
> >
> >
> > Dino and I thought of two potential alternatives, neither of which we
> > have taken the time to implement and benchmark. One is to simply have a
> > hash table of memory addresses to JIT data that is kept on the JIT side
> > of things. Obviously it would be nice to avoid the overhead of a hash
> > table lookup on every function call. This also doesn't help minimize
> > memory when the code object gets GC'ed.
>
> Hash lookups aren't that slow.
There's "slow" and there's "slower".
> If you combine it with the custom flags
> suggested by MRAB, then you would only suffer the lookup penalty when
> actually entering the special interpreter.
>
You will actually always need the lookup in the JIT case to increment the
execution count, unless you JIT every code object immediately. That means
MRAB's flag won't necessarily be that useful in the JIT case (it could be in
the debugging case, though, if you're really aiming for the fastest debugger
possible).
> You can use a weakref callback to ensure things get GC'd properly.
>
Yes, that was already the plan if we lost co_extra.
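
To make the fallback concrete, here is a minimal Python sketch of the side-table idea: JIT data kept in a dict keyed by the code object's address, with a weakref callback doing the cleanup. The names `JitData`, `JitSideTable` and `on_frame_entry` are made up for illustration; a real JIT would do this in C, and nothing here is taken from Pyjion.

```python
import weakref


class JitData:
    """Hypothetical per-code-object record a JIT might keep."""

    def __init__(self):
        self.exec_count = 0
        self.jit_failed = False


class JitSideTable:
    """Maps id(code) -> JitData; entries are dropped by a weakref callback."""

    def __init__(self):
        self._data = {}
        self._refs = {}  # keeps the weakrefs (and their callbacks) alive

    def lookup(self, code):
        key = id(code)
        data = self._data.get(key)
        if data is None:
            data = self._data[key] = JitData()

            # Runs when the code object is collected, so the table does not
            # keep stale entries for addresses that may later be reused.
            def _cleanup(ref, key=key, table=self):
                table._data.pop(key, None)
                table._refs.pop(key, None)

            self._refs[key] = weakref.ref(code, _cleanup)
        return data

    def __len__(self):
        return len(self._data)


def on_frame_entry(table, code):
    """One hash lookup per call, needed just to bump the execution count."""
    data = table.lookup(code)
    data.exec_count += 1
    return data.exec_count
```

The lookup in `on_frame_entry()` is the cost being discussed: it is paid on every call as long as the code object has not been compiled yet.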
>
> Also, if there is a special extra field on code-object, then everyone
> will want to use it. How do you handle clashes?
>
As already explained in the PEP in
https://www.python.org/dev/peps/pep-0523/#expanding-pycodeobject, like
consenting adults. The expectation is that there will not be multiple users
of the object at the same time.
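
As a rough illustration of the "consenting adults" approach, the sketch below shows the kind of type check the PEP expects before a tool touches the field. Python code objects do not expose ``co_extra`` to Python code, so `FakeCode` is a stand-in, and `MyJitData` and `claim_co_extra` are hypothetical names.

```python
class FakeCode:
    """Stand-in for a code object with a co_extra slot (illustration only)."""

    def __init__(self):
        self.co_extra = None


class MyJitData:
    """Hypothetical data type owned by one particular tool."""

    def __init__(self):
        self.exec_count = 0


def claim_co_extra(code):
    """Return our data for `code`, or None if another tool owns the field."""
    extra = code.co_extra
    if extra is None:
        # Nobody has used the field yet, so we may claim it.
        extra = code.co_extra = MyJitData()
        return extra
    if isinstance(extra, MyJitData):
        # The type check passed: this is our own data from an earlier call.
        return extra
    # Someone else got there first; leave their data alone and back off.
    return None
```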
-Brett
>
> >
> > The other potential solution we came up with was to use weakrefs. I have
> > not looked into the details, but we were thinking that if we registered
> > the JIT data object as a weakref on the code object, couldn't we iterate
> > through the weakrefs attached to the code object to look for the JIT
> > data object, and then get the reference that way? It would let us avoid
> > a more expensive hash table lookup if we assume most code objects won't
> > have a weakref on it (assuming weakrefs are stored in a list), and it
> > gives us the proper cleanup semantics we want by getting the weakref
> > cleanup callback execution to make sure we decref the JIT data object
> > appropriately. But as I said, I have not looked into the feasibility of
> > this at all to know if I'm remembering the weakref implementation
> > details correctly.
> >
> >
> > Finally, there are some error messages from pep2html.py:
> > https://www.python.org/dev/peps/pep-0523/#copyright
> >
> >
> > All fixed in
> > https://github.com/python/peps/commit/6929f850a5af07e51d0163558a5fe8d6b85dccfe .
> >
> > -Brett
> >
> >
> >
> > --Guido
> >
> > On Fri, Jun 17, 2016 at 7:58 PM, Brett Cannon <brett at python.org> wrote:
> >
> > I have taken PEP 523 for this:
> > https://github.com/python/peps/blob/master/pep-0523.txt .
> >
> > I'm waiting until Guido gets back from vacation, at which point
> > I'll ask for a pronouncement or assignment of a BDFL delegate.
> >
> > On Fri, 3 Jun 2016 at 14:37 Brett Cannon <brett at python.org> wrote:
> >
> > For those of you who follow python-ideas or were at the
> > PyCon US 2016 language summit, you have already seen/heard
> > about this PEP. For those of you who don't fall into either
> > of those categories, this PEP proposes a frame evaluation
> > API for CPython. The motivating example of this work has
> > been Pyjion, the experimental CPython JIT Dino Viehland and
> > I have been working on in our spare time at Microsoft. The
> > API also works for debugging, though, as already
> > demonstrated by Google having added a very similar API
> > internally for debugging purposes.
> >
> > The PEP is pasted in below and also available in rendered
> > form at
> > https://github.com/Microsoft/Pyjion/blob/master/pep.rst (I
> > will assign myself a PEP # once discussion is finished as
> > it's easier to work in git for this for the rich rendering
> > of the in-progress PEP).
> >
> > I should mention that the differences in the PEP from what was
> > shown on python-ideas and at the language summit are the mention
> > of Google's use of a very similar API, as well as clarifying that
> > the co_extra field on code objects doesn't change their
> > immutability (at least from the view of the PEP).
> >
> > ----------
> > PEP: NNN
> > Title: Adding a frame evaluation API to CPython
> > Version: $Revision$
> > Last-Modified: $Date$
> > Author: Brett Cannon <brett at python.org>,
> >         Dino Viehland <dinov at microsoft.com>
> > Status: Draft
> > Type: Standards Track
> > Content-Type: text/x-rst
> > Created: 16-May-2016
> > Post-History: 16-May-2016
> > 03-Jun-2016
> >
> >
> > Abstract
> > ========
> >
> > This PEP proposes to expand CPython's C API [#c-api]_ to allow for
> > the specification of a per-interpreter function pointer to handle the
> > evaluation of frames [#pyeval_evalframeex]_. This proposal also
> > suggests adding a new field to code objects [#pycodeobject]_ to store
> > arbitrary data for use by the frame evaluation function.
> >
> >
> > Rationale
> > =========
> >
> > One place where flexibility has been lacking in Python is in the direct
> > execution of Python code. While CPython's C API [#c-api]_ allows for
> > constructing the data going into a frame object and then evaluating it
> > via ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_, control over the
> > execution of Python code comes down to individual objects instead of a
> > holistic control of execution at the frame level.
> >
> > While wanting to have influence over frame evaluation may seem a bit
> > too low-level, it does open the possibility for things such as a
> > method-level JIT to be introduced into CPython without CPython itself
> > having to provide one. By allowing external C code to control frame
> > evaluation, a JIT can participate in the execution of Python code at
> > the key point where evaluation occurs. This then allows for a JIT to
> > conditionally recompile Python bytecode to machine code as desired
> > while still allowing for executing regular CPython bytecode when
> > running the JIT is not desired. This can be accomplished by allowing
> > interpreters to specify what function to call to evaluate a frame. And
> > by placing the API at the frame evaluation level it allows for a
> > complete view of the execution environment of the code for the JIT.
> >
> > This ability to specify a frame evaluation function also allows for
> > other use-cases beyond just opening CPython up to a JIT. For instance,
> > it would not be difficult to implement a tracing or profiling function
> > at the call level with this API. While CPython does provide the
> > ability to set a tracing or profiling function at the Python level,
> > this would be able to match the data collection of the profiler and
> > quite possibly be faster for tracing by simply skipping per-line
> > tracing support.
> >
> > It also opens up the possibility of debugging where the frame
> > evaluation function only performs special debugging work when it
> > detects it is about to execute a specific code object. In that
> > instance the bytecode could be theoretically rewritten in-place to
> > inject a breakpoint function call at the proper point for help in
> > debugging while not having to do a heavy-handed approach as
> > required by ``sys.settrace()``.
> >
> > To help facilitate these use-cases, we are also proposing the adding
> > of a "scratch space" on code objects via a new field. This will allow
> > per-code object data to be stored with the code object itself for easy
> > retrieval by the frame evaluation function as necessary. The field
> > itself will simply be a ``PyObject *`` type so that any data stored in
> > the field will participate in normal object memory management.
> >
> >
> > Proposal
> > ========
> >
> > All proposed C API changes below will not be part of the
> > stable ABI.
> >
> >
> > Expanding ``PyCodeObject``
> > --------------------------
> >
> > One field is to be added to the ``PyCodeObject`` struct
> > [#pycodeobject]_::
> >
> >   typedef struct {
> >      ...
> >      PyObject *co_extra;  /* "Scratch space" for the code object. */
> >   } PyCodeObject;
> >
> > The ``co_extra`` will be ``NULL`` by default and will not be
> > used by
> > CPython itself. Third-party code is free to use the field as
> > desired.
> > Values stored in the field are expected to not be required
> > in order
> > for the code object to function, allowing the loss of the
> > data of the
> > field to be acceptable (this keeps the code object as
> > immutable from
> > a functionality point-of-view; this is slightly contentious
> > and so is
> > listed as an open issue in `Is co_extra needed?`_). The
> > field will be
> > freed like all other fields on ``PyCodeObject`` during
> > deallocation
> > using ``Py_XDECREF()``.
> >
> > It is not recommended that multiple users attempt to use the
> > ``co_extra`` simultaneously. While a dictionary could theoretically be
> > set to the field and various users could use a key specific to the
> > project, there is still the issue of key collisions as well as
> > performance degradation from using a dictionary lookup on every frame
> > evaluation. Users are expected to do a type check to make sure that
> > the field has not been previously set by someone else.
> >
> >
> > Expanding ``PyInterpreterState``
> > --------------------------------
> >
> > The entrypoint for the frame evaluation function is per-interpreter::
> >
> >   // Same type signature as PyEval_EvalFrameEx().
> >   typedef PyObject* (__stdcall *PyFrameEvalFunction)(PyFrameObject*, int);
> >
> >   typedef struct {
> >       ...
> >       PyFrameEvalFunction eval_frame;
> >   } PyInterpreterState;
> >
> > By default, the ``eval_frame`` field will be initialized to a function
> > pointer that represents what ``PyEval_EvalFrameEx()`` currently is
> > (called ``PyEval_EvalFrameDefault()``, discussed later in this PEP).
> > Third-party code may then set their own frame evaluation function
> > instead to control the execution of Python code. A pointer comparison
> > can be used to detect if the field is set to
> > ``PyEval_EvalFrameDefault()`` and thus has not been mutated yet.
> >
> >
> > Changes to ``Python/ceval.c``
> > -----------------------------
> >
> > ``PyEval_EvalFrameEx()`` [#pyeval_evalframeex]_ as it
> > currently stands
> > will be renamed to ``PyEval_EvalFrameDefault()``. The new
> > ``PyEval_EvalFrameEx()`` will then become::
> >
> >     PyObject *
> >     PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
> >     {
> >         PyThreadState *tstate = PyThreadState_GET();
> >         return tstate->interp->eval_frame(frame, throwflag);
> >     }
> >
> > This allows third-party code to place themselves directly in
> > the path
> > of Python code execution while being backwards-compatible
> > with code
> > already using the pre-existing C API.
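
The dispatch described above can be modelled in a few lines of Python. This is only a toy model under obvious assumptions: `Interpreter` stands in for ``PyInterpreterState``, frames are plain strings, and `tracing_eval_frame` is a made-up third-party hook. The actual proposal is a C API.

```python
def eval_frame_default(frame, throwflag):
    """Plays the role of PyEval_EvalFrameDefault(): 'runs' the frame."""
    return f"default:{frame}"


class Interpreter:
    """Toy stand-in for PyInterpreterState (not the real struct)."""

    def __init__(self):
        # The hook always starts out pointing at the default evaluator.
        self.eval_frame = eval_frame_default


def eval_frame_ex(interp, frame, throwflag=0):
    """Plays the role of the new PyEval_EvalFrameEx(): pure dispatch."""
    return interp.eval_frame(frame, throwflag)


def hook_is_default(interp):
    """The 'pointer comparison' check for an unmodified interpreter."""
    return interp.eval_frame is eval_frame_default


calls = []


def tracing_eval_frame(frame, throwflag):
    """Hypothetical third-party hook: record the call, then delegate."""
    calls.append(frame)
    return eval_frame_default(frame, throwflag)
```

Setting `interp.eval_frame = tracing_eval_frame` on one `Interpreter` leaves every other instance untouched, which is the per-interpreter granularity argued for earlier in the thread.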
> >
> >
> > Updating ``python-gdb.py``
> > --------------------------
> >
> > The generated ``python-gdb.py`` file used for Python support
> > in GDB
> > makes some hard-coded assumptions about
> > ``PyEval_EvalFrameEx()``, e.g.
> > the names of local variables. It will need to be updated to
> > work with
> > the proposed changes.
> >
> >
> > Performance impact
> > ==================
> >
> > As this PEP is proposing an API to add pluggability, performance
> > impact is considered only in the case where no third-party code has
> > made any changes.
> >
> > Several runs of pybench [#pybench]_ consistently showed no performance
> > cost from the API change alone.
> >
> > A run of the Python benchmark suite [#py-benchmarks]_ showed no
> > measurable cost in performance.
> >
> > In terms of memory impact, since there are typically not many CPython
> > interpreters executing in a single process that means the impact of
> > ``co_extra`` being added to ``PyCodeObject`` is the only worry.
> > According to [#code-object-count]_, a run of the Python test suite
> > results in about 72,395 code objects being created. On a 64-bit
> > CPU that would result in 579,160 bytes of extra memory being used if
> > all code objects were alive at once and had nothing set in their
> > ``co_extra`` fields.
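
The figure above is just the code object count multiplied by the pointer size. A quick check, assuming 8 bytes per pointer as on a typical 64-bit build:

```python
CODE_OBJECTS = 72_395  # code object count quoted in the PEP text
POINTER_SIZE = 8       # bytes per PyObject* on a 64-bit build (assumption)

extra_bytes = CODE_OBJECTS * POINTER_SIZE
print(extra_bytes)                       # 579160
print(f"{extra_bytes / 1024:.1f} KiB")   # 565.6 KiB
```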
> >
> >
> > Example Usage
> > =============
> >
> > A JIT for CPython
> > -----------------
> >
> > Pyjion
> > ''''''
> >
> > The Pyjion project [#pyjion]_ has used this proposed API to implement
> > a JIT for CPython using the CoreCLR's JIT [#coreclr]_. Each code
> > object has its ``co_extra`` field set to a ``PyjionJittedCode`` object
> > which stores four pieces of information:
> >
> > 1. Execution count
> > 2. A boolean representing whether a previous attempt to JIT failed
> > 3. A function pointer to a trampoline (which can be type tracing or not)
> > 4. A void pointer to any JIT-compiled machine code
> >
> > The frame evaluation function has (roughly) the following algorithm::
> >
> >     def eval_frame(frame, throw_flag):
> >         pyjion_code = frame.code.co_extra
> >         if not pyjion_code:
> >             # First execution: attach fresh bookkeeping to the code object.
> >             pyjion_code = frame.code.co_extra = PyjionJittedCode()
> >         elif not pyjion_code.jit_failed:
> >             if pyjion_code.jit_code:
> >                 # Already compiled: run the machine code via the trampoline.
> >                 return pyjion_code.eval(pyjion_code.jit_code, frame)
> >             elif pyjion_code.exec_count > 20_000:
> >                 # Hot enough: try to compile, and remember a failure.
> >                 if jit_compile(frame):
> >                     return pyjion_code.eval(pyjion_code.jit_code, frame)
> >                 else:
> >                     pyjion_code.jit_failed = True
> >         # Not compiled (yet): count the execution and interpret as usual.
> >         pyjion_code.exec_count += 1
> >         return PyEval_EvalFrameDefault(frame, throw_flag)
> >
> > The key point, though, is that all of this work and logic is separate
> > from CPython and yet with the proposed API changes it is able to
> > provide a JIT that is compliant with Python semantics (as of this
> > writing, performance is almost equivalent to CPython without the new
> > API). This means there's nothing technically preventing others from
> > implementing their own JITs for CPython by utilizing the proposed API.
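
For readers who want to step through the count-then-compile policy, here is a self-contained Python simulation. It is a sketch under loud assumptions: `Code` is a toy object with a ``co_extra`` attribute, `jit_compile` is a pretend compiler that returns a callable or `None`, and the threshold is lowered from 20,000 so the behaviour is easy to observe. It is not Pyjion's code.

```python
class PyjionJittedCode:
    """Toy version of the bookkeeping record described above."""

    def __init__(self):
        self.exec_count = 0
        self.jit_failed = False
        self.jit_code = None


class Code:
    """Toy code object with a co_extra slot (real ones differ)."""

    def __init__(self, name, jittable=True):
        self.name = name
        self.jittable = jittable
        self.co_extra = None


def interpret(code):
    """Stand-in for PyEval_EvalFrameDefault()."""
    return f"interpreted {code.name}"


def jit_compile(code):
    """Pretend compiler: returns a callable, or None on failure."""
    if not code.jittable:
        return None
    return lambda: f"jitted {code.name}"


def eval_frame(code, threshold=3):
    data = code.co_extra
    if data is None:
        data = code.co_extra = PyjionJittedCode()
    if data.jit_code is not None:
        return data.jit_code()          # already compiled
    if not data.jit_failed and data.exec_count > threshold:
        data.jit_code = jit_compile(code)
        if data.jit_code is not None:
            return data.jit_code()      # just compiled
        data.jit_failed = True          # never try this code object again
    data.exec_count += 1
    return interpret(code)
```

With `threshold=3`, a jittable code object is interpreted on its first four calls and runs the "compiled" callable from the fifth call on; a code object that fails to compile stays on the interpreter path for good.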
> >
> >
> > Other JITs
> > ''''''''''
> >
> > It should be mentioned that the Pyston team was consulted on an
> > earlier version of this PEP that was more JIT-specific and they were
> > not interested in utilizing the changes proposed because they want
> > control over memory layout and had no interest in directly supporting
> > CPython itself. An informal discussion with a developer on the PyPy
> > team led to a similar comment.
> >
> > Numba [#numba]_, on the other hand, suggested that they would be
> > interested in the proposed change in a post-1.0 future for
> > themselves [#numba-interest]_.
> >
> > The experimental Coconut JIT [#coconut]_ could have benefitted from
> > this PEP. In private conversations with Coconut's creator we were told
> > that our API was probably superior to the one they developed for
> > Coconut to add JIT support to CPython.
> >
> >
> > Debugging
> > ---------
> >
> > In conversations with the Python Tools for Visual Studio team (PTVS)
> > [#ptvs]_, they thought they would find these API changes useful for
> > implementing more performant debugging. As mentioned in the Rationale_
> > section, this API would allow for switching on debugging functionality
> > only in frames where it is needed. This could allow for either
> > skipping information that ``sys.settrace()`` normally provides or
> > even going as far as to dynamically rewrite bytecode prior to execution
> > to inject e.g. breakpoints in the bytecode.
> >
> > It also turns out that Google has provided a very similar API
> > internally for years. It has been used for performant debugging
> > purposes.
> >
> >
> > Implementation
> > ==============
> >
> > A set of patches implementing the proposed API is available through
> > the Pyjion project [#pyjion]_. In its current form it has more
> > changes to CPython than just this proposed API, but that is for ease
> > of development instead of strict requirements to accomplish its goals.
> >
> >
> > Open Issues
> > ===========
> >
> > Allow ``eval_frame`` to be ``NULL``
> > -----------------------------------
> >
> > Currently the frame evaluation function is expected to
> > always be set.
> > It could very easily simply default to ``NULL`` instead
> > which would
> > signal to use ``PyEval_EvalFrameDefault()``. The current
> > proposal of
> > not special-casing the field seemed the most
> > straight-forward, but it
> > does require that the field not accidentally be cleared,
> > else a crash
> > may occur.
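
The alternative raised in this open issue, treating ``NULL`` as "use the default", would amount to one extra check in the dispatch function. A toy Python model, with `None` standing in for ``NULL`` and all names being illustrative only:

```python
def eval_frame_default(frame, throwflag):
    """Plays the role of PyEval_EvalFrameDefault()."""
    return ("default", frame)


class Interpreter:
    """Toy stand-in for PyInterpreterState."""

    def __init__(self):
        self.eval_frame = None  # None plays the role of NULL


def eval_frame_ex(interp, frame, throwflag=0):
    hook = interp.eval_frame
    if hook is None:
        # A cleared field falls back to the default instead of crashing.
        return eval_frame_default(frame, throwflag)
    return hook(frame, throwflag)
```

The trade-off is an extra branch on every frame evaluation in exchange for tolerance of a cleared field.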
> >
> >
> > Is co_extra needed?
> > -------------------
> >
> > While discussing this PEP at PyCon US 2016, some core developers
> > expressed their worry of the ``co_extra`` field making code objects
> > mutable. The thinking seemed to be that having a field that was
> > mutated after the creation of the code object made the object seem
> > mutable, even though no other aspect of code objects changed.
> >
> > The view of this PEP is that the ``co_extra`` field doesn't change the
> > fact that code objects are immutable. The field is specified in this
> > PEP to not contain information required to make the code object
> > usable, making it more of a caching field. It could be viewed as
> > similar to the UTF-8 cache that string objects have internally;
> > strings are still considered immutable even though they have a field
> > that is conditionally set.
> >
> > The field is also not strictly necessary. While the field greatly
> > simplifies attaching extra information to code objects, other options
> > are possible, such as keeping a mapping of code object memory addresses
> > to what would have been kept in ``co_extra``, or perhaps using a weak
> > reference of the data on the code object and then iterating through
> > the weak references until the attached data is found. But obviously
> > all of these solutions are not as simple or performant as adding the
> > ``co_extra`` field.
> >
> >
> > Rejected Ideas
> > ==============
> >
> > A JIT-specific C API
> > --------------------
> >
> > Originally this PEP was going to propose a much larger API change
> > which was more JIT-specific. After soliciting feedback from the Numba
> > team [#numba]_, though, it became clear that the API was unnecessarily
> > large. The realization was made that all that was truly needed was the
> > opportunity to provide a trampoline function to handle execution of
> > Python code that had been JIT-compiled and a way to attach that
> > compiled machine code along with other critical data to the
> > corresponding Python code object. Once it was shown that there was no
> > loss in functionality or in performance while minimizing the API
> > changes required, the proposal was changed to its current form.
> >
> >
> > References
> > ==========
> >
> > .. [#pyjion] Pyjion project
> > (https://github.com/microsoft/pyjion)
> >
> > .. [#c-api] CPython's C API
> > (https://docs.python.org/3/c-api/index.html)
> >
> > .. [#pycodeobject] ``PyCodeObject``
> > (https://docs.python.org/3/c-api/code.html#c.PyCodeObject)
> >
> > .. [#coreclr] .NET Core Runtime (CoreCLR)
> > (https://github.com/dotnet/coreclr)
> >
> > .. [#pyeval_evalframeex] ``PyEval_EvalFrameEx()``
> > (https://docs.python.org/3/c-api/veryhigh.html?highlight=pyframeobject#c.PyEval_EvalFrameEx)
> >
> > .. [#numba] Numba
> > (http://numba.pydata.org/)
> >
> > .. [#numba-interest] numba-users mailing list:
> > "Would the C API for a JIT entrypoint being proposed by
> > Pyjion help out Numba?"
> > (https://groups.google.com/a/continuum.io/forum/#!topic/numba-users/yRl_0t8-m1g)
> >
> > .. [#code-object-count] [Python-Dev] Opcode cache in ceval loop
> > (https://mail.python.org/pipermail/python-dev/2016-February/143025.html)
> >
> > .. [#py-benchmarks] Python benchmark suite
> > (https://hg.python.org/benchmarks)
> >
> > .. [#pyston] Pyston
> > (http://pyston.org)
> >
> > .. [#pypy] PyPy
> > (http://pypy.org/)
> >
> > .. [#ptvs] Python Tools for Visual Studio
> > (http://microsoft.github.io/PTVS/)
> >
> > .. [#coconut] Coconut
> > (https://github.com/davidmalcolm/coconut)
> >
> >
> > Copyright
> > =========
> >
> > This document has been placed in the public domain.
> >
> >
> >
> > ..
> > Local Variables:
> > mode: indented-text
> > indent-tabs-mode: nil
> > sentence-end-double-space: t
> > fill-column: 70
> > coding: utf-8
> > End:
> >
> >
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev at python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> > https://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
> >
> >
> >
> > --
> > --Guido van Rossum (python.org/~guido)
> >
> >
> >