[Python-Dev] PEP 558: Defined semantics for locals()

Guido van Rossum guido at python.org
Sat May 25 10:36:45 EDT 2019


This looks great.

I only have two nits with the text.

First, why is the snapshot called a "dynamic snapshot"? What exactly is
dynamic about it?

Second, you use the word "mapping" a lot. Would you mind changing that to
"mapping object" in most places? Especially in the phrase "each call to
``locals()`` returns the *same* mapping". To me, without the word "object"
added, this *could* be interpreted as "a dict with the same key/value
pairs" (since "mapping" is also an abstract mathematical concept describing
anything that maps keys to values).

Other than that, go for it! (Assuming Nathaniel agrees, of course.)

--Guido

On Tue, May 21, 2019 at 7:54 AM Nick Coghlan <ncoghlan at gmail.com> wrote:

> Hi folks,
>
> After a couple of years hiatus, I finally got back to working on the
> reference implementation for PEP 558, my proposal to resolve some
> weird interactions between closures and trace functions in CPython by
> formally defining the expected semantics of the locals() builtin at
> function scope, and then ensuring that CPython adheres to those
> clarified semantics.
>
> The full text of the PEP is included below, and the rendered version
> is available at https://www.python.org/dev/peps/pep-0558/
>
> The gist of the PEP is that:
>
> 1. The behaviour when no trace functions are installed and no frame
> introspection APIs are invoked is fine, so nothing should change in
> that regard (except that we elevate that behaviour from "the way
> CPython happens to work" to "the way the language and library
> reference says that Python implementations are supposed to work", and
> make sure CPython continues to behave that way even when a trace hook
> *is* installed)
> 2. If you get hold of a frame object in CPython (or another
> implementation that emulates the CPython frame API), whether via a
> trace hook or via a frame introspection API, then writing to the
> returned mapping will update the actual function locals and/or closure
> reference immediately, rather than relying on the
> FastToLocals/LocalsToFast APIs
> 3. The LocalsToFast C API changes to always produce RuntimeError
> (since there's no way for us to make it actually work correctly and
> consistently in the presence of closures, and the trace hook
> implementation won't need it any more given the write-through proxy
> behaviour on frame objects' "f_locals" attribute)
>
> The reference implementation still isn't quite done yet, but it's far
> enough along that I'm happy with the semantics and C API updates
> proposed in the current version of the PEP.
>
> Cheers,
> Nick.
>
> P.S. I'm away this weekend, so I expect the reference implementation
> to be done late next week, and I'd be submitting the PEP to Nathaniel
> for formal pronouncement at that point. However, I'm posting this
> thread now so that there's more time for discussion prior to the 3.8b1
> deadline.
>
> ==============
> PEP: 558
> Title: Defined semantics for locals()
> Author: Nick Coghlan <ncoghlan at gmail.com>
> BDFL-Delegate: Nathaniel J. Smith
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2017-09-08
> Python-Version: 3.8
> Post-History: 2017-09-08, 2019-05-22
>
>
> Abstract
> ========
>
> The semantics of the ``locals()`` builtin have historically been
> underspecified
> and hence implementation dependent.
>
> This PEP proposes formally standardising on the behaviour of the CPython
> 3.6
> reference implementation for most execution scopes, with some adjustments
> to the
> behaviour at function scope to make it more predictable and independent of
> the
> presence or absence of tracing functions.
>
>
> Rationale
> =========
>
> While the precise semantics of the ``locals()`` builtin are nominally
> undefined,
> in practice, many Python programs depend on it behaving exactly as it
> behaves in
> CPython (at least when no tracing functions are installed).
>
> Other implementations such as PyPy are currently replicating that
> behaviour,
> up to and including replication of local variable mutation bugs that
> can arise when a trace hook is installed [1]_.
>
> While this PEP considers CPython's current behaviour when no trace hooks
> are
> installed to be acceptable (and largely desirable), it considers the
> current
> behaviour when trace hooks are installed to be problematic, as it causes
> bugs
> like [1]_ *without* even reliably enabling the desired functionality of
> allowing
> debuggers like ``pdb`` to mutate local variables [3]_.
>
>
> Proposal
> ========
>
> The expected semantics of the ``locals()`` builtin change based on the
> current
> execution scope. For this purpose, the defined scopes of execution are:
>
> * module scope: top-level module code, as well as any other code executed
> using
>   ``exec()`` or ``eval()`` with a single namespace
> * class scope: code in the body of a ``class`` statement, as well as any
> other
>   code executed using ``exec()`` or ``eval()`` with separate local and
> global
>   namespaces
> * function scope: code in the body of a ``def`` or ``async def`` statement
>
> We also allow interpreters to define two "modes" of execution, with only
> the
> first mode being considered part of the language specification itself:
>
> * regular operation: the way the interpreter behaves by default
> * tracing mode: the way the interpreter behaves when a trace hook has been
>   registered in one or more threads via an implementation dependent
> mechanism
>   like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or
>   ``PyEval_SetTrace`` ([5]_) in CPython's C API
>
> For regular operation, this PEP proposes elevating the current behaviour of
> the CPython reference implementation to become part of the language
> specification.
>
> For tracing mode, this PEP proposes changes to CPython's behaviour at
> function
> scope that bring the ``locals()`` builtin semantics closer to those used in
> regular operation, while also making the related frame API semantics
> clearer
> and easier for interactive debuggers to rely on.
>
> The proposed tracing mode changes also affect the semantics of frame object
> references obtained through other means, such as via a traceback, or via
> the
> ``sys._getframe()`` API.
>
>
> New ``locals()`` documentation
> ------------------------------
>
> The heart of this proposal is to revise the documentation for the
> ``locals()``
> builtin to read as follows:
>
>     Return a dictionary representing the current local symbol table, with
>     variable names as the keys, and their currently bound references as the
>     values. This will always be the same dictionary for a given runtime
>     execution frame.
>
>     At module scope, as well as when using ``exec()`` or ``eval()`` with a
>     single namespace, this function returns the same namespace as
> ``globals()``.
>
>     At class scope, it returns the namespace that will be passed to the
>     metaclass constructor.
>
>     When using ``exec()`` or ``eval()`` with separate local and global
>     namespaces, it returns the local namespace passed in to the function
> call.
>
>     At function scope (including for generators and coroutines), it
> returns a
>     dynamic snapshot of the function's local variables and any nonlocal
> cell
>     references. In this case, changes made via the snapshot are *not*
> written
>     back to the corresponding local variables or nonlocal cell references,
> and
>     any such changes to the snapshot will be overwritten if the snapshot is
>     subsequently refreshed (e.g. by another call to ``locals()``).
>
>     CPython implementation detail: the dynamic snapshot for the current
> frame
>     will be implicitly refreshed before each call to the trace function
> when a
>     trace function is active.
>
> For reference, the current documentation of this builtin reads as follows:
>
>     Update and return a dictionary representing the current local symbol
> table.
>     Free variables are returned by locals() when it is called in function
>     blocks, but not in class blocks.
>
>     Note: The contents of this dictionary should not be modified; changes
> may
>     not affect the values of local and free variables used by the
> interpreter.
>
> (In other words: the status quo is that the semantics and behaviour of
> ``locals()`` are currently formally implementation defined, whereas the
> proposed
> state after this PEP is that the only implementation defined behaviour
> will be
> that encountered at function scope when a tracing function is defined,
> with the
> behaviour in all other cases being defined by the language and library
> references)
>
>
> Module scope
> ------------
>
> At module scope, as well as when using ``exec()`` or ``eval()`` with a
> single namespace, ``locals()`` must return the same object as
> ``globals()``,
> which must be the actual execution namespace (available as
> ``inspect.currentframe().f_locals`` in implementations that provide access
> to frame objects).
>
> Variable assignments during subsequent code execution in the same scope
> must
> dynamically change the contents of the returned mapping, and changes to the
> returned mapping must change the values bound to local variable names in
> the
> execution environment.
>
> The semantics at module scope are required to be the same in both tracing
> mode (if provided by the implementation) and in regular operation.
>
> To capture this expectation as part of the language specification, the
> following
> paragraph will be added to the documentation for ``locals()``:
>
>    At module scope, as well as when using ``exec()`` or ``eval()`` with a
>    single namespace, this function returns the same namespace as
> ``globals()``.
>
> This part of the proposal does not require any changes to the reference
> implementation - it is standardisation of the current behaviour.
>
>
> Class scope
> -----------
>
> At class scope, as well as when using ``exec()`` or ``eval()`` with
> separate
> global and local namespaces, ``locals()`` must return the specified local
> namespace (which may be supplied by the metaclass ``__prepare__`` method
> in the case of classes). As for module scope, this must be a direct
> reference
> to the actual execution namespace (available as
> ``inspect.currentframe().f_locals`` in implementations that provide access
> to frame objects).
>
> Variable assignments during subsequent code execution in the same scope
> must
> change the contents of the returned mapping, and changes to the returned
> mapping
> must change the values bound to local variable names in the
> execution environment.
>
> The mapping returned by ``locals()`` will *not* be used as the actual class
> namespace underlying the defined class (the class creation process will
> copy
> the contents to a fresh dictionary that is only accessible by going
> through the
> class machinery).
>
> For nested classes defined inside a function, any nonlocal cells
> referenced from
> the class scope are *not* included in the ``locals()`` mapping.
>
> The semantics at class scope are required to be the same in both tracing
> mode (if provided by the implementation) and in regular operation.
>
> To capture this expectation as part of the language specification, the
> following
> two paragraphs will be added to the documentation for ``locals()``:
>
>    When using ``exec()`` or ``eval()`` with separate local and global
>    namespaces, [this function] returns the given local namespace.
>
>    At class scope, it returns the namespace that will be passed to the
> metaclass
>    constructor.
>
> This part of the proposal does not require any changes to the reference
> implementation - it is standardisation of the current behaviour.
>
>
> Function scope
> --------------
>
> At function scope, interpreter implementations are granted significant
> freedom
> to optimise local variable access, and hence are NOT required to permit
> arbitrary modification of local and nonlocal variable bindings through the
> mapping returned from ``locals()``.
>
> Historically, this leniency has been described in the language
> specification
> with the words "The contents of this dictionary should not be modified;
> changes
> may not affect the values of local and free variables used by the
> interpreter."
>
> This PEP proposes to change that text to instead say:
>
>     At function scope (including for generators and coroutines), [this
> function]
>     returns a dynamic snapshot of the function's local variables and any
>     nonlocal cell references. In this case, changes made via the snapshot
> are
>     *not* written back to the corresponding local variables or nonlocal
> cell
>     references, and any such changes to the snapshot will be overwritten
> if the
>     snapshot is subsequently refreshed (e.g. by another call to
> ``locals()``).
>
>     CPython implementation detail: the dynamic snapshot for the currently
>     executing frame will be implicitly refreshed before each call to the
> trace
>     function when a trace function is active.
>
> This part of the proposal *does* require changes to the CPython reference
> implementation, as while it accurately describes the behaviour in regular
> operation, the "write back" strategy currently used to support namespace
> changes
> from trace functions doesn't comply with it (and also causes the quirky
> behavioural problems mentioned in the Rationale).
>
>
> CPython Implementation Changes
> ==============================
>
> The current cause of CPython's tracing mode quirks (both the side effects
> from
> simply installing a tracing function and the fact that writing values back
> to
> function locals only works for the specific function being traced) is the
> way
> that locals mutation support for trace hooks is currently implemented: the
> ``PyFrame_LocalsToFast`` function.
>
> When a trace function is installed, CPython currently does the following
> for
> function frames (those where the code object uses "fast locals" semantics):
>
> 1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
> 2. Calls the trace hook (with tracing of the hook itself disabled)
> 3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the
> dynamic
>    snapshot
>
> This approach is problematic for a few different reasons:
>
> * Even if the trace function doesn't mutate the snapshot, the final step
> resets
>   any cell references back to the state they were in before the trace
> function
>   was called (this is the root cause of the bug report in [1]_)
> * If the trace function *does* mutate the snapshot, but then does something
>   that causes the snapshot to be refreshed, those changes are lost (this is
>   one aspect of the bug report in [3]_)
> * If the trace function attempts to mutate the local variables of a frame
> other
>   than the one being traced (e.g. ``frame.f_back.f_locals``), those changes
>   will almost certainly be lost (this is another aspect of the bug report
> in
>   [3]_)
> * If a ``locals()`` reference is passed to another function, and *that*
>   function mutates the snapshot namespace, then those changes *may* be
> written
>   back to the execution frame *if* a trace hook is installed
>
> The proposed resolution to this problem is to take advantage of the fact
> that
> whereas functions typically access their *own* namespace using the language
> defined ``locals()`` builtin, trace functions necessarily use the
> implementation
> dependent ``frame.f_locals`` interface, as a frame reference is what gets
> passed to hook implementations.
>
> Instead of being a direct reference to the dynamic snapshot returned by
> ``locals()``, ``frame.f_locals`` will be updated to instead return a
> dedicated
> proxy type (implemented as a private subclass of the existing
> ``types.MappingProxyType``) that has two internal attributes not exposed as
> part of either the Python or public C API:
>
> * *mapping*: the dynamic snapshot that is returned by the ``locals()``
> builtin
> * *frame*: the underlying frame that the snapshot is for
>
> ``__setitem__`` and ``__delitem__`` operations on the proxy will affect
> not only
> the dynamic snapshot, but *also* the corresponding fast local or cell
> reference
> on the underlying frame.
>
> The ``locals()`` builtin will be made aware of this proxy type, and
> continue to
> return a reference to the dynamic snapshot rather than to the write-through
> proxy.
>
> At the C API layer, ``PyEval_GetLocals()`` will implement the same
> semantics
> as the Python level ``locals()`` builtin, and a new
> ``PyFrame_GetPyLocals(frame)`` accessor API will be provided to allow the
> function level proxy bypass logic to be encapsulated entirely inside the
> frame
> implementation.
>
> The C level equivalent of accessing ``pyframe.f_locals`` in Python will be
> a
> new ``PyFrame_GetLocalsAttr(frame)`` API. Like the Python level
> descriptor, the
> new API will implicitly refresh the dynamic snapshot at function scope
> before
> returning a reference to the write-through proxy.
>
> The ``PyFrame_LocalsToFast()`` function will be changed to always emit
> ``RuntimeError``, explaining that it is no longer a supported operation,
> and
> affected code should be updated to rely on the write-through tracing mode
> proxy instead.
>
>
> Design Discussion
> =================
>
> Ensuring ``locals()`` returns a shared snapshot at function scope
> -----------------------------------------------------------------
>
> The ``locals()`` builtin is a required part of the language, and in the
> reference implementation it has historically returned a mutable mapping
> with
> the following characteristics:
>
> * each call to ``locals()`` returns the *same* mapping
> * for namespaces where ``locals()`` returns a reference to something other
> than
>   the actual local execution namespace, each call to ``locals()`` updates
> the
>   mapping with the current state of the local variables and any referenced
>   nonlocal cells
> * changes to the returned mapping *usually* aren't written back to the
>   local variable bindings or the nonlocal cell references, but write backs
>   can be triggered by doing one of the following:
>
>   * installing a Python level trace hook (write backs then happen whenever
>     the trace hook is called)
>   * running a function level wildcard import (requires bytecode
> injection in Py3)
>   * running an ``exec`` statement in the function's scope (Py2 only, since
>     ``exec`` became an ordinary builtin in Python 3)
>
> The proposal in this PEP aims to retain the first two properties (to
> maintain
> backwards compatibility with as much code as possible) while ensuring that
> simply installing a trace hook can't enable rebinding of function locals
> via
> the ``locals()`` builtin (whereas enabling rebinding via
> ``frame.f_locals`` inside the tracehook implementation is fully intended).
>
>
> Keeping ``locals()`` as a dynamic snapshot at function scope
> ------------------------------------------------------------
>
> It would theoretically be possible to change the semantics of the
> ``locals()``
> builtin to return the write-through proxy at function scope, rather than
> continuing to return a dynamic snapshot.
>
> This PEP doesn't (and won't) propose this as it's a backwards incompatible
> change in practice, even though code that relies on the current behaviour
> is
> technically operating in an undefined area of the language specification.
>
> Consider the following code snippet::
>
>     def example():
>         x = 1
>         locals()["x"] = 2
>         print(x)
>
> Even with a trace hook installed, that function will consistently print
> ``1``
> on the current reference interpreter implementation::
>
>     >>> example()
>     1
>     >>> import sys
>     >>> def basic_hook(*args):
>     ...     return basic_hook
>     ...
>     >>> sys.settrace(basic_hook)
>     >>> example()
>     1
>
> Similarly, ``locals()`` can be passed to the ``exec()`` and ``eval()``
> builtins
> at function scope without risking unexpected rebinding of local variables.
>
> Provoking the reference interpreter into incorrectly mutating the local
> variable
> state requires a more complex setup where a nested function closes over a
> variable being rebound in the outer function, and due to the use of either
> threads, generators, or coroutines, it's possible for a trace function to
> start
> running for the nested function before the rebinding operation in the outer
> function, but finish running after the rebinding operation has taken place
> (in
> which case the rebinding will be reverted, which is the bug reported in
> [1]_).
>
> In addition to preserving the de facto semantics which have been in place
> since
> PEP 227 introduced nested scopes in Python 2.1, the other benefit of
> restricting
> the write-through proxy support to the implementation-defined frame object
> API
> is that it means that only interpreter implementations which emulate the
> full
> frame API need to offer the write-through capability at all, and that
> JIT-compiled implementations only need to enable it when a frame
> introspection
> API is invoked, or a trace hook is installed, not whenever ``locals()`` is
> accessed at function scope.
>
>
> What happens with the default args for ``eval()`` and ``exec()``?
> -----------------------------------------------------------------
>
> These are formally defined as inheriting ``globals()`` and ``locals()``
> from
> the calling scope by default.
>
> There isn't any need for the PEP to change these defaults, so it doesn't.
>
>
> Changing the frame API semantics in regular operation
> -----------------------------------------------------
>
> Earlier versions of this PEP proposed having the semantics of the frame
> ``f_locals`` attribute depend on whether or not a tracing hook was
> currently
> installed - only providing the write-through proxy behaviour when a
> tracing hook
> was active, and otherwise behaving the same as the ``locals()`` builtin.
>
> That was adopted as the original design proposal for a couple of key
> reasons,
> one pragmatic and one more philosophical:
>
> * Object allocations and method wrappers aren't free, and tracing functions
>   aren't the only operations that access frame locals from outside the
> function.
>   Restricting the changes to tracing mode meant that the additional memory
> and
>   execution time overhead of these changes would as close to zero in
> regular
>   operation as we can possibly make them.
> * "Don't change what isn't broken": the current tracing mode problems are
> caused
>   by a requirement that's specific to tracing mode (support for external
>   rebinding of function local variable references), so it made sense to
> also
>   restrict any related fixes to tracing mode
>
> However, actually attempting to implement and document that dynamic
> approach
> highlighted the fact that it makes for a really subtle runtime state
> dependent
> behaviour distinction in how ``frame.f_locals`` works, and creates several
> new edge cases around how ``f_locals`` behaves as trace functions are added
> and removed.
>
> Accordingly, the design was switched to the current one, where
> ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is
> always
> a dynamic snapshot, which is both simpler to implement and easier to
> explain.
>
> Regardless of how the CPython reference implementation chooses to handle
> this,
> optimising compilers and interpreters also remain free to impose additional
> restrictions on debuggers, by making local variable mutation through frame
> objects an opt-in behaviour that may disable some optimisations (just as
> the
> emulation of CPython's frame API is already an opt-in flag in some Python
> implementations).
>
>
> Historical semantics at function scope
> --------------------------------------
>
> The current semantics of mutating ``locals()`` and ``frame.f_locals`` in
> CPython
> are rather quirky due to historical implementation details:
>
> * actual execution uses the fast locals array for local variable bindings
> and
>   cell references for nonlocal variables
> * there's a ``PyFrame_FastToLocals`` operation that populates the frame's
>   ``f_locals`` attribute based on the current state of the fast locals
> array
>   and any referenced cells. This exists for three reasons:
>
>   * allowing trace functions to read the state of local variables
>   * allowing traceback processors to read the state of local variables
>   * allowing ``locals()`` to read the state of local variables
> * a direct reference to ``frame.f_locals`` is returned from ``locals()``,
> so if
>   you hand out multiple concurrent references, then all those references
> will be
>   to the exact same dictionary
> * the two common calls to the reverse operation, ``PyFrame_LocalsToFast``,
> were
>   removed in the migration to Python 3: ``exec`` is no longer a statement
> (and
>   hence can no longer affect function local namespaces), and the compiler
> now
>   disallows the use of ``from module import *`` operations at function
> scope
> * however, two obscure calling paths remain: ``PyFrame_LocalsToFast`` is
> called
>   as part of returning from a trace function (which allows debuggers to
> make
>   changes to the local variable state), and you can also still inject the
>   ``IMPORT_STAR`` opcode when creating a function directly from a code
> object
>   rather than via the compiler
>
> This proposal deliberately *doesn't* formalise these semantics as is,
> since they
> only make sense in terms of the historical evolution of the language and
> the
> reference implementation, rather than being deliberately designed.
>
>
> Implementation
> ==============
>
> The reference implementation update is in development as a draft pull
> request on GitHub ([6]_).
>
>
> Acknowledgements
> ================
>
> Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in
> [1]_ and pointing out some critical design flaws in earlier iterations of
> the
> PEP that attempted to avoid introducing such a proxy.
>
>
> References
> ==========
>
> .. [1] Broken local variable assignment given threads + trace hook +
> closure
>    (https://bugs.python.org/issue30744)
>
> .. [2] Clarify the required behaviour of ``locals()``
>    (https://bugs.python.org/issue17960)
>
> .. [3] Updating function local variables from pdb is unreliable
>    (https://bugs.python.org/issue9633)
>
> .. [4] CPython's Python API for installing trace hooks
>    (https://docs.python.org/dev/library/sys.html#sys.settrace)
>
> .. [5] CPython's C API for installing trace hooks
>    (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)
>
> .. [6] PEP 558 reference implementation
>    (https://github.com/python/cpython/pull/3640/files)
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>


-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20190525/d715116a/attachment-0001.html>


More information about the Python-Dev mailing list