PEP 558: Defined semantics for locals()

Hi folks,

After a couple of years hiatus, I finally got back to working on the reference implementation for PEP 558, my proposal to resolve some weird interactions between closures and trace functions in CPython by formally defining the expected semantics of the locals() builtin at function scope, and then ensuring that CPython adheres to those clarified semantics.

The full text of the PEP is included below, and the rendered version is available at https://www.python.org/dev/peps/pep-0558/

The gist of the PEP is that:

1. The behaviour when no trace functions are installed and no frame introspection APIs are invoked is fine, so nothing should change in that regard (except that we elevate that behaviour from "the way CPython happens to work" to "the way the language and library reference says that Python implementations are supposed to work", and make sure CPython continues to behave that way even when a trace hook *is* installed)
2. If you get hold of a frame object in CPython (or another implementation that emulates the CPython frame API), whether via a trace hook or via a frame introspection API, then writing to the returned mapping will update the actual function locals and/or closure reference immediately, rather than relying on the FastToLocals/LocalsToFast APIs
3. The LocalsToFast C API changes to always produce RuntimeError (since there's no way for us to make it actually work correctly and consistently in the presence of closures, and the trace hook implementation won't need it any more given the write-through proxy behaviour on frame objects' "f_locals" attribute)

The reference implementation still isn't quite done yet, but it's far enough along that I'm happy with the semantics and C API updates proposed in the current version of the PEP.

Cheers,
Nick.

P.S. I'm away this weekend, so I expect the reference implementation to be done late next week, and I'd be submitting the PEP to Nathaniel for formal pronouncement at that point.
However, I'm posting this thread now so that there's more time for discussion prior to the 3.8b1 deadline.

==============

PEP: 558
Title: Defined semantics for locals()
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate: Nathaniel J. Smith
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-08
Python-Version: 3.8
Post-History: 2017-09-08, 2019-05-22

Abstract
========

The semantics of the ``locals()`` builtin have historically been underspecified and hence implementation dependent.

This PEP proposes formally standardising on the behaviour of the CPython 3.6 reference implementation for most execution scopes, with some adjustments to the behaviour at function scope to make it more predictable and independent of the presence or absence of tracing functions.

Rationale
=========

While the precise semantics of the ``locals()`` builtin are nominally undefined, in practice, many Python programs depend on it behaving exactly as it behaves in CPython (at least when no tracing functions are installed).

Other implementations such as PyPy are currently replicating that behaviour, up to and including replication of local variable mutation bugs that can arise when a trace hook is installed [1]_.

While this PEP considers CPython's current behaviour when no trace hooks are installed to be acceptable (and largely desirable), it considers the current behaviour when trace hooks are installed to be problematic, as it causes bugs like [1]_ *without* even reliably enabling the desired functionality of allowing debuggers like ``pdb`` to mutate local variables [3]_.

Proposal
========

The expected semantics of the ``locals()`` builtin change based on the current execution scope.
For this purpose, the defined scopes of execution are:

* module scope: top-level module code, as well as any other code executed using ``exec()`` or ``eval()`` with a single namespace
* class scope: code in the body of a ``class`` statement, as well as any other code executed using ``exec()`` or ``eval()`` with separate local and global namespaces
* function scope: code in the body of a ``def`` or ``async def`` statement

We also allow interpreters to define two "modes" of execution, with only the first mode being considered part of the language specification itself:

* regular operation: the way the interpreter behaves by default
* tracing mode: the way the interpreter behaves when a trace hook has been registered in one or more threads via an implementation dependent mechanism like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or ``PyEval_SetTrace`` ([5]_) in CPython's C API

For regular operation, this PEP proposes elevating the current behaviour of the CPython reference implementation to become part of the language specification.

For tracing mode, this PEP proposes changes to CPython's behaviour at function scope that bring the ``locals()`` builtin semantics closer to those used in regular operation, while also making the related frame API semantics clearer and easier for interactive debuggers to rely on.

The proposed tracing mode changes also affect the semantics of frame object references obtained through other means, such as via a traceback, or via the ``sys._getframe()`` API.

New ``locals()`` documentation
------------------------------

The heart of this proposal is to revise the documentation for the ``locals()`` builtin to read as follows:

    Return a dictionary representing the current local symbol table, with variable names as the keys, and their currently bound references as the values. This will always be the same dictionary for a given runtime execution frame.
    At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, this function returns the same namespace as ``globals()``.

    At class scope, it returns the namespace that will be passed to the metaclass constructor.

    When using ``exec()`` or ``eval()`` with separate local and global namespaces, it returns the local namespace passed in to the function call.

    At function scope (including for generators and coroutines), it returns a dynamic snapshot of the function's local variables and any nonlocal cell references. In this case, changes made via the snapshot are *not* written back to the corresponding local variables or nonlocal cell references, and any such changes to the snapshot will be overwritten if the snapshot is subsequently refreshed (e.g. by another call to ``locals()``).

    CPython implementation detail: the dynamic snapshot for the current frame will be implicitly refreshed before each call to the trace function when a trace function is active.

For reference, the current documentation of this builtin reads as follows:

    Update and return a dictionary representing the current local symbol table. Free variables are returned by locals() when it is called in function blocks, but not in class blocks.

    Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
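As a rough illustration of the scope-by-scope wording above, the following sketch exercises each case on CPython in regular operation (no trace hook). Module scope is emulated with a single-namespace ``exec()`` call so that the result does not depend on where the snippet itself is run; all names here are illustrative and not part of the PEP.

```python
# Module scope, emulated via exec() with a single namespace:
# locals() and globals() refer to the same object.
module_ns = {}
exec("same = locals() is globals()", module_ns)

# exec() with separate global and local namespaces:
# locals() is the local namespace that was passed in.
exec_globals, exec_locals = {}, {}
exec("x = 1\nseen = locals()", exec_globals, exec_locals)


# Class scope: locals() is the namespace being populated by the class body.
class Example:
    attr = 1
    has_attr = "attr" in locals()


# Function scope: a write to the mapping returned by locals()
# is not written back to the local variable.
def function_scope():
    x = 1
    snapshot = locals()
    snapshot["x"] = 2
    return x


print(module_ns["same"])                   # True
print(exec_locals["seen"] is exec_locals)  # True
print(Example.has_attr)                    # True
print(function_scope())                    # 1
```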
(In other words: the status quo is that the semantics and behaviour of ``locals()`` are currently formally implementation defined, whereas the proposed state after this PEP is that the only implementation defined behaviour will be that encountered at function scope when a tracing function is defined, with the behaviour in all other cases being defined by the language and library references)

Module scope
------------

At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, ``locals()`` must return the same object as ``globals()``, which must be the actual execution namespace (available as ``inspect.currentframe().f_locals`` in implementations that provide access to frame objects).

Variable assignments during subsequent code execution in the same scope must dynamically change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.

The semantics at module scope are required to be the same in both tracing mode (if provided by the implementation) and in regular operation.

To capture this expectation as part of the language specification, the following paragraph will be added to the documentation for ``locals()``:

    At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, this function returns the same namespace as ``globals()``.

This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.

Class scope
-----------

At class scope, as well as when using ``exec()`` or ``eval()`` with separate global and local namespaces, ``locals()`` must return the specified local namespace (which may be supplied by the metaclass ``__prepare__`` method in the case of classes).
As for module scope, this must be a direct reference to the actual execution namespace (available as ``inspect.currentframe().f_locals`` in implementations that provide access to frame objects).

Variable assignments during subsequent code execution in the same scope must change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.

The mapping returned by ``locals()`` will *not* be used as the actual class namespace underlying the defined class (the class creation process will copy the contents to a fresh dictionary that is only accessible by going through the class machinery).

For nested classes defined inside a function, any nonlocal cells referenced from the class scope are *not* included in the ``locals()`` mapping.

The semantics at class scope are required to be the same in both tracing mode (if provided by the implementation) and in regular operation.

To capture this expectation as part of the language specification, the following two paragraphs will be added to the documentation for ``locals()``:

    When using ``exec()`` or ``eval()`` with separate local and global namespaces, [this function] returns the given local namespace.

    At class scope, it returns the namespace that will be passed to the metaclass constructor.

This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.

Function scope
--------------

At function scope, interpreter implementations are granted significant freedom to optimise local variable access, and hence are NOT required to permit arbitrary modification of local and nonlocal variable bindings through the mapping returned from ``locals()``.
Historically, this leniency has been described in the language specification with the words "The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter."

This PEP proposes to change that text to instead say:

    At function scope (including for generators and coroutines), [this function] returns a dynamic snapshot of the function's local variables and any nonlocal cell references. In this case, changes made via the snapshot are *not* written back to the corresponding local variables or nonlocal cell references, and any such changes to the snapshot will be overwritten if the snapshot is subsequently refreshed (e.g. by another call to ``locals()``).

    CPython implementation detail: the dynamic snapshot for the currently executing frame will be implicitly refreshed before each call to the trace function when a trace function is active.

This part of the proposal *does* require changes to the CPython reference implementation, as while it accurately describes the behaviour in regular operation, the "write back" strategy currently used to support namespace changes from trace functions doesn't comply with it (and also causes the quirky behavioural problems mentioned in the Rationale).

CPython Implementation Changes
==============================

The current cause of CPython's tracing mode quirks (both the side effects from simply installing a tracing function and the fact that writing values back to function locals only works for the specific function being traced) is the way that locals mutation support for trace hooks is currently implemented: the ``PyFrame_LocalsToFast`` function.

When a trace function is installed, CPython currently does the following for function frames (those where the code object uses "fast locals" semantics):

1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
2. Calls the trace hook (with tracing of the hook itself disabled)
3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the dynamic snapshot

This approach is problematic for a few different reasons:

* Even if the trace function doesn't mutate the snapshot, the final step resets any cell references back to the state they were in before the trace function was called (this is the root cause of the bug report in [1]_)
* If the trace function *does* mutate the snapshot, but then does something that causes the snapshot to be refreshed, those changes are lost (this is one aspect of the bug report in [3]_)
* If the trace function attempts to mutate the local variables of a frame other than the one being traced (e.g. ``frame.f_back.f_locals``), those changes will almost certainly be lost (this is another aspect of the bug report in [3]_)
* If a ``locals()`` reference is passed to another function, and *that* function mutates the snapshot namespace, then those changes *may* be written back to the execution frame *if* a trace hook is installed

The proposed resolution to this problem is to take advantage of the fact that whereas functions typically access their *own* namespace using the language defined ``locals()`` builtin, trace functions necessarily use the implementation dependent ``frame.f_locals`` interface, as a frame reference is what gets passed to hook implementations.

Instead of being a direct reference to the dynamic snapshot returned by ``locals()``, ``frame.f_locals`` will be updated to instead return a dedicated proxy type (implemented as a private subclass of the existing ``types.MappingProxyType``) that has two internal attributes not exposed as part of either the Python or public C API:

* *mapping*: the dynamic snapshot that is returned by the ``locals()`` builtin
* *frame*: the underlying frame that the snapshot is for

``__setitem__`` and ``__delitem__`` operations on the proxy will affect not only the dynamic snapshot, but *also* the corresponding fast local or cell reference on the underlying frame.
The ``locals()`` builtin will be made aware of this proxy type, and continue to return a reference to the dynamic snapshot rather than to the write-through proxy.

At the C API layer, ``PyEval_GetLocals()`` will implement the same semantics as the Python level ``locals()`` builtin, and a new ``PyFrame_GetPyLocals(frame)`` accessor API will be provided to allow the function level proxy bypass logic to be encapsulated entirely inside the frame implementation.

The C level equivalent of accessing ``pyframe.f_locals`` in Python will be a new ``PyFrame_GetLocalsAttr(frame)`` API. Like the Python level descriptor, the new API will implicitly refresh the dynamic snapshot at function scope before returning a reference to the write-through proxy.

The ``PyFrame_LocalsToFast()`` function will be changed to always emit ``RuntimeError``, explaining that it is no longer a supported operation, and affected code should be updated to rely on the write-through tracing mode proxy instead.

Design Discussion
=================

Ensuring ``locals()`` returns a shared snapshot at function scope
-----------------------------------------------------------------

The ``locals()`` builtin is a required part of the language, and in the reference implementation it has historically returned a mutable mapping with the following characteristics:

* each call to ``locals()`` returns the *same* mapping
* for namespaces where ``locals()`` returns a reference to something other than the actual local execution namespace, each call to ``locals()`` updates the mapping with the current state of the local variables and any referenced nonlocal cells
* changes to the returned mapping *usually* aren't written back to the local variable bindings or the nonlocal cell references, but write backs can be triggered by doing one of the following:

  * installing a Python level trace hook (write backs then happen whenever the trace hook is called)
  * running a function level wildcard import (requires bytecode injection in Py3)
  * running an ``exec`` statement in the function's scope (Py2 only, since ``exec`` became an ordinary builtin in Python 3)

The proposal in this PEP aims to retain the first two properties (to maintain backwards compatibility with as much code as possible) while ensuring that simply installing a trace hook can't enable rebinding of function locals via the ``locals()`` builtin (whereas enabling rebinding via ``frame.f_locals`` inside the trace hook implementation is fully intended).

Keeping ``locals()`` as a dynamic snapshot at function scope
------------------------------------------------------------

It would theoretically be possible to change the semantics of the ``locals()`` builtin to return the write-through proxy at function scope, rather than continuing to return a dynamic snapshot.

This PEP doesn't (and won't) propose this as it's a backwards incompatible change in practice, even though code that relies on the current behaviour is technically operating in an undefined area of the language specification.

Consider the following code snippet::

    def example():
        x = 1
        locals()["x"] = 2
        print(x)

Even with a trace hook installed, that function will consistently print ``1`` on the current reference interpreter implementation::

    >>> example()
    1
    >>> import sys
    >>> def basic_hook(*args):
    ...     return basic_hook
    ...
    >>> sys.settrace(basic_hook)
    >>> example()
    1

Similarly, ``locals()`` can be passed to the ``exec()`` and ``eval()`` builtins at function scope without risking unexpected rebinding of local variables.
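A minimal sketch of that last point, assuming CPython in regular operation; the function name ``run_in_snapshot`` is illustrative only. The assignment performed by ``exec()`` lands in the snapshot, not in the function's own ``x``, and ``eval()`` reads from a freshly refreshed snapshot.

```python
def run_in_snapshot():
    x = 1
    # The assignment inside exec() is stored in the mapping returned by
    # locals(); the function's own variable x is left untouched.
    exec("x = 2", globals(), locals())
    # A new call to locals() reflects the real local state again, so
    # eval() sees x == 1 here.
    computed = eval("x + 1", globals(), locals())
    return x, computed


print(run_in_snapshot())  # (1, 2)
```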
Provoking the reference interpreter into incorrectly mutating the local variable state requires a more complex setup where a nested function closes over a variable being rebound in the outer function, and due to the use of either threads, generators, or coroutines, it's possible for a trace function to start running for the nested function before the rebinding operation in the outer function, but finish running after the rebinding operation has taken place (in which case the rebinding will be reverted, which is the bug reported in [1]_).

In addition to preserving the de facto semantics which have been in place since PEP 227 introduced nested scopes in Python 2.1, the other benefit of restricting the write-through proxy support to the implementation-defined frame object API is that it means that only interpreter implementations which emulate the full frame API need to offer the write-through capability at all, and that JIT-compiled implementations only need to enable it when a frame introspection API is invoked, or a trace hook is installed, not whenever ``locals()`` is accessed at function scope.

What happens with the default args for ``eval()`` and ``exec()``?
-----------------------------------------------------------------

These are formally defined as inheriting ``globals()`` and ``locals()`` from the calling scope by default.

There isn't any need for the PEP to change these defaults, so it doesn't.

Changing the frame API semantics in regular operation
-----------------------------------------------------

Earlier versions of this PEP proposed having the semantics of the frame ``f_locals`` attribute depend on whether or not a tracing hook was currently installed - only providing the write-through proxy behaviour when a tracing hook was active, and otherwise behaving the same as the ``locals()`` builtin.
That was adopted as the original design proposal for a couple of key reasons, one pragmatic and one more philosophical:

* Object allocations and method wrappers aren't free, and tracing functions aren't the only operations that access frame locals from outside the function. Restricting the changes to tracing mode meant that the additional memory and execution time overhead of these changes would be as close to zero in regular operation as we can possibly make them.
* "Don't change what isn't broken": the current tracing mode problems are caused by a requirement that's specific to tracing mode (support for external rebinding of function local variable references), so it made sense to also restrict any related fixes to tracing mode

However, actually attempting to implement and document that dynamic approach highlighted the fact that it makes for a really subtle runtime state dependent behaviour distinction in how ``frame.f_locals`` works, and creates several new edge cases around how ``f_locals`` behaves as trace functions are added and removed.

Accordingly, the design was switched to the current one, where ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is always a dynamic snapshot, which is both simpler to implement and easier to explain.

Regardless of how the CPython reference implementation chooses to handle this, optimising compilers and interpreters also remain free to impose additional restrictions on debuggers, by making local variable mutation through frame objects an opt-in behaviour that may disable some optimisations (just as the emulation of CPython's frame API is already an opt-in flag in some Python implementations).
Historical semantics at function scope
--------------------------------------

The current semantics of mutating ``locals()`` and ``frame.f_locals`` in CPython are rather quirky due to historical implementation details:

* actual execution uses the fast locals array for local variable bindings and cell references for nonlocal variables
* there's a ``PyFrame_FastToLocals`` operation that populates the frame's ``f_locals`` attribute based on the current state of the fast locals array and any referenced cells. This exists for three reasons:

  * allowing trace functions to read the state of local variables
  * allowing traceback processors to read the state of local variables
  * allowing ``locals()`` to read the state of local variables
* a direct reference to ``frame.f_locals`` is returned from ``locals()``, so if you hand out multiple concurrent references, then all those references will be to the exact same dictionary
* the two common calls to the reverse operation, ``PyFrame_LocalsToFast``, were removed in the migration to Python 3: ``exec`` is no longer a statement (and hence can no longer affect function local namespaces), and the compiler now disallows the use of ``from module import *`` operations at function scope
* however, two obscure calling paths remain: ``PyFrame_LocalsToFast`` is called as part of returning from a trace function (which allows debuggers to make changes to the local variable state), and you can also still inject the ``IMPORT_STAR`` opcode when creating a function directly from a code object rather than via the compiler

This proposal deliberately *doesn't* formalise these semantics as is, since they only make sense in terms of the historical evolution of the language and the reference implementation, rather than being deliberately designed.

Implementation
==============

The reference implementation update is in development as a draft pull request on GitHub ([6]_).

Acknowledgements
================

Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in [1]_ and pointing out some critical design flaws in earlier iterations of the PEP that attempted to avoid introducing such a proxy.

References
==========

.. [1] Broken local variable assignment given threads + trace hook + closure
   (https://bugs.python.org/issue30744)

.. [2] Clarify the required behaviour of ``locals()``
   (https://bugs.python.org/issue17960)

.. [3] Updating function local variables from pdb is unreliable
   (https://bugs.python.org/issue9633)

.. [4] CPython's Python API for installing trace hooks
   (https://docs.python.org/dev/library/sys.html#sys.settrace)

.. [5] CPython's C API for installing trace hooks
   (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)

.. [6] PEP 558 reference implementation
   (https://github.com/python/cpython/pull/3640/files)

Copyright
=========

This document has been placed in the public domain.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, 22 May 2019 at 00:51, Nick Coghlan <ncoghlan@gmail.com> wrote:
P.S. I'm away this weekend, so I expect the reference implementation to be done late next week, and I'd be submitting the PEP to Nathaniel for formal pronouncement at that point. However, I'm posting this thread now so that there's more time for discussion prior to the 3.8b1 deadline.
I found some time the other day to finish up the core of the reference implementation and resolve some bugs in the new test cases, so the essential changes can now be seen in https://github.com/python/cpython/pull/3640/files

There are still some code level TODO items to cover off a few minor points in the PEP, as well as to make the fast locals proxy a better behaved mutable mapping.

The proposed documentation changes also still need to be brought over from the PEP, and at least a quick scan done through the code comments and the rest of the documentation for now outdated references to the legacy trace function namespace handling.

However, I think the PR does show that the proposed technique can be implemented without too much additional code complexity, and will hopefully be adaptable for all implementations that emulate the frame API at all.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23May2019 0636, Nick Coghlan wrote:
However, I think the PR does show that the proposed technique can be implemented without too much additional code complexity, and will hopefully be adaptable for all implementations that emulate the frame API at all.
Much excitement!

One of the things I like best about this change is that it actually makes it *easier* for alternative implementations to use a simpler frame object without having to emulate CPython semantics (I'd love to get to a place where bug-for-bug compatibility wasn't required, but this is where we are right now so *shrug*).

Cheers,
Steve

Hi,

On Thu, 23 May 2019 at 17:28, Steve Dower <steve.dower@python.org> wrote:
On 23May2019 0636, Nick Coghlan wrote:
However, I think the PR does show that the proposed technique can be implemented without too much additional code complexity, and will hopefully be adaptable for all implementations that emulate the frame API at all.
Much excitement!
One of the things I like best about this change is that it actually makes it *easier* for alternative implementations to use a simpler frame object without having to emulate CPython semantics (I'd love to get to a place where bug-for-bug compatibility wasn't required, but this is where we are right now so *shrug*).
Thanks Nick for getting this through!

A bientôt,
Armin

This looks great.

I only have two nits with the text.

First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?

Second, you use the word "mapping" a lot. Would you mind changing that to "mapping object" in most places? Especially in the phrase "each call to ``locals()`` returns the *same* mapping". To me, without the word "object" added, this *could* be interpreted as "a dict with the same key/value pairs" (since "mapping" is also an abstract mathematical concept describing anything that maps keys to values).

Other than that, go for it! (Assuming Nathaniel agrees, of course.)

--Guido

On Tue, May 21, 2019 at 7:54 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
Hi folks,
After a couple of years hiatus, I finally got back to working on the reference implementation for PEP 558, my proposal to resolve some weird interactions between closures and trace functions in CPython by formally defining the expected semantics of the locals() builtin at function scope, and then ensuring that CPython adheres to those clarified semantics.
The full text of the PEP is included below, and the rendered version is available at https://www.python.org/dev/peps/pep-0558/
The gist of the PEP is that:
1. The behaviour when no trace functions are installed and no frame introspection APIs are invoked is fine, so nothing should change in that regard (except that we elevate that behaviour from "the way CPython happens to work" to "the way the language and library reference says that Python implementations are supposed to work", and make sure CPython continues to behave that way even when a trace hook *is* installed)
2. If you get hold of a frame object in CPython (or another implementation that emulates the CPython frame API), whether via a trace hook or via a frame introspection API, then writing to the returned mapping will update the actual function locals and/or closure reference immediately, rather than relying on the FastToLocals/LocalsToFast APIs
3. The LocalsToFast C API changes to always produce RuntimeError (since there's no way for us to make it actually work correctly and consistently in the presence of closures, and the trace hook implementation won't need it any more given the write-through proxy behaviour on frame objects' "f_locals" attribute)
The reference implementation still isn't quite done yet, but it's far enough along that I'm happy with the semantics and C API updates proposed in the current version of the PEP.
Cheers, Nick.
P.S. I'm away this weekend, so I expect the reference implementation to be done late next week, and I'd be submitting the PEP to Nathaniel for formal pronouncement at that point. However, I'm posting this thread now so that there's more time for discussion prior to the 3.8b1 deadline.
==============

PEP: 558
Title: Defined semantics for locals()
Author: Nick Coghlan <ncoghlan@gmail.com>
BDFL-Delegate: Nathaniel J. Smith
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2017-09-08
Python-Version: 3.8
Post-History: 2017-09-08, 2019-05-22
Abstract
========
The semantics of the ``locals()`` builtin have historically been underspecified and hence implementation dependent.
This PEP proposes formally standardising on the behaviour of the CPython 3.6 reference implementation for most execution scopes, with some adjustments to the behaviour at function scope to make it more predictable and independent of the presence or absence of tracing functions.
Rationale
=========
While the precise semantics of the ``locals()`` builtin are nominally undefined, in practice, many Python programs depend on it behaving exactly as it behaves in CPython (at least when no tracing functions are installed).
Other implementations such as PyPy are currently replicating that behaviour, up to and including replication of local variable mutation bugs that can arise when a trace hook is installed [1]_.
While this PEP considers CPython's current behaviour when no trace hooks are installed to be acceptable (and largely desirable), it considers the current behaviour when trace hooks are installed to be problematic, as it causes bugs like [1]_ *without* even reliably enabling the desired functionality of allowing debuggers like ``pdb`` to mutate local variables [3]_.
Proposal
========
The expected semantics of the ``locals()`` builtin change based on the current execution scope. For this purpose, the defined scopes of execution are:
* module scope: top-level module code, as well as any other code executed using ``exec()`` or ``eval()`` with a single namespace
* class scope: code in the body of a ``class`` statement, as well as any other code executed using ``exec()`` or ``eval()`` with separate local and global namespaces
* function scope: code in the body of a ``def`` or ``async def`` statement
We also allow interpreters to define two "modes" of execution, with only the first mode being considered part of the language specification itself:
* regular operation: the way the interpreter behaves by default
* tracing mode: the way the interpreter behaves when a trace hook has been registered in one or more threads via an implementation dependent mechanism like ``sys.settrace`` ([4]_) in CPython's ``sys`` module or ``PyEval_SetTrace`` ([5]_) in CPython's C API
For regular operation, this PEP proposes elevating the current behaviour of the CPython reference implementation to become part of the language specification.
For tracing mode, this PEP proposes changes to CPython's behaviour at function scope that bring the ``locals()`` builtin semantics closer to those used in regular operation, while also making the related frame API semantics clearer and easier for interactive debuggers to rely on.
The proposed tracing mode changes also affect the semantics of frame object references obtained through other means, such as via a traceback, or via the ``sys._getframe()`` API.
New ``locals()`` documentation
------------------------------
The heart of this proposal is to revise the documentation for the ``locals()`` builtin to read as follows:
Return a dictionary representing the current local symbol table, with variable names as the keys, and their currently bound references as the values. This will always be the same dictionary for a given runtime execution frame.
At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, this function returns the same namespace as ``globals()``.
At class scope, it returns the namespace that will be passed to the metaclass constructor.
When using ``exec()`` or ``eval()`` with separate local and global namespaces, it returns the local namespace passed in to the function call.
At function scope (including for generators and coroutines), it returns a dynamic snapshot of the function's local variables and any nonlocal cell references. In this case, changes made via the snapshot are *not* written back to the corresponding local variables or nonlocal cell references, and any such changes to the snapshot will be overwritten if the snapshot is subsequently refreshed (e.g. by another call to ``locals()``).
CPython implementation detail: the dynamic snapshot for the current frame will be implicitly refreshed before each call to the trace function when a trace function is active.
For reference, the current documentation of this builtin reads as follows:
Update and return a dictionary representing the current local symbol table. Free variables are returned by locals() when it is called in function blocks, but not in class blocks.
Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.
(In other words: the status quo is that the semantics and behaviour of ``locals()`` are currently formally implementation defined, whereas the proposed state after this PEP is that the only implementation defined behaviour will be that encountered at function scope when a tracing function is defined, with the behaviour in all other cases being defined by the language and library references)
Module scope
------------
At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, ``locals()`` must return the same object as ``globals()``, which must be the actual execution namespace (available as ``inspect.currentframe().f_locals`` in implementations that provide access to frame objects).
Variable assignments during subsequent code execution in the same scope must dynamically change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.
The semantics at module scope are required to be the same in both tracing mode (if provided by the implementation) and in regular operation.
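As a rough illustration of the module scope rule (a sketch, not part of the PEP text), running code through ``exec()`` with a single namespace shows ``locals()`` and ``globals()`` resolving to the same object, with changes visible in both directions. The names ``source`` and ``namespace`` are made up for this example.

```python
# Hypothetical sketch: exec() with a single namespace behaves like module scope.
source = """
same_object = locals() is globals()
locals()["injected"] = 42      # write through the mapping returned by locals()
seen_by_code = injected        # ...and read it back as an ordinary name
assigned = "plain assignment"  # an ordinary assignment...
visible_in_mapping = locals()["assigned"]  # ...shows up in the mapping
"""

namespace = {}
exec(source, namespace)

print(namespace["same_object"])         # True
print(namespace["seen_by_code"])        # 42
print(namespace["visible_in_mapping"])  # plain assignment
```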
To capture this expectation as part of the language specification, the following paragraph will be added to the documentation for ``locals()``:
At module scope, as well as when using ``exec()`` or ``eval()`` with a single namespace, this function returns the same namespace as ``globals()``.
This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.
Class scope
-----------
At class scope, as well as when using ``exec()`` or ``eval()`` with separate global and local namespaces, ``locals()`` must return the specified local namespace (which may be supplied by the metaclass ``__prepare__`` method in the case of classes). As for module scope, this must be a direct reference to the actual execution namespace (available as ``inspect.currentframe().f_locals`` in implementations that provide access to frame objects).
Variable assignments during subsequent code execution in the same scope must change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.
The mapping returned by ``locals()`` will *not* be used as the actual class namespace underlying the defined class (the class creation process will copy the contents to a fresh dictionary that is only accessible by going through the class machinery).
For nested classes defined inside a function, any nonlocal cells referenced from the class scope are *not* included in the ``locals()`` mapping.
The semantics at class scope are required to be the same in both tracing mode (if provided by the implementation) and in regular operation.
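The class scope rules can be sketched as follows. This is only an illustration under stated assumptions: ``RecordingMeta``, ``Example`` and the namespace variables are hypothetical names, and the metaclass exists purely to keep a reference to the mapping object that ``__prepare__`` hands out.

```python
# Hypothetical metaclass that records the namespace returned by __prepare__.
class RecordingMeta(type):
    prepared = []

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        namespace = {}
        mcls.prepared.append(namespace)
        return namespace


class Example(metaclass=RecordingMeta):
    body_locals = locals()                     # the mapping supplied by __prepare__
    locals()["injected"] = "set via locals()"  # a write to it defines a class attribute


class_body_namespace = RecordingMeta.prepared[0]
print(Example.body_locals is class_body_namespace)  # True
print(Example.injected)                             # set via locals()

# The finished class works from a copy, so later edits are not picked up.
class_body_namespace["late"] = 1
print(hasattr(Example, "late"))                     # False

# exec() with separate namespaces: locals() is the local namespace passed in.
global_ns, local_ns = {}, {}
exec("returned = locals()", global_ns, local_ns)
print(local_ns["returned"] is local_ns)             # True
```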
To capture this expectation as part of the language specification, the following two paragraphs will be added to the documentation for ``locals()``:
When using ``exec()`` or ``eval()`` with separate local and global namespaces, [this function] returns the given local namespace.
At class scope, it returns the namespace that will be passed to the metaclass constructor.
This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.
Function scope
--------------
At function scope, interpreter implementations are granted significant freedom to optimise local variable access, and hence are NOT required to permit arbitrary modification of local and nonlocal variable bindings through the mapping returned from ``locals()``.
Historically, this leniency has been described in the language specification with the words "The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter."
This PEP proposes to change that text to instead say:
At function scope (including for generators and coroutines), [this function] returns a dynamic snapshot of the function's local variables and any nonlocal cell references. In this case, changes made via the snapshot are *not* written back to the corresponding local variables or nonlocal cell references, and any such changes to the snapshot will be overwritten if the snapshot is subsequently refreshed (e.g. by another call to ``locals()``).
CPython implementation detail: the dynamic snapshot for the currently executing frame will be implicitly refreshed before each call to the trace function when a trace function is active.
This part of the proposal *does* require changes to the CPython reference implementation, as while it accurately describes the behaviour in regular operation, the "write back" strategy currently used to support namespace changes from trace functions doesn't comply with it (and also causes the quirky behavioural problems mentioned in the Rationale).
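A minimal sketch of the function scope wording, assuming regular operation (no trace hook installed). The function and variable names are invented for illustration. It shows that the snapshot includes a nonlocal cell reference used by the function, and that writes to the snapshot reach neither the local variable nor the cell.

```python
def outer():
    shared = "original"

    def inner():
        count = 1
        snapshot = locals()
        # `shared` is a free variable of inner(), so it appears in the snapshot.
        had_cell = "shared" in snapshot
        # Edits to the snapshot are not written back to the real bindings.
        snapshot["count"] = 100
        snapshot["shared"] = "changed"
        return had_cell, count, shared

    return inner()


result = outer()
print(result)  # (True, 1, 'original')
```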
CPython Implementation Changes
==============================
The current cause of CPython's tracing mode quirks (both the side effects from simply installing a tracing function and the fact that writing values back to function locals only works for the specific function being traced) is the way that locals mutation support for trace hooks is currently implemented: the ``PyFrame_LocalsToFast`` function.
When a trace function is installed, CPython currently does the following for function frames (those where the code object uses "fast locals" semantics):
1. Calls ``PyFrame_FastToLocals`` to update the dynamic snapshot
2. Calls the trace hook (with tracing of the hook itself disabled)
3. Calls ``PyFrame_LocalsToFast`` to capture any changes made to the dynamic snapshot
This approach is problematic for a few different reasons:
* Even if the trace function doesn't mutate the snapshot, the final step resets any cell references back to the state they were in before the trace function was called (this is the root cause of the bug report in [1]_)
* If the trace function *does* mutate the snapshot, but then does something that causes the snapshot to be refreshed, those changes are lost (this is one aspect of the bug report in [3]_)
* If the trace function attempts to mutate the local variables of a frame other than the one being traced (e.g. ``frame.f_back.f_locals``), those changes will almost certainly be lost (this is another aspect of the bug report in [3]_)
* If a ``locals()`` reference is passed to another function, and *that* function mutates the snapshot namespace, then those changes *may* be written back to the execution frame *if* a trace hook is installed
The proposed resolution to this problem is to take advantage of the fact that whereas functions typically access their *own* namespace using the language defined ``locals()`` builtin, trace functions necessarily use the implementation dependent ``frame.f_locals`` interface, as a frame reference is what gets passed to hook implementations.
Instead of being a direct reference to the dynamic snapshot returned by ``locals()``, ``frame.f_locals`` will be updated to instead return a dedicated proxy type (implemented as a private subclass of the existing ``types.MappingProxyType``) that has two internal attributes not exposed as part of either the Python or public C API:
* *mapping*: the dynamic snapshot that is returned by the ``locals()`` builtin
* *frame*: the underlying frame that the snapshot is for
``__setitem__`` and ``__delitem__`` operations on the proxy will affect not only the dynamic snapshot, but *also* the corresponding fast local or cell reference on the underlying frame.
The ``locals()`` builtin will be made aware of this proxy type, and continue to return a reference to the dynamic snapshot rather than to the write-through proxy.
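To make the division of labour concrete, here is a hedged sketch of a trace hook rebinding a local of the frame it is tracing through ``frame.f_locals``. The names ``target``, ``rebinding_hook`` and ``run_traced`` are hypothetical. For this simple case the observable result is expected to be the same under the write-back strategy described above and under a write-through proxy; only the mechanism differs.

```python
import sys


def target():
    value = "original"
    return value


def rebinding_hook(frame, event, arg):
    # Only trace the frame that runs target(); ignore everything else.
    if frame.f_code is not target.__code__:
        return None
    if event == "line":
        # The hook writes through frame.f_locals, never through locals().
        frame.f_locals["value"] = "rebound by hook"
    return rebinding_hook


def run_traced():
    previous = sys.gettrace()
    sys.settrace(rebinding_hook)
    try:
        return target()
    finally:
        sys.settrace(previous)  # always restore the earlier hook


result = run_traced()
print(result)  # rebound by hook
```

The hook fires before each line of ``target()`` runs, so the rebinding made just before ``return value`` is the one the function returns.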
At the C API layer, ``PyEval_GetLocals()`` will implement the same semantics as the Python level ``locals()`` builtin, and a new ``PyFrame_GetPyLocals(frame)`` accessor API will be provided to allow the function level proxy bypass logic to be encapsulated entirely inside the frame implementation.
The C level equivalent of accessing ``pyframe.f_locals`` in Python will be a new ``PyFrame_GetLocalsAttr(frame)`` API. Like the Python level descriptor, the new API will implicitly refresh the dynamic snapshot at function scope before returning a reference to the write-through proxy.
The ``PyFrame_LocalsToFast()`` function will be changed to always emit ``RuntimeError``, explaining that it is no longer a supported operation, and affected code should be updated to rely on the write-through tracing mode proxy instead.
Design Discussion
=================

Ensuring ``locals()`` returns a shared snapshot at function scope
-----------------------------------------------------------------
The ``locals()`` builtin is a required part of the language, and in the reference implementation it has historically returned a mutable mapping with the following characteristics:
* each call to ``locals()`` returns the *same* mapping
* for namespaces where ``locals()`` returns a reference to something other than the actual local execution namespace, each call to ``locals()`` updates the mapping with the current state of the local variables and any referenced nonlocal cells
* changes to the returned mapping *usually* aren't written back to the local variable bindings or the nonlocal cell references, but write backs can be triggered by doing one of the following:

  * installing a Python level trace hook (write backs then happen whenever the trace hook is called)
  * running a function level wildcard import (requires bytecode injection in Py3)
  * running an ``exec`` statement in the function's scope (Py2 only, since ``exec`` became an ordinary builtin in Python 3)
The proposal in this PEP aims to retain the first two properties (to maintain backwards compatibility with as much code as possible) while ensuring that simply installing a trace hook can't enable rebinding of function locals via the ``locals()`` builtin (whereas enabling rebinding via ``frame.f_locals`` inside the trace hook implementation is fully intended).
Keeping ``locals()`` as a dynamic snapshot at function scope
------------------------------------------------------------
It would theoretically be possible to change the semantics of the ``locals()`` builtin to return the write-through proxy at function scope, rather than continuing to return a dynamic snapshot.
This PEP doesn't (and won't) propose this as it's a backwards incompatible change in practice, even though code that relies on the current behaviour is technically operating in an undefined area of the language specification.
Consider the following code snippet::
    def example():
        x = 1
        locals()["x"] = 2
        print(x)
Even with a trace hook installed, that function will consistently print ``1`` on the current reference interpreter implementation::

    >>> example()
    1
    >>> import sys
    >>> def basic_hook(*args):
    ...     return basic_hook
    ...
    >>> sys.settrace(basic_hook)
    >>> example()
    1
Similarly, ``locals()`` can be passed to the ``exec()`` and ``eval()`` builtins at function scope without risking unexpected rebinding of local variables.
Provoking the reference interpreter into incorrectly mutating the local variable state requires a more complex setup where a nested function closes over a variable being rebound in the outer function, and due to the use of either threads, generators, or coroutines, it's possible for a trace function to start running for the nested function before the rebinding operation in the outer function, but finish running after the rebinding operation has taken place (in which case the rebinding will be reverted, which is the bug reported in [1]_).
In addition to preserving the de facto semantics which have been in place since PEP 227 introduced nested scopes in Python 2.1, the other benefit of restricting the write-through proxy support to the implementation-defined frame object API is that it means that only interpreter implementations which emulate the full frame API need to offer the write-through capability at all, and that JIT-compiled implementations only need to enable it when a frame introspection API is invoked, or a trace hook is installed, not whenever ``locals()`` is accessed at function scope.
What happens with the default args for ``eval()`` and ``exec()``?
-----------------------------------------------------------------
These are formally defined as inheriting ``globals()`` and ``locals()`` from the calling scope by default.
There isn't any need for the PEP to change these defaults, so it doesn't.
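A short sketch of what those defaults mean at function scope (function names invented for illustration): ``eval()`` can read the caller's locals, while an assignment made by ``exec()`` lands in the snapshot and leaves the real local variable alone.

```python
def read_via_eval():
    x = 41
    # With no explicit namespaces, eval() uses the caller's globals() and locals().
    return eval("x + 1")


def write_via_exec():
    x = 1
    # The assignment happens in the locals() snapshot, not in the fast local `x`.
    exec("x = 2")
    return x


print(read_via_eval())   # 42
print(write_via_exec())  # 1
```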
Changing the frame API semantics in regular operation
-----------------------------------------------------
Earlier versions of this PEP proposed having the semantics of the frame ``f_locals`` attribute depend on whether or not a tracing hook was currently installed - only providing the write-through proxy behaviour when a tracing hook was active, and otherwise behaving the same as the ``locals()`` builtin.
That was adopted as the original design proposal for a couple of key reasons, one pragmatic and one more philosophical:
* Object allocations and method wrappers aren't free, and tracing functions aren't the only operations that access frame locals from outside the function. Restricting the changes to tracing mode meant that the additional memory and execution time overhead of these changes would be as close to zero in regular operation as we can possibly make them.
* "Don't change what isn't broken": the current tracing mode problems are caused by a requirement that's specific to tracing mode (support for external rebinding of function local variable references), so it made sense to also restrict any related fixes to tracing mode
However, actually attempting to implement and document that dynamic approach highlighted the fact that it makes for a really subtle runtime state dependent behaviour distinction in how ``frame.f_locals`` works, and creates several new edge cases around how ``f_locals`` behaves as trace functions are added and removed.
Accordingly, the design was switched to the current one, where ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is always a dynamic snapshot, which is both simpler to implement and easier to explain.
Regardless of how the CPython reference implementation chooses to handle this, optimising compilers and interpreters also remain free to impose additional restrictions on debuggers, by making local variable mutation through frame objects an opt-in behaviour that may disable some optimisations (just as the emulation of CPython's frame API is already an opt-in flag in some Python implementations).
Historical semantics at function scope
--------------------------------------
The current semantics of mutating ``locals()`` and ``frame.f_locals`` in CPython are rather quirky due to historical implementation details:
* actual execution uses the fast locals array for local variable bindings and cell references for nonlocal variables
* there's a ``PyFrame_FastToLocals`` operation that populates the frame's ``f_locals`` attribute based on the current state of the fast locals array and any referenced cells. This exists for three reasons:

  * allowing trace functions to read the state of local variables
  * allowing traceback processors to read the state of local variables
  * allowing ``locals()`` to read the state of local variables

* a direct reference to ``frame.f_locals`` is returned from ``locals()``, so if you hand out multiple concurrent references, then all those references will be to the exact same dictionary
* the two common calls to the reverse operation, ``PyFrame_LocalsToFast``, were removed in the migration to Python 3: ``exec`` is no longer a statement (and hence can no longer affect function local namespaces), and the compiler now disallows the use of ``from module import *`` operations at function scope
* however, two obscure calling paths remain: ``PyFrame_LocalsToFast`` is called as part of returning from a trace function (which allows debuggers to make changes to the local variable state), and you can also still inject the ``IMPORT_STAR`` opcode when creating a function directly from a code object rather than via the compiler
This proposal deliberately *doesn't* formalise these semantics as is, since they only make sense in terms of the historical evolution of the language and the reference implementation, rather than being deliberately designed.
Implementation
==============
The reference implementation update is in development as a draft pull request on GitHub ([6]_).
Acknowledgements
================
Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in [1]_ and pointing out some critical design flaws in earlier iterations of the PEP that attempted to avoid introducing such a proxy.
References
==========
.. [1] Broken local variable assignment given threads + trace hook + closure (https://bugs.python.org/issue30744)
.. [2] Clarify the required behaviour of ``locals()`` (https://bugs.python.org/issue17960)
.. [3] Updating function local variables from pdb is unreliable (https://bugs.python.org/issue9633)
.. [4] CPython's Python API for installing trace hooks (https://docs.python.org/dev/library/sys.html#sys.settrace)
.. [5] CPython's C API for installing trace hooks (https://docs.python.org/3/c-api/init.html#c.PyEval_SetTrace)
.. [6] PEP 558 reference implementation (https://github.com/python/cpython/pull/3640/files)
Copyright
=========
This document has been placed in the public domain.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 5/25/2019 10:36 AM, Guido van Rossum wrote:
This looks great.
I agree. I understand and have tried to explain normal operation multiple times. The proposed new doc looks better than anything I ever wrote. (I never even thought about locals() with tracing on.) The improved clarity well justifies the increased space.
I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?
Good catch. 'snapshot' is from "At function scope (including for generators and coroutines), [locals()] returns a dynamic snapshot of the function's local variables and any nonlocal cell references." 'Dynamic' could be misunderstood to mean the function locals() snapshot always tracks the underlying function namespace. The point of using the 'snapshot' metaphor is that it does not. The snapshot is 'dynamic' in that it can be changed, but the same is true of real photos. But just as with real photos, changing the snapshot does not change the reality it represents (as explained in the next proposed sentence). The 'snapshot' metaphor does not need 'dynamic' and I think it works better without it.
Second, you use the word "mapping" a lot. Would you mind changing that to "mapping object" in most places? Especially in the phrase "each call to ``locals()`` returns the *same* mapping". To me, without the word "object" added, this *could* be interpreted as "a dict with the same key/value pairs" (since "mapping" is also an abstract mathematical concept describing anything that maps keys to values).
Agreed also. -- Terry Jan Reedy

On Sat, May 25, 2019, 07:38 Guido van Rossum <guido@python.org> wrote:
This looks great.
I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?
It's dynamic in that it can spontaneously change when certain other events happen. For example, imagine this code runs at function scope:

    # take a snapshot
    a = locals()

    # it's a snapshot, so it doesn't include the new variable
    assert "a" not in a

    # take another snapshot
    b = locals()

    # now our first "snapshot" has changed
    assert "a" in a

Overall I'm happy with the PEP, but I'm still a bit uneasy about whether we've gotten the details of this "dynamicity" exactly right, esp. since the PEP promotes them from implementation detail to language features. There are a lot of complicated tradeoffs so I'm working on a longer response that tries to lay out all the options and hopefully convince myself (and everyone else).

-n

That's a good fine point that the PEP could call out, but just adding "dynamic" in front of "snapshot" everywhere doesn't tell me any of that. Given that the code calling locals() must of necessity live in the same function body (except for the special case of trace functions), I don't think that what you describe here is too worrisome a scenario. On Sat, May 25, 2019 at 2:09 PM Nathaniel Smith <njs@pobox.com> wrote:
On Sat, May 25, 2019, 07:38 Guido van Rossum <guido@python.org> wrote:
This looks great.
I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?
It's dynamic in that it can spontaneously change when certain other events happen. For example, imagine this code runs at function scope:
    # take a snapshot
    a = locals()

    # it's a snapshot, so it doesn't include the new variable
    assert "a" not in a

    # take another snapshot
    b = locals()

    # now our first "snapshot" has changed
    assert "a" in a
Overall I'm happy with the PEP, but I'm still a bit uneasy about whether we've gotten the details of this "dynamicity" exactly right, esp. since the PEP promotes them from implementation detail to language features. There are a lot of complicated tradeoffs so I'm working on a longer response that tries to lay out all the options and hopefully convince myself (and everyone else).
-n
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On 5/25/19 5:09 PM, Nathaniel Smith wrote:
On Sat, May 25, 2019, 07:38 Guido van Rossum <guido@python.org <mailto:guido@python.org>> wrote:
This looks great.
I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?
It's dynamic in that it can spontaneously change when certain other events happen. For example, imagine this code runs at function scope:
    # take a snapshot
    a = locals()

    # it's a snapshot, so it doesn't include the new variable
    assert "a" not in a

    # take another snapshot
    b = locals()

    # now our first "snapshot" has changed
    assert "a" in a
Overall I'm happy with the PEP, but I'm still a bit uneasy about whether we've gotten the details of this "dynamicity" exactly right, esp. since the PEP promotes them from implementation detail to language features. There are a lot of complicated tradeoffs so I'm working on a longer response that tries to lay out all the options and hopefully convince myself (and everyone else).
-n

To me that is a static snapshot of a dynamic environment, not a dynamic snapshot. The snapshot you get at THAT moment in time won't change as time progresses, so that snapshot itself isn't dynamic. Calling something a 'dynamic snapshot' could be taken to imply that the snapshot itself is dynamic, and thus changes as that environment changes (and you could pass that snapshot to some other place, and they could get a view of things just like you would see it there).
-- Richard Damon

On Sun, May 26, 2019 at 8:38 AM Richard Damon <Richard@damon-family.org> wrote:
On 5/25/19 5:09 PM, Nathaniel Smith wrote:
    a = locals()
    b = locals()
    # now our first "snapshot" has changed
    assert "a" in a
To me that is a static snapshot of a dynamic environment, not a dynamic snapshot. The snapshot you get at THAT moment in time won't change, as time progresses, so that snapshot itself isn't dynamic.
Except that it does. After calling locals() a second time, the result of the *first* call will be updated to reflect changes. It's not a frozen snapshot; you can't diff the two dictionaries to see what's changed (unless you use "a = dict(locals())" or something, which IS a snapshot).
From my reading of the description, you could also "assert a is b" - is that correct?
ChrisA
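For readers who want a frozen copy, the ``dict(locals())`` idiom mentioned above can be sketched like this (``frozen_copy_demo`` is a made-up name). The copy is an ordinary independent dict, so it keeps the old value while a later ``locals()`` call reports the new one.

```python
def frozen_copy_demo():
    x = 1
    frozen = dict(locals())  # independent copy taken while only `x` is bound
    x = 2
    live = locals()          # a later call reflects the current value of `x`
    return frozen, live["x"]


frozen, live_x = frozen_copy_demo()
print(frozen)   # {'x': 1}
print(live_x)   # 2
```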

(And this time I will remember to remove the SPAM label...) On Sun, May 26, 2019 at 08:44:33AM +1000, Chris Angelico wrote:
From my reading of the description, you could also "assert a is b" - is that correct?
Yes, that's already the behaviour.

    py> def demo():
    ...     a = locals()
    ...     b = locals()
    ...     print(a is b)
    ...
    py> demo()
    True

-- Steven

On Sun, May 26, 2019 at 8:07 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 26, 2019 at 08:44:33AM +1000, Chris Angelico wrote:
From my reading of the description, you could also "assert a is b" - is that correct?
Yes, that's already the behaviour.

    py> def demo():
    ...     a = locals()
    ...     b = locals()
    ...     print(a is b)
    ...
    py> demo()
    True
Sure, but this PEP is all about defining things that weren't previously defined, so I wanted to clarify intent rather than current behaviour. ChrisA

On Sun, May 26, 2019 at 09:20:53PM +1000, Chris Angelico wrote:
Sure, but this PEP is all about defining things that weren't previously defined, so I wanted to clarify intent rather than current behaviour.
As I understand it, the intent is to:

- fix some annoyances/bugs involved when you have a trace hook;
- make the current CPython behaviour a language behaviour.

-- Steven

On Sun, 26 May 2019 at 12:23, Chris Angelico <rosuav@gmail.com> wrote:
On Sun, May 26, 2019 at 8:07 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, May 26, 2019 at 08:44:33AM +1000, Chris Angelico wrote:
From my reading of the description, you could also "assert a is b" - is that correct?
Yes, that's already the behaviour.

    py> def demo():
    ...     a = locals()
    ...     b = locals()
    ...     print(a is b)
    ...
    py> demo()
    True
Sure, but this PEP is all about defining things that weren't previously defined, so I wanted to clarify intent rather than current behaviour.
+1 on the PEP being explicit over this. Even though it's current behaviour, it's surprising and making a clear and definitive statement that the PEP intends to make the behaviour part of the language definition and not just a CPython detail, is IMO worthwhile. Paul

Chris Angelico wrote:
Except that it does. After calling locals() a second time, the result of the *first* call will be updated to reflect changes.
Yeow. That's *really* unintuitive. There had better be an extremely good reason for this behaviour. -- Greg

On 5/27/2019 3:18 AM, Greg Ewing wrote:
Chris Angelico wrote:
Except that it does. After calling locals() a second time, the result of the *first* call will be updated to reflect changes.
Yeow. That's *really* unintuitive. There had better be an extremely good reason for this behaviour.
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it. -- Terry Jan Reedy

On 5/27/19 9:12 AM, Terry Reedy wrote:
On 5/27/2019 3:18 AM, Greg Ewing wrote:
Chris Angelico wrote:
Except that it does. After calling locals() a second time, the result of the *first* call will be updated to reflect changes.
Yeow. That's *really* unintuitive. There had better be an extremely good reason for this behaviour.
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
I had a similar concern, and one BIG issue with it being defined this way is that you get a fundamental re-entrancy problem. If module a uses locals(), and then calls module b that uses locals(), module a has lost its usage. One implication of this is that you really want ALL modules to declare whether they use the locals() function or not. Then you get the question: does this "exactly 1" apply across threads? Does a call to locals() in another thread make me lose my locals (or does each thread get its own version)? If that is true, then if you might possibly be in a situation where threads are in play you MUST make the copy anyway, and do it fast enough that the GIL isn't released between the snapshot and the copy (if possible).

C made this sort of mistake decades ago for some functions, not thinking about threads or re-entrancy, and had to create solutions to fix it. Let us not forget history and thus repeat it.

Is there a fundamental reason that locals() needs to keep a single dict, as opposed to creating a new one for each call? The way it is currently defined, once it is called, the snapshot will stay forever, consuming resources, while if a new dict was created, the resource would be reclaimed after use. Yes, if called twice you end up with two copies instead of both being updated to the current state, but if you WANTED to just update the current copy, you could just rebind it to the new version; otherwise you are just forcing the programmer to make the copies explicitly.

-- Richard Damon

No, there's only one locals() dict *per stack frame*. So no worries about concurrency. On Mon, May 27, 2019 at 6:54 AM Richard Damon <Richard@damon-family.org> wrote:
On 5/27/19 9:12 AM, Terry Reedy wrote:
On 5/27/2019 3:18 AM, Greg Ewing wrote:
Chris Angelico wrote:
Except that it does. After calling locals() a second time, the result of the *first* call will be updated to reflect changes.
Yeow. That's *really* unintuitive. There had better be an extremely good reason for this behaviour.
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
I had a similar concern, and one BIG issue with it being defined this way is that you get a fundamental re-entrancy problem. If module a uses locals(), and then calls module b that uses locals(), module a has lost its usage. One implication of this is that you really want ALL modules to declare whether they use the locals() function or not. Then you get the question: does this "exactly 1 dict" apply across threads? Does a call to locals() in another thread make me lose my locals (or does each thread get its own version)? If that is true, then if you might possibly be in a situation where threads are in play, you MUST make the copy anyway, and do it fast enough that the GIL isn't released between the snapshot and the copy (if possible).
C made this sort of mistake decades ago for some functions, not thinking about threads or re-entrancy, and had to create solutions to fix it. Let us not forget history and thus repeat it.
Is there a fundamental reason that locals() needs to keep a single dict, as opposed to creating a new one for each call? The way it is currently defined, once it is called, the snapshot will stay forever, consuming resources, while if a new dict was created, the resource would be reclaimed after use. Yes, if called twice you end up with two copies instead of both being updated to the current state, but if you WANTED to just update the current copy, you could just rebind it to the new version; otherwise you are just forcing the programmer to make the copies explicitly.
-- Richard Damon
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
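A small sketch of the "one dict per stack frame" point, using a hypothetical recursive helper named collect_locals (not from the thread). Each invocation has its own frame, so each locals() call below should yield a different dict object holding that invocation's own values:

```python
def collect_locals(depth, seen):
    """Recurse `depth` times, recording each frame's locals() result."""
    marker = depth  # a local whose value differs in every invocation
    seen.append(locals())
    if depth > 0:
        collect_locals(depth - 1, seen)
    return seen

snapshots = collect_locals(2, [])
distinct_ids = {id(d) for d in snapshots}   # all dicts are still alive in `snapshots`
markers = [d["marker"] for d in snapshots]
print(len(distinct_ids), markers)
```

The inner calls never disturb the dict obtained by the outer call, which is why the re-entrancy worry does not arise across separate invocations.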

On 5/27/2019 9:52 AM, Richard Damon wrote:
On 5/27/19 9:12 AM, Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict.
per function invocation, or more generally, as Guido said, per stack frame. This part is obvious to me, but I should have been clearer.
Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
I had a similar concern, and one BIG issue with it being defined this way is that you get a fundamental re-entrancy problem. If module a uses locals(), and then calls module b that uses locals(), module a has lost its usage.
No. Sorry about being unclear. -- Terry Jan Reedy
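The mental model Terry describes can be written down as a toy class. This is only an illustrative model in pure Python: ModelFrame, its fast attribute and its locals() method are invented names, and the real interpreter keeps this state in C structures rather than in anything like this class.

```python
class ModelFrame:
    """Toy stand-in for one function invocation (hypothetical, not a CPython type)."""

    def __init__(self, **fast_locals):
        self.fast = dict(fast_locals)  # stands in for the real fast-locals storage
        self._snapshot = None          # no dict exists until one is first requested

    def locals(self):
        """Refresh this frame's single snapshot dict and return it."""
        if self._snapshot is None:
            self._snapshot = {}
        self._snapshot.update(self.fast)  # simplified: unbound names are not removed
        return self._snapshot

frame_a = ModelFrame(x=1)
frame_b = ModelFrame(x=100)

first = frame_a.locals()
frame_a.fast["x"] = 2        # the "real" variable changes; `first` is now stale
stale_value = first["x"]
frame_b.locals()             # a different invocation asks for its own snapshot
untouched_value = first["x"]
second = frame_a.locals()    # the same frame asks again: the shared dict is refreshed
print(stale_value, untouched_value, second is first, first["x"])
```

In this model the snapshot only goes stale or gets refreshed through its own frame; a call made on behalf of another frame never touches it.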

On 5/27/19 2:05 PM, Terry Reedy wrote:
On 5/27/2019 9:52 AM, Richard Damon wrote:
On 5/27/19 9:12 AM, Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict.
per function invocation, or more generally, as Guido said, per stack frame. This part is obvious to me, but I should have been clearer.
Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
I had a similar concern, and one BIG issue with it being defined this way is that you get a fundamental re-entrancy problem. If module a uses locals(), and then calls module b that uses locals(), module a has lost its usage.
No. Sorry about being unclear.
Ok, if each function invocation gets its own dict, then the reentrancy issues go away. The fact that it isn't the 'active' dict, so you can't use it to modify the current state, but also you don't get a fresh copy each time (there is a single mutable dict that gets updated) seems to be an odd behavior, and I can't see where it is an advantage to the user of the function, or where that makes it easier on the implementation. (But I could easily be missing something.) -- Richard Damon
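The "not the active dict" half of that observation can be checked directly. A minimal sketch, assuming no trace function is installed; try_to_rebind is a made-up name for illustration:

```python
def try_to_rebind():
    value = "original"
    namespace = locals()
    namespace["value"] = "changed via dict"  # only the dict entry is modified
    return value, namespace["value"]

actual, in_dict = try_to_rebind()
print(actual, "|", in_dict)
```

The local variable keeps its original binding while the dict holds the new entry, so the two views disagree until something refreshes or discards the dict.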

I'm guessing the reason is to remove the overhead of keeping the dictionary up to date during function execution when no Python code needs access to it.
On Mon, May 27, 2019 at 8:10 PM Richard Damon <Richard@damon-family.org> wrote:
On 5/27/19 2:05 PM, Terry Reedy wrote:
On 5/27/2019 9:52 AM, Richard Damon wrote:
On 5/27/19 9:12 AM, Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict.
per function invocation, or more generally, as Guido said, per stack frame. This part is obvious to me, but I should have been clearer.
Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
I had a similar concern, and one BIG issue with it being defined this way is that you get a fundamental re-entrancy problem. If module a uses locals(), and then calls module b that uses locals(), module a has lost its usage.
No. Sorry about being unclear.
Ok, if each function invocation gets its own dict, then the reentrancy issues go away.
The fact that it isn't the 'active' dict, so you can't use it to modify the current state, but also you don't get a fresh copy each time (there is a single mutable dict that gets updated) seems to be an odd behavior, and I can't see where it is an advantage to the user of the function, or where that makes it easier on the implementation. (But I could easily be missing something.)
-- Richard Damon

Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
Yes, I understand *what's* happening, but not *why* it was designed that way. Would it really be prohibitively expensive to create a fresh dict each time? -- Greg

On Tue, May 28, 2019 at 5:25 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
Yes, I understand *what's* happening, but not *why* it was designed that way. Would it really be prohibitively expensive to create a fresh dict each time?
No. But it would be inconsistent with the behavior at module level. FWIW I am leaning more and more to the [proxy] model, where locals() and frame.f_locals are the same object, which *proxies* the fast locals and cells. That only has one downside: it no longer returns a dict, but merely a MutableMapping. But why would code care about the difference? (There used to be some relevant builtins that took dicts but not general MutableMappings -- but that has been fixed long ago.) -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him/his **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
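To make the proxy idea concrete, here is a rough pure-Python sketch of a mapping that is a MutableMapping but not a dict, and that writes straight through to the underlying storage. FastLocalsProxy, its slot list and its sentinel are all hypothetical; this is not how the PEP's reference implementation is written, only an illustration of the concept.

```python
from collections.abc import MutableMapping

class FastLocalsProxy(MutableMapping):
    """Hypothetical write-through view over a list of 'fast' slots."""

    _UNBOUND = object()  # sentinel marking a slot that currently has no value

    def __init__(self, names, slots):
        self._index = {name: i for i, name in enumerate(names)}
        self._slots = slots  # shared with the owner, deliberately not copied

    def __getitem__(self, name):
        value = self._slots[self._index[name]]  # KeyError for unknown names
        if value is self._UNBOUND:
            raise KeyError(name)
        return value

    def __setitem__(self, name, value):
        # Simplification: only known names can be set in this sketch.
        self._slots[self._index[name]] = value

    def __delitem__(self, name):
        self[name]  # raises KeyError if the name is unknown or already unbound
        self._slots[self._index[name]] = self._UNBOUND

    def __iter__(self):
        return (n for n, i in self._index.items()
                if self._slots[i] is not self._UNBOUND)

    def __len__(self):
        return sum(1 for _ in self)

slots = [1, 2]
proxy = FastLocalsProxy(["a", "b"], slots)
proxy["a"] = 10    # the write lands in `slots` immediately
slots[1] = 20      # and changes to `slots` are visible through the proxy
print(dict(proxy), isinstance(proxy, dict))
```

Code that only relies on the mapping protocol works unchanged with such an object; only explicit isinstance(..., dict) checks would notice the difference.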

On Tue, May 28, 2019 at 6:02 PM Guido van Rossum <guido@python.org> wrote:
On Tue, May 28, 2019 at 5:25 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
Yes, I understand *what's* happening, but not *why* it was designed that way. Would it really be prohibitively expensive to create a fresh dict each time?
No. But it would be inconsistent with the behavior at module level.
FWIW I am leaning more and more to the [proxy] model, where locals() and frame.f_locals are the same object, which *proxies* the fast locals and cells. That only has one downside: it no longer returns a dict, but merely a MutableMapping. But why would code care about the difference? (There used to be some relevant builtins that took dicts but not general MutableMappings -- but that has been fixed long ago.)
Related trivia: the exec() and eval() builtins still mandate that their 'globals' argument be an actual no-fooling dict, but their 'locals' argument is allowed to be any kind of mapping object. This is an intentional, documented feature [1]. And inside the exec/eval, calls to locals() return whatever object was passed. For example:
    >>> import collections
    >>> exec("print(type(locals()))", {}, collections.ChainMap())
    <class 'collections.ChainMap'>
So technically speaking, it's already possible for locals() to return a non-dict. Of course this is incredibly uncommon in practice, so existing code doesn't necessarily take it into account. But it's some kind of conceptual precedent, anyway.
-n
[1] See https://docs.python.org/3/library/functions.html#eval and the 'exec' docs right below it. I think the motivation is that in the current CPython implementation, every time you access a global it does a direct lookup in the globals object, so it's important that we do this lookup as fast as possible, and forcing the globals object to be an actual dict allows some optimizations. For locals, though, we usually use the "fast locals" mechanism and the mapping object is mostly vestigial, so it doesn't matter how fast lookups are, and we can support any mapping.
-- Nathaniel J. Smith -- https://vorpus.org
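A short sketch of both halves of that rule: exec() reading and writing names through a non-dict 'locals' mapping, and rejecting a non-dict 'globals'. The variable names are made up for the example.

```python
import collections

# Names are looked up through the whole chain; new bindings go to the first map.
layered = collections.ChainMap({}, {"base": 10})
exec("derived = base + 1", {}, layered)

# A non-dict 'globals' argument is refused outright.
try:
    exec("pass", collections.ChainMap())
    globals_error = None
except TypeError as exc:
    globals_error = exc

print(layered["derived"], type(globals_error).__name__)
```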

On Tue, May 28, 2019 at 5:24 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Terry Reedy wrote:
I believe that the situation is or can be thought of as this: there is exactly 1 function locals dict. Initially, it is empty and inaccessible (unusable) from code. Each locals() call updates the dict to a current snapshot and returns it.
Yes, I understand *what's* happening, but not *why* it was designed that way.
I'm not sure of the exact history, but I think it's something like: In the Beginning, CPython was Simple, but Slow: every frame struct had an f_locals field, it was always a dict, the bytecode accessed the dict, locals() returned the dict, that was that. Then one day the serpent of Performance Optimization came, whispering of static analysis of function scope and LOAD_FAST bytecodes. And we were seduced by the serpent's vision, and made CPython Faster, with semantics that were Almost The Same, and we shipped it to our users. But now the sin of Cache Inconsistency had entered our hearts, and we were condemned to labor endlessly: again and again, users discovered a leak in our abstraction, and again and again we covered our sin with new patches, until Simplicity was obscured.
(The current design does make sense, but you really have to look at it as a hard-fought compromise between the elegant original design and ~30 years of real-world demands. And hey, it could be worse – look at the fun Intel's been having with their caches.)
-n
-- Nathaniel J. Smith -- https://vorpus.org
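The split between dict-based name lookup and the "fast locals" path is visible in the bytecode. A small sketch using the standard dis module; add_one is an arbitrary example function, and the exact opcode names differ between CPython versions, so the code only looks for the "_FAST" and "_NAME" families rather than specific opcodes:

```python
import dis

def add_one(x):
    y = x + 1
    return y

# Inside a function, locals are accessed by slot index via *_FAST opcodes.
function_ops = [ins.opname for ins in dis.get_instructions(add_one)]

# The same statement compiled at module level goes through the namespace mapping.
module_code = compile("y = x + 1\n", "<module-level>", "exec")
module_ops = [ins.opname for ins in dis.get_instructions(module_code)]

uses_fast = any("_FAST" in op for op in function_ops)
uses_name = any("_NAME" in op for op in module_ops)
print("function:", function_ops)
print("module:  ", module_ops)
```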

Richard, your email seems to have introduced a spurious "SPAM" label to this thread, which may confuse some email clients into treating it as spam. Can you teach your email program that this mailing list is ham, not spam, or failing that, at least edit the subject line to remove the label? Thanks. I've done so for this response, but please take care that you don't re-introduce the label again, thanks. On Sat, May 25, 2019 at 06:37:22PM -0400, Richard Damon wrote:
To me that is a static snapshot of a dynamic environment, not a dynamic snapshot. The snapshot you get at THAT moment in time won't change, as time progresses, so that snapshot itself isn't dynamic.
Actually, it does change -- but the confusing part is that it doesn't change automatically but only when you call the locals() function again. This is already CPython's behaviour, so that is not changing.
    def demo1():
        a = b = c = 1
        a = locals()
        print(a)
        b = 999
        print(a)

    def demo2():
        a = b = c = 1
        a = locals()
        print(a)
        b = 999
        locals()  # call function but throw the result away
        print(a)
And running those two functions in Python 3.5:
    py> demo1()  # No change to the dict.
    {'a': 1, 'b': 1, 'c': 1}
    {'a': 1, 'b': 1, 'c': 1}
    py> demo2()  # Dict magically updates!
    {'a': 1, 'b': 1, 'c': 1}
    {'a': {...}, 'b': 999, 'c': 1}
I know this is the backwards-compatible behaviour, but I would like to question whether we want to enshrine it in stone. This seems to me to be the worst possible combination of features:
- writing to the locals() dict doesn't write changes back to the actual local variables;
- the dict returned isn't a fixed snapshot, but the PEP calls it a snapshot despite not being one (naming things is hard);
- the "snapshot" can change as a side-effect of another operation.
If this wasn't already the behaviour, would we want it?
-- Steven

On Sun, 26 May 2019 at 00:37, Guido van Rossum <guido@python.org> wrote:
This looks great.
I only have two nits with the text.
First, why is the snapshot called a "dynamic snapshot"? What exactly is dynamic about it?
The dynamic part comes in if you call locals() twice: because it returns the same mapping object every time, the second call to locals() will update the snapshot with the updated values of the local variables. To get a static snapshot that won't be implicitly updated by subsequent calls to locals() (or accesses to frame.f_locals, or implicit updates prior to trace function invocation) you have to do "locals().copy()".
As others have noted, this isn't the behaviour we would choose if we were designing this API from scratch, but the current CPython behaviour has ~18 years of de facto standardisation behind it (as near as I can tell it has worked this way since nested scopes and fast locals were introduced in Python 2.1), so I'd be pretty hesitant about changing it at this point. In particular, I want to preserve the existing behaviour of Python 3 code that runs exec() and eval() at function scope, which means ensuring that locals():
1. Continues to behave like a normal dict instance when passed to another API
2. Continues to leave the actual local variables of the calling frame alone
Second, you use the word "mapping" a lot. Would you mind changing that to "mapping object" in most places? Especially in the phrase "each call to ``locals()`` returns the *same* mapping". To me, without the word "object" added, this *could* be interpreted as "a dict with the same key/value pairs" (since "mapping" is also an abstract mathematical concept describing anything that maps keys to values).
That makes sense - I'll update the text. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
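A minimal sketch of the "locals().copy()" idiom mentioned earlier in this message; static_snapshot_demo is a made-up function name. Whatever a later locals() call does to a shared snapshot, the explicit copy is an independent dict and keeps the values it captured:

```python
def static_snapshot_demo():
    a = b = 1
    frozen = locals().copy()  # independent dict, captured before `frozen` is bound
    b = 999
    locals()                  # may refresh a shared snapshot, but never `frozen`
    return frozen, b

frozen, current_b = static_snapshot_demo()
print(frozen, current_b)
```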

Currently f_locals is documented as readonly [1]. The PEP says:
* "Don't change what isn't broken": the current tracing mode problems are caused by a requirement that's specific to tracing mode (support for external rebinding of function local variable references), so it made sense to also restrict any related fixes to tracing mode
However, actually attempting to implement and document that dynamic approach highlighted the fact that it makes for a really subtle runtime state dependent behaviour distinction in how ``frame.f_locals`` works, and creates several new edge cases around how ``f_locals`` behaves as trace functions are added and removed.
Accordingly, the design was switched to the current one, where ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is always a dynamic snapshot, which is both simpler to implement and easier to explain.
Do these edge cases still exist when f_locals write access is restricted to code executed by the tracing function (which is more restrictive than 'tracing mode')? We can use the condition "frame->f_trace is not NULL and tstate->tracing is true" (tstate being a pointer to the PyThreadState structure) to know when code is executed by the tracing function [2]:
* The condition on tstate->tracing allows us to figure out whether we are running a frame executed by the trace function, as opposed to a frame that is being traced or a frame executed in 'regular operation'.
* The condition on frame->f_trace removes the ambiguity as to whether tstate->tracing is set by a tracing function or by a profiling function.
[1] In section 'Frame objects' at https://github.com/python/cpython/blob/master/Doc/reference/datamodel.rst#th...
[2] Except that frame->f_trace is NULL in a 'PyTrace_CALL' trace event, so f_locals would remain readonly in that case. But this event is always followed by a 'PyTrace_LINE' event anyway, so this limitation is not important IMO.
Xavier

On Fri., 31 May 2019, 5:20 am Xavier de Gaye, <xdegaye@gmail.com> wrote:
Currently f_locals is documented as readonly [1].
Read-only in the sense that you can't rebind it to point to a different object - the dict it points to is mutable.
The PEP says:
* "Don't change what isn't broken": the current tracing mode problems are caused by a requirement that's specific to tracing mode (support for external rebinding of function local variable references), so it made sense to also restrict any related fixes to tracing mode
However, actually attempting to implement and document that dynamic approach highlighted the fact that it makes for a really subtle runtime state dependent behaviour distinction in how ``frame.f_locals`` works, and creates several new edge cases around how ``f_locals`` behaves as trace functions are added and removed.
Accordingly, the design was switched to the current one, where ``frame.f_locals`` is always a write-through proxy, and ``locals()`` is always a dynamic snapshot, which is both simpler to implement and easier to explain.
Do these edge cases still exist when f_locals write access is restricted to code executed by the tracing function (which is more restrictive than 'tracing mode') ?
We can use the condition "frame->f_trace is not NULL and tstate->tracing is true" (tstate being a pointer to the PyThreadState structure) to know when code is executed by the tracing function [2]:
* The condition on tstate->tracing allows us to figure out whether we are running a frame executed by the trace function, as opposed to a frame that is being traced or a frame executed in 'regular operation'.
* The condition on frame->f_trace removes the ambiguity as to whether tstate->tracing is set by a tracing function or by a profiling function.
Always creating the proxy and sometimes bypassing it and returning the snapshot instead would indeed have fewer edge cases than sometimes storing the snapshot directly on the frame object without creating the proxy at all. It's still significantly harder to document than "frame.f_locals references a proxy, locals() creates a snapshot", though. Cheers, Nick.
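For readers who have not used the tracing use case being discussed, here is a hedged sketch of a trace function rebinding a local variable of the traced frame through frame.f_locals. The names target and tracer are invented for the example; sys.settrace and sys.gettrace are the standard hooks. As far as I'm aware, a write made this way from a trace function reaches the running function both with the existing write-back mechanism and with a write-through proxy, but treat that as an assumption to verify on your interpreter.

```python
import sys

def target():
    x = 1
    return x  # the tracer rebinds x just before this line runs

def tracer(frame, event, arg):
    if frame.f_code is target.__code__:
        if event == "line" and "x" in frame.f_locals:
            frame.f_locals["x"] = 99  # write performed from inside the trace function
        return tracer  # keep receiving events for target()
    return None  # do not trace any other frame

previous = sys.gettrace()
sys.settrace(tracer)
try:
    result = target()
finally:
    sys.settrace(previous)  # always restore whatever hook was installed before
print(result)
```

This is the debugger-style rebinding that the proxy design is meant to keep working without relying on the FastToLocals/LocalsToFast round trip.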