(I'm not sure Mailman will get the threading right when I've received the original email directly, so apologies in advance if this reply creates a 2nd thread) On Mon, 26 Jul 2021 at 23:57, Mark Shannon <mark@hotpy.org> wrote:
Hi,
First of all let me say that I agree with the aims of PEP 558 and most of the design. I do find the wording of the PEP itself a bit confusing in a couple of places.
This critique is based on my understanding of the PEP. If I am mistaken in my misunderstanding, then treat that as an implied request for clarification of the PEP :)
Critique of PEP 558 ===================
Layout of the PEP -----------------
[This is largely to help the reader, it doesn't change the nature of the PEP]
Could we replace the section "CPython Implementation Changes" with a section that states what the behavior will be, not how it changes. Having to mentally apply the suggested changes to the existing (and convoluted) implementation makes it hard to work out what the proposed behavior will be.
Just stating what the new behaviour is isn't sufficient to assess the impact of the change, though - it needs to be compared to the old (arcane) behaviour, and I don't want to assume readers already understand what that behaviour is. That said, the summary section is intended to describe how things will work post-change. It just leaves out the details of the new proxy implementation (which I admit is the most complicated part of the updated implementation).
Could the design discussion be moved to an appendix or another document? The key parts of it should be moved to the Rationale or Motivation.
It's effectively an Appendix already (the only things after it are the pointer to the implementation and the acknoweldgements). That said, I'll note that several of the questions you've asked in this email are answered directly in the Design Discussions section that you're suggesting removing from the PEP.
There should be a Motivation section explaining why this PEP is necessary, for those not familiar with the weirdness of `locals()`. Much of what is in the Rationale should perhaps be in the Motivation.
Yes, the Rationale section could be retitled Motivation - it's the rationale for the PEP existing, not the rationale for the design decisions (those are in the Design Discussion section). The historical arcane behaviour at function scope is covered in https://www.python.org/dev/peps/pep-0558/#historical-semantics-at-function-s... but the bug references seemed more important for the Motivation section (if the status quo *worked*, I'd never have tried to work out how to change it, but with the status quo both arcane *and* broken, it makes sense to try to figure out a better alternative).
Proposal --------
[Ditto; this is largely to help the reader, it doesn't change the nature of the PEP]
Drop the definitions of the type of scope. They are (at least they should be) clearly defined in existing documentation.
They're not, unfortunately. As far as I am aware, https://docs.python.org/3/reference/executionmodel.html#naming-and-binding is as good as we've currently got, so
Why "largely" eliminate the concept of a separate tracing mode? Wasn't the plan to eliminate it entirely?
The "largely" there just relates to the fact that even though the PEP eliminates the side effects that tracing mode has historically had on the behaviour of locals(), the core eval loop still has the notion of "tracing or not" that turns off some of the execution shortcuts it can otherwise take.
Rather than specifying what the documentation will be, could you specify the semantics. The language here should be precise.
The reference documentation should be precise as well, since that is what other implementations will be following. What semantics do you feel are left unspecified?
The documentation needs to explain the behavior to the majority of users, but the PEP should be worded so that it can be implemented correctly from just reading the PEP.
There is no reason to remove the sugested documentation changes, but they should be secondary to the more formal specification.
I don't know what you want to see on this front. The current PEP seems complete to me, except in relation to the specifics of the proxy behaviour, which it deliberately leaves underspecified (mostly because my current implementation sucks in a few areas, and I really don't want to enshrine those limitations in the language spec).
Resolving the issues with tracing mode behaviour ------------------------------------------------
According to this section the `f_locals` object (_PyFastLocalsProxy_Type in C) will have *two* fields. It should only have *one*, the pointer to the underlying frame object.
That's a good point - fast_refs doesn't reference the frame object, so it's safe to store it on the frame object and reference it from the individual proxies. Switching to that approach also has the advantage that every proxy after the first is much cheaper, since they don't need to build the fast refs map again.
In order to maximize backwards compatibility and, more importantly, avoid the synchronization issues that lead to obscure bugs, all the state of the `f_locals` object should be kept on the frame.
Any changes to `f_locals` should be instantly visible to any thread both through other `f_locals` objects (for the same frame) and the underlying local variable (if it exists).
[snip proxy method implementation sketch] This gets the gist of how the implementation works, but it isn't quite that simple for a few reasons: 1. The proxy needs to offer the same algorithmic complexity as dict APIs, so doing O(n) linear searches of C arrays and Python tuples inside O(1) lookup operations isn't OK (hence the fastrefs mapping to allow for O(1) resolution of variable names to cell references and local variable array offsets) 2. The proxy needs to deal with *SIX* kinds of possible key lookup, not two: * unbound local variable (numeric index in fast refs map, NULL in fast locals slot) * bound local variable (numeric index in fast refs map, object pointer in fast locals slot) * unbound cell variable (cell reference in fast refs map, cell contains NULL) * bound cell variable (cell reference in fast refs map, cell contains object pointer) * not yet converted cell variables (numeric index in fast refs map, possibly NULL initial value in fast locals slot) * extra keys injected via a fast locals proxy or PyEval_GetLocals() 3. PyEval_GetLocals() needs to work essentially the same way it used to (combined mapping containing the results of PyEval_FastToLocals() along with an extra key/value pairs, any extra key/value pairs written get includes in the result of future calls to locals(), but attempts to write to actual local and cell variables will probably get clobbered on the next refresh from the frame) The combination of those 3 requirements is what lead me down the path of using `f_locals` as a stateful cache that could be used both as the return value for PyEval_GetLocals(), and to satisfy those mapping API requirements that couldn't easily be met via the fast_refs mapping.
C API changes -------------
The PEP suggests adding four new functions to the stable API. Then in the "Changes to the public CPython C API" section, another five functions are added for a total of nine new functions!
At the Python level, there are two ways to access a locals mapping. 1. locals() 2. frame.f_locals
We only need two C functions, one for each of the above.
No, we don't. My original C API design was along those lines, and it was changed in early 2020 because it isn't adequate for C extension authors. The PEP explains this: Python code knows structurally how locals() is going to behave, but C extension code has no idea. Therefore, C extension code needs to be able to request the behaviour it needs, and let the implementation decide how best to satisfy that. https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the... summarises the outcome of that previous discussion.
`PyEval_GetLocals()` should be equivalent to `locals()`. It will have to return a new reference. It cannot return a borrowed reference as it would be returning freed memory. There is nothing to borrow the reference from, for a function scope.
Yes, which is why it can't do that: that API is already defined as returning a borrowed reference, and trying to change that would be a significant backwards compatibility break.
`PyFrame_GetLocals(PyFrameObject *)` should be equivalent to `frame.f_locals`.
The PEP doesn't explicitly state why `PyLocals_GetKind()` is needed, but I believe it is avoid unnecessary copying? Why is creating an extra copy an issue? It takes a microsecond or so to create the copy.
As described in the PEP, the query API is for the benefit of C extensions that want to do something different when PyLocals_Get() returns something other than a direct reference to the frame's local namespace. For example, it may try to get hold of the frame object to retrieve a read/write proxy from it, and throw a custom error if that isn't possible (as an implementation may support PyLocals_Get() without supporting the frame introspection APIs).
If a function that returns a copy of the local namespace is really needed, then why not offer that functionality directly? E.g. `PyFrame_GetLocalsCopy()` which is roughly:
PyObject *locals = PyEval_GetLocals(); if (current_scope_is_function()) return locals; return PyDict_Copy(locals); // This leaks, need to decref locals
The proposed API does offer that functionality directly: PyLocals_GetCopy() and PyFrame_GetLocalsCopy(f).
Reducing the runtime overhead of trace hooks --------------------------------------------
I'm confused by this part. Since `_PyFrame_FastToLocals` and friends do not alter the logical state of the frame object (or anything else), they should no-ops. It is `PyFrame_GetLocals()` that creates the proxy object, surely.
_PyFrame_FastToLocals() still refreshes the frame cache that is used as the return value from PyEval_GetLocals(), the same way it always has. The frame cache is implicitly kept in sync by operations on frame proxy objects, but there are still two ways for it to get out of sync: * code execution inside the frame (this updates the fast locals array, not the key/value cache stored in the C level f_locals attribute) * direct manipulation of the frame cache by callers of PyEval_GetLocals (the frame will ignore any attempts to bind or unbind variables that way, but it can still throw off proxy operations that rely on the cache) The reason the frame cache still exists in the first place is because there are a number of mapping APIs where the easiest way to meet the algorithmic complexity expectations of the dict API is to use an actual dict: = Length checking = The fast_refs mapping (and the frame metadata in general), includes all variables defined for the frame, regardless of whether they're currently bound or not. The correct length for the fast locals proxy mapping only includes the *currently bound* fast local and cell variables, together with any extra keys that have been added. The locals proxy keeps this an O(1) operation by reporting the size of the frame cache, so it doesn't have to check which names are currently bound every time. = Mapping comparisons = Mapping comparisons have an O(1) early exit if their lengths don't match, before falling back to a full O(N) key value comparison. The locals proxy takes advantage of that by delegating comparison operations to the frame cache rather than reimplementing them. = Iteration = Rather than reimplementing forward and reverse iteration and the keys(), values(), and items() methods, the mapping proxy just delegates these operations to the frame cache. Operations on the proxy other than these ones either don't rely on the cache being up to date at all (because they can use the fast_refs mapping instead, implicitly updating any affected cache entries along the way), or else implicitly refresh the cache before relying on it (e.g. copying the proxy mapping works by refreshing the frame cache, then copying the cache). I assume there will be ways to improve the implementation to make explicit frame cache refreshes less necessary over time (hence the caveat along those lines in the final paragraph of https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-addi... ), but the existence of the PyEval_GetLocals() API means that we're highly unlikely to ever get rid of the frame cache entirely. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia