On Wed, 28 Jul 2021, 1:50 am Mark Shannon, <mark@hotpy.org> wrote:
Hi Nick,
On 27/07/2021 2:29 pm, Nick Coghlan wrote:
The reference documentation should be precise as well, since that is what other implementations will be following.
What semantics do you feel are left unspecified?
The documentation needs to explain the behavior to the majority of users, but the PEP should be worded so that it can be implemented correctly from just reading the PEP.
There is no reason to remove the sugested documentation changes, but they should be secondary to the more formal specification.
I don't know what you want to see on this front.
The current PEP seems complete to me, except in relation to the specifics of the proxy behaviour, which it deliberately leaves underspecified (mostly because my current implementation sucks in a few areas, and I really don't want to enshrine those limitations in the language spec).
All the above were mostly just suggestions, the format of the PEP is up to you.
But please, do not underspecify anything. I need to maintain this, as do the PyPy and MicroPython developers. We need this to be precise.
You still haven't said what you consider underspecified. Perhaps read the implementation and tell me which parts of what the tests are covering you want to see replicated in the PEP text?
Resolving the issues with tracing mode behaviour ------------------------------------------------
According to this section the `f_locals` object (_PyFastLocalsProxy_Type in C) will have *two* fields. It should only have *one*, the pointer to the underlying frame object.
That's a good point - fast_refs doesn't reference the frame object, so it's safe to store it on the frame object and reference it from the individual proxies.
Switching to that approach also has the advantage that every proxy after the first is much cheaper, since they don't need to build the fast refs map again.
In order to maximize backwards compatibility and, more importantly, avoid the synchronization issues that lead to obscure bugs, all the state of the `f_locals` object should be kept on the frame.
Any changes to `f_locals` should be instantly visible to any thread both through other `f_locals` objects (for the same frame) and the underlying local variable (if it exists).
[snip proxy method implementation sketch] > This gets the gist of how the implementation works, but it isn't quite that simple for a few reasons:
1. The proxy needs to offer the same algorithmic complexity as dict APIs, so doing O(n) linear searches of C arrays and Python tuples inside O(1) lookup operations isn't OK (hence the fastrefs mapping to allow for O(1) resolution of variable names to cell references and local variable array offsets)
Linear searches are faster for small arrays, and there is nothing stopping you adding a helper map of names->indexes for large functions. Provided the mapping from name to local index is O(1), then the whole operation is O(1).
That helper map already exists in the fast refs mapping. Skipping it for small functions would be a future optimisation opportunity that doesn't seem necessary in the initial implementation.
You really don't need much extra machinery to maintain O(1) behavior (which no one cares about, they just care about overall performance).
Yes, the only extra machinery needed for working with individual keys is the fast refs mapping. It's other aspects of the mapping API that benefit from the continuing presence of the frame cache (which is maintained on the frame, just as it is today, NOT separately in each proxy). If you can give me a reliable O(1) implementation of "len(frame.f_locals)" that doesn't rely on up to date cache of: * which local and cell variable names are currently bound to values * which extra variables have been added via either proxies or PyEval_GetLocals()
2. The proxy needs to deal with *SIX* kinds of possible key lookup, not two:
* unbound local variable (numeric index in fast refs map, NULL in fast locals slot) * bound local variable (numeric index in fast refs map, object pointer in fast locals slot) * unbound cell variable (cell reference in fast refs map, cell contains
NULL)
* bound cell variable (cell reference in fast refs map, cell contains object pointer) * not yet converted cell variables (numeric index in fast refs map, possibly NULL initial value in fast locals slot) * extra keys injected via a fast locals proxy or PyEval_GetLocals()
There could be a million different cases internally, but that doesn't need to reflected in the API.
And it isn't, *except* in the form of potential frame cache desynchronisation. Execution of the frame code can move local variables between the bound and unbound states, and cell variables between the unconverted, unbound and bound states. Cell variable unbinding doesn't even require that code in the proxy execute - it can happen in other frames. Actions through proxy objects will implicitly keep the cache up to date, but mutating the result of PyEval_GetLocals() for keys that correspond to local or cell variables will desynchronise it. So the frame proxy implementation of len() has to choose between: 1. checking the binding state of every cell and local variable every time it is called (making length reporting an O(N) operation); or 2. keeping a cache on the frame of currently live name bindings and reporting the length of that (and offer ways to ensure the cache is up to date) Since it also solves the problem of what to return from PyEval_GetLocals() without breaking backwards compatibility, I went with option 2.
The combination of those 3 requirements is what lead me down the path of using `f_locals` as a stateful cache that could be used both as the return value for PyEval_GetLocals(), and to satisfy those mapping API requirements that couldn't easily be met via the fast_refs mapping.
Remove the state from the proxy and (almost) all of your problems go away.
The only state on the proxy is the fast refs mapping, and that's going to move to the frame as you suggested. The frame cache is already on the frame so PyEval_GetLocals() can return a borrowed reference to it. As noted above, I think trying to sketch out an O(1) implementation of the proxy's len() support that *doesn't* rely on a cache will provide the clearest explanation for why I ended up keeping the frame cache around, and just changed the way I described it (i.e. it's mostly referred to as a cache now, and only occasionally still referred to as a dynamic snapshot).
The PEP explains this: Python code knows structurally how locals() is going to behave, but C extension code has no idea. Therefore, C extension code needs to be able to request the behaviour it needs, and let the implementation decide how best to satisfy that.
Why do authors of C code have no idea what is going on, but authors of Python code do? That seems like an odd assumption.
C API bindings can be called from anywhere, while actual Python code is lexically scoped in the file where it is written. The C API has plenty of functions for querying the state of the VM, if
that is what is needed.
The only query API I am aware of that allows a C extension to find out if it is running in function scope is to get the frame object, then the code object, then check if it is optimised. PyLocals_GetKind() avoids assuming that frame and code objects are available for external introspection. What exactly are the requirements of C extension authors?
To know whether PyLocals_Get() returns a direct reference to the locals for the running frame or not.
https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the...
summarises the outcome of that previous discussion.
This discussion seems to date from when the design was quite different, is it relevant any more?
What makes you say that? If you think that section doesn't align with the current design, then there's a significant miscommunication somewhere.
`PyEval_GetLocals()` should be equivalent to `locals()`. It will have to return a new reference. It cannot return a borrowed reference as it would be returning freed memory. There is nothing to borrow the reference from, for a function scope.
Yes, which is why it can't do that: that API is already defined as returning a borrowed reference, and trying to change that would be a significant backwards compatibility break.
If you return a borrowed reference, you must borrow it from somewhere. Where are you borrowing this reference from? If you are borrowing from the frame, then the frame must have a strong reference to the proxy, which creates a cycle.
I'm borrowing it from the same place it has always been borrowed from: the f_locals slot on the frame. That holds the frame cache, NOT the proxy, so there is no cycle.
`PyFrame_GetLocals(PyFrameObject *)` should be equivalent to `frame.f_locals`.
The PEP doesn't explicitly state why `PyLocals_GetKind()` is needed, but I believe it is avoid unnecessary copying? Why is creating an extra copy an issue? It takes a microsecond or so to create the copy.
As described in the PEP, the query API is for the benefit of C extensions that want to do something different when PyLocals_Get() returns something other than a direct reference to the frame's local namespace.
For example, it may try to get hold of the frame object to retrieve a read/write proxy from it, and throw a custom error if that isn't possible (as an implementation may support PyLocals_Get() without supporting the frame introspection APIs).
If a function that returns a copy of the local namespace is really needed, then why not offer that functionality directly? E.g. `PyFrame_GetLocalsCopy()` which is roughly:
PyObject *locals = PyEval_GetLocals(); if (current_scope_is_function()) return locals; return PyDict_Copy(locals); // This leaks, need to decref locals
The proposed API does offer that functionality directly: PyLocals_GetCopy() and PyFrame_GetLocalsCopy(f).
Why two functions, not one?
One is in the stable ABI and doesn't require access to frame objects, but only works for the running frame. The other is in the CPython API and works for any frame, but requires access to frame objects.
Reducing the runtime overhead of trace hooks --------------------------------------------
I'm confused by this part. Since `_PyFrame_FastToLocals` and friends do not alter the logical state of the frame object (or anything else), they should no-ops. It is `PyFrame_GetLocals()` that creates the proxy object, surely.
_PyFrame_FastToLocals() still refreshes the frame cache that is used as the return value from PyEval_GetLocals(), the same way it always has.
The frame cache is implicitly kept in sync by operations on frame proxy objects, but there are still two ways for it to get out of sync: * code execution inside the frame (this updates the fast locals array, not the key/value cache stored in the C level f_locals attribute) * direct manipulation of the frame cache by callers of PyEval_GetLocals (the frame will ignore any attempts to bind or unbind variables that way, but it can still throw off proxy operations that rely on the cache)
Just remove the state in the proxy and nothing will ever be out of sync.
The frame cache is stored on the frame, not separately in each proxy. If you were to eliminate it, the operations that need it would just need to create it anew every time instead.
The reason the frame cache still exists in the first place is because there are a number of mapping APIs where the easiest way to meet the algorithmic complexity expectations of the dict API is to use an actual dict:
= Length checking =
The fast_refs mapping (and the frame metadata in general), includes all variables defined for the frame, regardless of whether they're currently bound or not. The correct length for the fast locals proxy mapping only includes the *currently bound* fast local and cell variables, together with any extra keys that have been added. The locals proxy keeps this an O(1) operation by reporting the size of the frame cache, so it doesn't have to check which names are currently bound every time.
It may be O(1), but it is incorrect. You state earlier that the cache might be out of date. It is easy to implement anything in O(1) if it is allowed to be wrong ;)
Yes, I know. It's a trade-off, as most of the time both C and Python trace functions will be able to force a cache refresh when they start (e.g. by calling PyLocals_GetView() or retrieving f_locals from the Python level frame object), and there won't be any Python code running, and nothing will be messing with PyEval_GetLocals(), so the cache won't go out of sync. The alternative is to NOT rely on the cache and instead check the state of the bindings every time len() gets called, and every time some flavour of iteration is needed. That's trivial to implement as well (the intrinsically O(N) operations like copying already work that way), it just makes nominally O(1) operations unexpectedly O(N) in the number of local and cell variables on the frame.
= Mapping comparisons =
Mapping comparisons have an O(1) early exit if their lengths don't match, before falling back to a full O(N) key value comparison. The locals proxy takes advantage of that by delegating comparison operations to the frame cache rather than reimplementing them.
= Iteration =
Rather than reimplementing forward and reverse iteration and the keys(), values(), and items() methods, the mapping proxy just delegates these operations to the frame cache.
Operations on the proxy other than these ones either don't rely on the cache being up to date at all (because they can use the fast_refs mapping instead, implicitly updating any affected cache entries along the way), or else implicitly refresh the cache before relying on it (e.g. copying the proxy mapping works by refreshing the frame cache, then copying the cache).
I assume there will be ways to improve the implementation to make explicit frame cache refreshes less necessary over time (hence the caveat along those lines in the final paragraph of
https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-addi...
), but the existence of the PyEval_GetLocals() API means that we're highly unlikely to ever get rid of the frame cache entirely.
That's why PyEval_GetLocals() needs to return a new reference. This is a chance to fix a fundamentally broken API, let's not waste the opportunity.
You can't just change the refcounting semantics of existing APIs like that, it's a major compatibility break. PyLocals_Get() is the new API that offers the new locals() behaviour. Cheers, Nick.
Cheers, Mark.