[Python-Dev] Re: Comments on PEP 558

28 Jul 2021

      On Wed, 28 Jul 2021, 1:50 am Mark Shannon, <mark@hotpy.org> wrote:
...
Hi Nick,
On 27/07/2021 2:29 pm, Nick Coghlan wrote:
...
The reference documentation should be precise as well, since that is
what other implementations will be following.
What semantics do you feel are left unspecified?
...
The documentation needs to explain the behavior to the majority of
users, but the PEP should be worded so that it can be implemented
correctly from just reading the PEP.
There is no reason to remove the sugested documentation changes, but
they should be secondary to the more formal specification.
I don't know what you want to see on this front.
The current PEP seems complete to me, except in relation to the
specifics of the proxy behaviour, which it deliberately leaves
underspecified (mostly because my current implementation sucks in a
few areas, and I really don't want to enshrine those limitations in
the language spec).
All the above were mostly just suggestions, the format of the PEP is up
to you.
But please, do not underspecify anything. I need to maintain this, as do
the PyPy and MicroPython developers. We need this to be precise.
You still haven't said what you consider underspecified.

Perhaps read the implementation and tell me which parts of what the tests
are covering you want to see replicated in the PEP text?
...
...
...
Resolving the issues with tracing mode behaviour
------------------------------------------------
According to this section the `f_locals` object (_PyFastLocalsProxy_Type
in C) will have *two* fields.
It should only have *one*, the pointer to the underlying frame object.
That's a good point - fast_refs doesn't reference the frame object, so
it's safe to store it on the frame object and reference it from the
individual proxies.
Switching to that approach also has the advantage that every proxy
after the first is much cheaper, since they don't need to build the
fast refs map again.
...
In order to maximize backwards compatibility and, more importantly,
avoid the synchronization issues that lead to obscure bugs, all the
state of the `f_locals` object should be kept on the frame.
Any changes to `f_locals` should be instantly visible to any thread both
through other `f_locals` objects (for the same frame) and the underlying
local variable (if it exists).
[snip proxy method implementation sketch] >
This gets the gist of how the implementation works, but it isn't quite
that simple for a few reasons:
1. The proxy needs to offer the same algorithmic complexity as dict
APIs, so doing O(n) linear searches of C arrays and Python tuples
inside O(1) lookup operations isn't OK (hence the fastrefs mapping to
allow for O(1) resolution of variable names to cell references and
local variable array offsets)
Linear searches are faster for small arrays, and there is nothing
stopping you adding a helper map of names->indexes for large functions.
Provided the mapping from name to local index is O(1), then the whole
operation is O(1).
That helper map already exists in the fast refs mapping. Skipping it for
small functions would be a future optimisation opportunity that doesn't
seem necessary in the initial implementation.
...
You really don't need much extra machinery to maintain O(1) behavior
(which no one cares about, they just care about overall performance).
Yes, the only extra machinery needed for working with individual keys is
the fast refs mapping.

It's other aspects of the mapping API that benefit from the continuing
presence of the  frame cache (which is maintained on the frame, just as it
is today, NOT separately in each proxy).

If you can give me a reliable O(1) implementation of "len(frame.f_locals)"
that doesn't rely on up to date cache of:
* which local and cell variable names are currently bound to values
* which extra variables have been added via either proxies or
PyEval_GetLocals()
...
2. The proxy needs to deal with *SIX* kinds of possible key lookup, not
two:
...
* unbound local variable (numeric index in fast refs map, NULL in fast
locals slot)
* bound local variable (numeric index in fast refs map, object pointer
in fast locals slot)
* unbound cell variable (cell reference in fast refs map, cell contains
NULL)
...
* bound cell variable (cell reference in fast refs map, cell contains
object pointer)
* not yet converted cell variables (numeric index in fast refs map,
possibly NULL initial value in fast locals slot)
* extra keys injected via a fast locals proxy or PyEval_GetLocals()
There could be a million different cases internally, but that doesn't
need to reflected in the API.
And it isn't, *except* in the form of potential frame cache
desynchronisation.

Execution of the frame code can move local variables between the bound and
unbound states, and cell variables between the unconverted, unbound and
bound states. Cell variable unbinding doesn't even require that code in the
proxy execute - it can happen in other frames.

Actions through proxy objects will implicitly keep the cache up to date,
but mutating the result of PyEval_GetLocals() for keys that correspond to
local or cell variables will desynchronise it.

So the frame proxy implementation of len() has to choose between:
1. checking the binding state of every cell and local variable every time
it is called (making length reporting an O(N) operation); or
2. keeping a cache on the frame of currently live name bindings and
reporting the length of that (and offer ways to ensure the cache is up to
date)

Since it also solves the problem of what to return from PyEval_GetLocals()
without breaking backwards compatibility, I went with option 2.
...
...
The combination of those 3 requirements is what lead me down the path
of using `f_locals` as a stateful cache that could be used both as the
return value for PyEval_GetLocals(), and to satisfy those mapping API
requirements that couldn't easily be met via the fast_refs mapping.
Remove the state from the proxy and (almost) all of your problems go away.
The only state on the proxy is the fast refs mapping, and that's going to
move to the frame as you suggested. The frame cache is already on the frame
so PyEval_GetLocals() can return a borrowed reference to it.

As noted above, I think trying to sketch out an O(1) implementation of the
proxy's len() support that *doesn't* rely on a cache will provide the
clearest explanation for why I ended up keeping the frame cache around, and
just changed the way I described it (i.e. it's mostly referred to as a
cache now, and only occasionally still referred to as a dynamic snapshot).
...
...
The PEP explains this: Python code knows structurally how locals() is
going to behave, but C extension code has no idea. Therefore, C
extension code needs to be able to request the behaviour it needs, and
let the implementation decide how best to satisfy that.
Why do authors of C code have no idea what is going on, but authors of
Python code do? That seems like an odd assumption.
C API bindings can be called from anywhere, while actual Python code is
lexically scoped in the file where it is written.

The C API has plenty of functions for querying the state of the VM, if
...
that is what is needed.
The only query API I am aware of that allows a C extension to find out if
it is running in function scope is to get the frame object, then the code
object, then check if it is optimised.

PyLocals_GetKind() avoids assuming that frame and code objects are
available for external introspection.

What exactly are the requirements of C extension authors?
...
To know whether PyLocals_Get() returns a direct reference to the locals for
the running frame or not.
...
...
https://www.python.org/dev/peps/pep-0558/#proposing-several-additions-to-the...
...
summarises the outcome of that previous discussion.
This discussion seems to date from when the design was quite different,
is it relevant any more?
What makes you say that? If you think that section doesn't align with the
current design, then there's a significant miscommunication somewhere.
...
...
...
`PyEval_GetLocals()` should be equivalent to `locals()`.
It will have to return a new reference. It cannot return a borrowed
reference as it would be returning freed memory.
There is nothing to borrow the reference from, for a function scope.
Yes, which is why it can't do that: that API is already defined as
returning a borrowed reference, and trying to change that would be a
significant backwards compatibility break.
If you return a borrowed reference, you must borrow it from somewhere.
Where are you borrowing this reference from?
If you are borrowing from the frame, then the frame must have a strong
reference to the proxy, which creates a cycle.
I'm borrowing it from the same place it has always been borrowed from: the
f_locals slot on the frame.

That holds the frame cache, NOT the proxy, so there is no cycle.
...
...
...
`PyFrame_GetLocals(PyFrameObject *)` should be equivalent to
`frame.f_locals`.
...
The PEP doesn't explicitly state why `PyLocals_GetKind()` is needed, but
I believe it is avoid unnecessary copying?
Why is creating an extra copy an issue? It takes a microsecond or so to
create the copy.
As described in the PEP, the query API is for the benefit of C
extensions that want to do something different when PyLocals_Get()
returns something other than a direct reference to the frame's local
namespace.
For example, it may try to get hold of the frame object to retrieve a
read/write proxy from it, and throw a custom error if that isn't
possible (as an implementation may support PyLocals_Get() without
supporting the frame introspection APIs).
...
If a function that returns a copy of the local namespace is really
needed, then why not offer that functionality directly?
E.g. `PyFrame_GetLocalsCopy()` which is roughly:
PyObject *locals = PyEval_GetLocals();
      if (current_scope_is_function())
          return locals;
      return PyDict_Copy(locals); // This leaks, need to decref locals
The proposed API does offer that functionality directly:
PyLocals_GetCopy() and PyFrame_GetLocalsCopy(f).
Why two functions, not one?
One is in the stable ABI and doesn't require access to frame objects, but
only works for the running frame.

The other is in the CPython API and works for any frame, but requires
access to frame objects.
...
...
...
Reducing the runtime overhead of trace hooks
--------------------------------------------
I'm confused by this part.
Since `_PyFrame_FastToLocals` and friends do not alter the logical state
of the frame object (or anything else), they should no-ops.
It is `PyFrame_GetLocals()` that creates the proxy object, surely.
_PyFrame_FastToLocals() still refreshes the frame cache that is used
as the return value from PyEval_GetLocals(), the same way it always
has.
The frame cache is implicitly kept in sync by operations on frame
proxy objects, but there are still two ways for it to get out of sync:
* code execution inside the frame (this updates the fast locals array,
not the key/value cache stored in the C level f_locals attribute)
* direct manipulation of the frame cache by callers of
PyEval_GetLocals (the frame will ignore any attempts to bind or unbind
variables that way, but it can still throw off proxy operations that
rely on the cache)
Just remove the state in the proxy and nothing will ever be out of sync.
The frame cache is stored on the frame, not separately in each proxy.

If you were to eliminate it, the operations that need it would just need to
create it anew every time instead.
...
...
The reason the frame cache still exists in the first place is because
there are a number of mapping APIs where the easiest way to meet the
algorithmic complexity expectations of the dict API is to use an
actual dict:
= Length checking =
The fast_refs mapping (and the frame metadata in general), includes
all variables defined for the frame, regardless of whether they're
currently bound or not.
The correct length for the fast locals proxy mapping only includes the
*currently bound* fast local and cell variables, together with any
extra keys that have been added.
The locals proxy keeps this an O(1) operation by reporting the size of
the frame cache, so it doesn't have to check which names are currently
bound every time.
It may be O(1), but it is incorrect. You state earlier that the cache
might be out of date.
It is easy to implement anything in O(1) if it is allowed to be wrong ;)
Yes, I know. It's a trade-off, as most of the time both C and Python trace
functions will be able to force a cache refresh when they start (e.g. by
calling PyLocals_GetView() or retrieving f_locals from the Python level
frame object), and there won't be any Python code running, and nothing will
be messing with PyEval_GetLocals(), so the cache won't go out of sync.

The alternative is to NOT rely on the cache and instead check the state of
the bindings every time len() gets called, and every time some flavour of
iteration is needed.

That's trivial to implement as well (the intrinsically O(N) operations like
copying already work that way), it just makes nominally O(1) operations
unexpectedly O(N) in the number of local and cell variables on the frame.
...
...
= Mapping comparisons =
Mapping comparisons have an O(1) early exit if their lengths don't
match, before falling back to a full O(N) key value comparison. The
locals proxy takes advantage of that by delegating comparison
operations to the frame cache rather than reimplementing them.
= Iteration =
Rather than reimplementing forward and reverse iteration and the
keys(), values(), and items() methods, the mapping proxy just
delegates these operations to the frame cache.
Operations on the proxy other than these ones either don't rely on the
cache being up to date at all (because they can use the fast_refs
mapping instead, implicitly updating any affected cache entries along
the way), or else implicitly refresh the cache before relying on it
(e.g. copying the proxy mapping works by refreshing the frame cache,
then copying the cache).
I assume there will be ways to improve the implementation to make
explicit frame cache refreshes less necessary over time (hence the
caveat along those lines in the final paragraph of
https://www.python.org/dev/peps/pep-0558/#continuing-to-support-storing-addi...
...
), but the existence of the PyEval_GetLocals() API means that we're
highly unlikely to ever get rid of the frame cache entirely.
That's why PyEval_GetLocals() needs to return a new reference.
This is a chance to fix a fundamentally broken API, let's not waste the
opportunity.
You can't just change the refcounting semantics of existing APIs like that,
it's a major compatibility break.

PyLocals_Get() is the new API that offers the new locals() behaviour.

Cheers,
Nick.
...
Cheers,
Mark.