[Python-Dev] [PEP 558] thinking through locals() semantics

Mon May 27 12:41:36 EDT 2019

On Mon, May 27, 2019 at 9:16 AM Guido van Rossum <guido at python.org> wrote:
>
> I re-ran your examples and found that some of them fail.
>
> On Mon, May 27, 2019 at 8:17 AM Nathaniel Smith <njs at pobox.com> wrote:
[...]
>> The interaction between f_locals and locals() is also subtle:
>>
>>   import sys
>>
>>   def f():
>>       a = 1
>>       loc = locals()
>>       assert "loc" not in loc
>>       # Regular variable updates don't affect 'loc'
>>       a = 2
>>       assert loc["a"] == 1
>>       # But debugging updates do:
>>       sys._getframe().f_locals["a"] = 3
>>       assert a == 3
>
>
> That assert fails; `a` is still 2 here for me.

I think you're running on current Python, and I'm talking about the
semantics in the current PEP 558 draft, which redefines f_locals so
that the assert passes. Nick has a branch here if you want to try it:
https://github.com/python/cpython/pull/3640

(Though I admit I was lazy, and haven't tried running my examples at
all -- they're just based on the text.)

>>
>>       assert loc["a"] == 3
>>       # But it's not a full writeback
>>       assert "loc" not in loc
>>       # Mutating 'loc' doesn't affect f_locals:
>>       loc["a"] = 1
>>       assert sys._getframe().f_locals["a"] == 3
>>       # Except when it does:
>>       loc["b"] = 3
>>       assert sys._getframe().f_locals["b"] == 3
>
>
> All of this can be explained by realizing `loc is sys._getframe().f_locals`. IOW locals() always returns the dict in f_locals.

That's not true in the PEP version of things. locals() and
frame.f_locals become radically different. locals() is still a dict
stored in the frame object, but f_locals is a magic proxy object that
reads/writes to the fast locals array directly.
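
A toy model may make the split easier to picture. Everything in it
(ToyFrame, FastLocalsProxy, toy_locals) is a hypothetical stand-in written
for illustration, not code from the PEP or from Nick's branch. It assumes,
going by the first example above, that proxy writes are mirrored into the
stored dict and that names without a fast slot live in that dict.

```python
_UNBOUND = object()  # marks a fast slot with no value


class ToyFrame:
    """Hypothetical stand-in for a function frame (not CPython's layout)."""

    def __init__(self, *names):
        self.slot = {name: i for i, name in enumerate(names)}
        self.fast = [_UNBOUND] * len(names)   # the "fast locals" array
        self.snapshot = {}                    # the dict toy_locals() returns

    def assign(self, name, value):
        # What ordinary bytecode does: touch the fast array only.
        self.fast[self.slot[name]] = value


class FastLocalsProxy:
    """Toy f_locals: reads and writes go straight to the fast array."""

    def __init__(self, frame):
        self._frame = frame

    def __getitem__(self, name):
        frame = self._frame
        if name in frame.slot:
            value = frame.fast[frame.slot[name]]
            if value is _UNBOUND:
                raise KeyError(name)
            return value
        # Assumption: names without a fast slot live in the snapshot dict.
        return frame.snapshot[name]

    def __setitem__(self, name, value):
        frame = self._frame
        if name in frame.slot:
            frame.fast[frame.slot[name]] = value
        # Assumption: proxy writes are mirrored into the snapshot dict too.
        frame.snapshot[name] = value


def toy_locals(frame):
    # Refresh the stored dict from the fast array, then return that same dict.
    for name, i in frame.slot.items():
        if frame.fast[i] is _UNBOUND:
            frame.snapshot.pop(name, None)
        else:
            frame.snapshot[name] = frame.fast[i]
    return frame.snapshot


frame = ToyFrame("a", "loc")
frame.assign("a", 1)
loc = toy_locals(frame)
frame.assign("loc", loc)
frame.assign("a", 2)
assert loc["a"] == 1                     # plain assignment leaves the dict stale
FastLocalsProxy(frame)["a"] = 3
assert frame.fast[frame.slot["a"]] == 3  # the proxy wrote to the fast array
assert loc["a"] == 3                     # and the dict saw that write
loc["b"] = 3
assert FastLocalsProxy(frame)["b"] == 3  # new names are shared through the dict
```

In the [proxy] option discussed further down, toy_locals() would simply
return the FastLocalsProxy instead of the stored dict.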

>>
>> Again, the results here are totally different if a Python-level
>> tracing/profiling function is installed.
>>
>> And you can also hit these subtleties via 'exec' and 'eval':
>>
>>   def f():
>>       a = 1
>>       loc = locals()
>>       assert "loc" not in loc
>>       # exec() triggers writeback, and then mutates the locals dict
>>       exec("a = 2; b = 3")
>>       # So now the current environment has been reflected into 'loc'
>>       assert "loc" in loc
>>       # Also loc["a"] has been changed to reflect the exec'ed assignments
>>       assert loc["a"] == 2
>>       # But if we look at the actual environment, directly or via
>>       # f_locals, we can see that 'a' has not changed:
>>       assert a == 1
>>       assert sys._getframe().f_locals["a"] == 1
>>       # loc["b"] changed as well:
>>       assert loc["b"] == 3
>>       # And this *does* show up in f_locals:
>>       assert sys._getframe().f_locals["b"] == 3
>
>
> This works indeed. My understanding is that the bytecode interpreter, when accessing the value of a local variable, ignores f_locals and always uses the "fast" array. But exec() and eval() don't use fast locals, their code is always compiled as if it appears in a module-level scope.
>
> While the interpreter is running and no debugger is active, in a function scope f_locals is not used at all, the interpreter only interacts with the fast array and the cells. It is initialized by the first locals() call for a function scope, and locals() copies the fast array and the cells into it. Subsequent calls in the same function scope keep the same value for f_locals and re-copy fast and cells into it. This also clears out deleted local variables and emptied cells, but leaves "strange" keys (like "b" in the examples) unchanged.
>
> The truly weird case happens when Python-level tracers are present, then the contents of f_locals are written back to the fast array and cells at certain points. This is intended for use by pdb (am I the only user of pdb left in the world?), so one can step through a function and mutate local variables. I find this essential in some cases.

Right, the original goal for the PEP was to remove the "truly weird
case" but keep pdb working.

>>
>> Of course, many of these edge cases are pretty obscure, so it's not
>> clear how much they matter. But I think we can at least agree that
>> this isn't the one obvious way to do it :-).
>>
>>
>> ##### What's the landscape of possible semantics?
>>
>> I did some brainstorming, and came up with 4 sets of semantics that
>> seem plausible enough to at least consider:
>>
>> - [PEP]: the semantics in the current PEP draft.
>
>
> To be absolutely clear this copies the fast array and cells to f_locals when locals() is called, but never copies back, except when Python-level tracing/profiling is on.

In the PEP draft, it never copies back at all, under any circumstance.

>>
>> - [PEP-minus-tracing]: same as [PEP], except dropping the writeback on
>> Python-level trace/profile events.
>
>
> But this still copies the fast array and cells to f_locals when a Python trace function is called, right? It just doesn't write back.

No, when I say "writeback" in this email I always mean
PyFrame_FastToLocals. The PEP removes PyFrame_LocalsToFast entirely.

>> - [snapshot]: in function scope, each call to locals() returns a new,
>> *static* snapshot of the local environment, removing all this
>> writeback stuff. Something like:
>>
>>   def locals():
>>       frame = get_caller_frame()
>>       if is_function_scope(frame):
>>           # make a point-in-time copy of the "live" proxy object
>>           return dict(frame.f_locals)
>>       else:
>>           # in module/class scope, return the actual local environment
>>           return frame.f_locals
>
>
> This is the most extreme variant, and in this case there is no point in having f_locals at all for a function scope (since nothing uses it). I'm not 100% sure that you understand this.

Yes, this does suggest an optimization: you should be able to skip
allocating a dict for every frame in most cases. I'm not sure how much
of a difference it makes. In principle we could implement that
optimization right now by delaying the dict allocation until the first
time f_locals or locals() is used, but we currently don't bother. And
even if we adopt this, we'll still need to keep a slot in the frame
struct to allocate the dict if we need to, because people can still be
obnoxious and do frame.f_locals["unique never before seen name"] =
blah and expect to be able to read it back later, which means we need
somewhere to store that. (In fact Trio does do this right now, as part
of its control-C handling stuff, because there's literally no other
place where you can store information that a signal handler can see
when it's walking the stack.) We could deprecate writing new names to
f_locals like this, but that's a longer-term thing.
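
A sketch of that lazy allocation, with hypothetical names (LazyFrame,
extra) that do not correspond to real frame fields: the slot starts out
empty, and a dict is created only when someone stores a name that has no
fast slot.

```python
class LazyFrame:
    """Hypothetical frame that allocates its extras dict on first use."""

    def __init__(self, **fast_locals):
        self.fast = dict(fast_locals)  # stands in for the fast locals array
        self.extra = None              # the slot: no dict allocated yet

    def set_local(self, name, value):
        if name in self.fast:
            self.fast[name] = value    # real local: no dict needed
            return
        if self.extra is None:
            self.extra = {}            # first never-before-seen name
        self.extra[name] = value

    def get_local(self, name):
        if name in self.fast:
            return self.fast[name]
        if self.extra is None:
            raise KeyError(name)
        return self.extra[name]


frame = LazyFrame(x=1)
frame.set_local("x", 2)
assert frame.extra is None             # ordinary use never allocates
frame.set_local("unique never before seen name", "blah")
assert frame.get_local("unique never before seen name") == "blah"
```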

>>
>> - [proxy]: Simply return the .f_locals object, so in all contexts
>> locals() returns a live mutable view of the actual environment:
>>
>>   def locals():
>>       return get_caller_frame().f_locals
>
>
> So this is PEP without any writeback. But there is still copying from the fast array and cells to f_locals. Does that only happen when locals() is called? Or also when a Python-level trace/profiling function is called?
> My problem with all variants except what's in the PEP is that it would leave pdb *no* way (short of calling into the C API using ctypes) of writing back local variables.

No, this option is called [proxy] because this is the version where
locals() and f_locals *both* give you magic proxy objects where
__getitem__ and __setitem__ access the fast locals array directly, as
compared to the PEP where only f_locals gives you that magic object.

-n

--
Nathaniel J. Smith -- https://vorpus.org