[Python-Dev] Idea: Dictionary references

Andrew Barnert abarnert at yahoo.com
Fri Dec 18 14:32:53 EST 2015


> On Dec 18, 2015, at 04:56, Steven D'Aprano <steve at pearwood.info> wrote:
> 
>>> On Thu, Dec 17, 2015 at 09:30:24AM -0800, Andrew Barnert via Python-Dev wrote:
>>> On Dec 17, 2015, at 07:38, Franklin? Lee <leewangzhong+python at gmail.com> wrote:
>>> 
>>> The nested dictionaries are only for nested scopes (and inner
>>> functions don't create nested scopes). Nested scopes will already
>>> require multiple lookups in parents.
>> 
>> I think I understand what you're getting at here, but it's a really 
>> confusing use of terminology. In Python, and in programming in 
>> general, nested scopes refer to exactly inner functions (and classes) 
>> being lexically nested and doing lookup through outer scopes. The fact 
>> that this is optimized at compile time to FAST vs. CELL vs. 
>> GLOBAL/NAME, cells are optimized at function-creation time, and only 
>> global and name have to be resolved at the last second doesn't mean 
>> that there's no scoping, or some other form of scoping besides 
>> lexical. The actual semantics are LEGB, even if L vs. E vs. GB and E 
>> vs. further-out E can be optimized.
> 
> In Python 2, the LOAD_NAME byte-code can return a local, even though it 
> normally doesn't:
> 
> py> x = "global"
> py> def spam():
> ...     exec "x = 'local'"
> ...     print x
> ...
> py> spam()
> local
> py> x == 'global'
> True
> 
> 
> If we look at the byte-code, we see that the lookup is *not* optimized 
> to inspect locals only (LOAD_FAST), but uses the regular LOAD_NAME that 
> normally gets used for globals and builtins:
> 
> py> import dis
> py> dis.dis(spam)
>  2           0 LOAD_CONST               1 ("x = 'local'")
>              3 LOAD_CONST               0 (None)
>              6 DUP_TOP
>              7 EXEC_STMT
> 
>  3           8 LOAD_NAME                0 (x)
>             11 PRINT_ITEM
>             12 PRINT_NEWLINE
>             13 LOAD_CONST               0 (None)
>             16 RETURN_VALUE
> 
> 
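
(A side note for Python 3 readers: the same trick no longer works there, because exec() in a function body writes to a throwaway snapshot of the local namespace rather than the real locals. A quick illustration:)

```python
# Python 3 behavior for contrast: exec() inside a function mutates only
# a copy of the local namespace, so x is not shadowed.
x = "global"

def spam():
    exec("x = 'local'")
    return x  # compiled as a global lookup; exec's binding is invisible

assert spam() == "global"
```
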
> 
>> What you're talking about here is global lookups falling back to 
>> builtin lookups. There's no more general notion of nesting or scoping 
>> involved, so why use those words?
> 
> I'm not quite sure about this. In principle, every name lookup looks in 
> four scopes, LEGB as you describe above:
> 
> - locals
> - non-locals, a.k.a. enclosing or lexical scope(s)
> - globals (i.e. the module)
> - builtins
> 
> 
> although Python can (usually?) optimise away some of those lookups.

I think it kind of _has_ to optimize away, or at least tweak, some of those things, rather than acting as if globals and builtins were simply two more enclosing scopes. For example, global-to-builtins lookup has to go through globals()['__builtins__'], or act as if it does, or code that relies on, say, the documented behavior of exec can break. And you have to be able to modify the global scope after compile time and have that modification be effective, which means you'd have to allow the same thing on locals and closures if they were to act the same.
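
To make that last point concrete: global and builtin lookups are resolved fresh on each call, so a runtime rebinding in the module's globals takes effect immediately, and deleting it falls back to builtins again. A purely illustrative (and contrived) sketch:

```python
def uses_len():
    return len("spam")  # len is looked up at each call, not at compile time

assert uses_len() == 4

# Shadow the builtin via the module's global scope at runtime:
len = lambda obj: 42
assert uses_len() == 42

# Removing the global restores the fallback to builtins:
del len
assert uses_len() == 4
```
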

> The 
> relationship of locals to enclosing scopes, and to globals in turn, 
> involve actual nesting of indented blocks in Python, but that's not 
> necessarily the case. One might imagine a hypothetical capability for 
> manipulating scopes directly, e.g.:
> 
> def spam(): ...
> def ham(): ...
> set_enclosing(ham, spam)
> # like:
> # def spam():
> #     def ham(): ...

But that doesn't work; a closure has to link to a particular invocation of its outer function, not just to the function. Consider a trivial example:

def spam(): x = time()
def ham(): return x
set_enclosing(ham, spam)
ham()

There's no actual x value in scope. So you need something like this if you want to actually be able to call it:

def spam(helper):
    x = time()
    helper = bind_closure(helper, sys._getframe())
    return helper()
def ham(): return x
set_enclosing(ham, spam)
spam(ham)

Of course you could make that getframe implicit; the point is there has to be a frame from an invocation of spam, not just the function itself, to make lexical scoping (errr... dynamically-generated fake-lexical scoping?) useful.
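
In real Python, that linkage is directly visible: each invocation of the outer function creates fresh cells, and the closure captures those cells, not the function object itself. A minimal demonstration:

```python
def spam():
    x = object()  # a value unique to this particular call
    def ham():
        return x
    return ham

h1, h2 = spam(), spam()
# Each call to spam() created its own cell, so the two closures
# see different x's:
assert h1() is not h2()
assert h1.__closure__[0].cell_contents is h1()
```
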

> The adventurous or fool-hardy can probably do something like that now 
> with byte-code hacking :-)

Yeah; I actually played with something like this a few years ago. I did it directly in terms of creating cell and free vars, not circumventing the existing LEGB system, which means you have to modify not just ham but spam in that set_enclosing. (In fact, you also have to modify all functions lexically or faux-lexically enclosing spam or enclosed by ham, which my code didn't do, but there were lots of other ways to fake it...) You need a bit of ctypes.pythonapi, not just bytecode hacks, to do the bind_closure() hack (the cell constructor isn't callable from Python, and you can't even fake it with a wrapper around a cell because cell_contents is immutable from Python...), but it's all doable. Anyway, my original goal was to make it possible to get the effect of nonlocal in Python 2, by calling set_enclosing(spam, ham, force_cells=('eggs',)), which converted "eggs" to a freevar even if it was local (normally it only tried to convert globals). It mostly worked and only occasionally segfaulted. :)
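
(A hedged aside: one piece of this *can* be done in pure Python, at least in Python 3 -- minting a fresh cell by stealing it from a throwaway closure, and then rebuilding a function around new cells with types.FunctionType. It doesn't make an existing cell's contents mutable, which is what the hack above really needed, but it shows how far you can get without ctypes:)

```python
import types

def make_cell(value):
    # Build a throwaway closure and steal the cell it created.
    def outer():
        captured = value
        def inner():
            return captured
        return inner
    return outer().__closure__[0]  # inner's cell for `captured`

def make_template():
    y = None
    def template():
        return y  # y is a free variable of template
    return template

f = make_template()
# Rebuild f with a brand-new cell binding y to 99:
g = types.FunctionType(f.__code__, f.__globals__, "g", None,
                       (make_cell(99),))
assert g() == 99
```
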

At any rate, even in languages designed for this kind of hacking (like Scheme), the scopes are still described as nested scopes, and the intended behavior can even be defined in terms of "as if you took the sexpr/AST/whatever for ham and spam and nested and re-compiled them", so whatever hacks you actually do are just optimizations over that re-compile.

> Likewise, one might consider that builtins is a scope which in some 
> sense encloses the global scope. Consider it a virtual code block that 
> is outdented.
> 
> 
>> So, trying to generalize global vs. builtin to a general notion of 
>> "nested scope" that isn't necessary for builtins and doesn't work for 
>> anything else seems like overcomplicating things for no benefit.
> 
> Well, putting aside the question of whether this is useful or not, and 
> putting aside efficiency concerns, let's just imagine a hypothetical 
> implementation where name lookups used ChainMaps instead of using 
> separate LOAD_* lookups of special dicts. Then a function could set up a 
> ChainMap:

This is basically one of the two original ways of doing lexical scoping in Lisp: when a function is constructed, it stores a reference to the stack frame, and when that function is called, its frame stores a reference to the function, so you always have a linked chain of stack frames to walk through to do your lookups (and assignments).
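
A toy sketch of that environment-chain scheme (not CPython's actual mechanism, just an illustration of the Lisp-style approach described above): every frame links to its enclosing frame, and name resolution walks the chain outward.

```python
class Frame:
    """A frame that carries its own bindings plus a link outward."""
    def __init__(self, enclosing=None):
        self.vars = {}
        self.enclosing = enclosing

    def lookup(self, name):
        # Walk outward through the chain of enclosing frames.
        frame = self
        while frame is not None:
            if name in frame.vars:
                return frame.vars[name]
            frame = frame.enclosing
        raise NameError(name)

global_frame = Frame()
global_frame.vars["x"] = "global"
local_frame = Frame(enclosing=global_frame)
local_frame.vars["y"] = "local"

assert local_frame.lookup("y") == "local"
assert local_frame.lookup("x") == "global"
```
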

The first problem with this is that using closures keeps alive a ton of garbage that can't be reclaimed for a long time. One solution is to lift out the variables and only keep alive the ones that are actually referenced--but then you need some rule to decide which variables are actually referenced, and the easiest place to apply it is at function compile time. Which means that if you eval up new bindings or manipulate frame environments, they may or may not get closures, and it gets very confusing. It's simpler just to make them not work at all, at which point you've got pretty much the same rules cellvars have in Python.
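
CPython's compile-time version of that rule is visible in code objects: only names an inner function actually references get cells.

```python
def outer():
    kept = 1
    dropped = 2  # never referenced by inner, so no cell is created
    def inner():
        return kept
    return inner

f = outer()
# Only `kept` was lifted into a cell; `dropped` stays a plain local
# and dies with outer's frame:
assert outer.__code__.co_cellvars == ("kept",)
assert f.__code__.co_freevars == ("kept",)
```
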

But you don't want to apply those rules at global scope; you need to be able to build that scope iteratively. (Even a simple recursive function--or a function that references itself to call set_enclosing--needs to be able to defer looking for the name's scope until call time. Which, besides the trivial "making the basics work", allows Python to do all kinds of other fun stuff, like write a function that calls itself, then replace it with an optimized version, and now all existing invocations of that function--like, say, generators--recurse to the optimized version.) The traditional solution is to provide a way to declare two kinds of bindings: one, lexically scoped through the chain, and the other either flat-global or dynamically scoped. Then almost everything at the top level ends up global (or ends up don't-care) anyway. What Python does (special-case global lookup to be completely late and completely flat, assume the whole globals dict can live forever, and make top-level code actually use the globals dict as its locals) is just a simplification of that, which handles 99% of what you normally want.
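
That "replace a recursive function with an optimized version" trick works precisely because the recursive call is a plain global lookup resolved at call time. A small demonstration:

```python
def f(n):
    # The recursive call looks f up in globals on *every* call:
    return 1 if n == 0 else n * f(n - 1)

orig = f

def f(n):  # the "optimized" replacement
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

# Existing references to the old function now recurse into the new one:
assert orig(5) == 120
assert f(5) == 120
```
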

Then you add a module system, and you effectively need two levels of global, and then you've got Python's behavior exactly, so you might as well optimize it the same way as Python. :)

But maybe if you went back and found a different solution to the first problem, you could come up with different results. If you start with unlinkable stacks instead of adding them on later (or decouple the environments from the stacks, as you already did, and make them unlinkable ChainMaps), maybe there's a better way.

Or maybe you could just decide that the garbage isn't a problem--or, rather, that it's an application-level problem; at compile time, users can just say "bind everything", but they can also provide a list of names to bind (and can also bind things by copy instead), so they have a way to manage the garbage growth. (This is C++'s solution to closures; I'm not sure how well it would transplant to Python, or Lisp, but it might work--maybe with hooks to do some things at runtime that C++ can only do at compile time?)

> function.__scopes__ = ChainMap(locals, enclosing, globals, builtins)
> 
> and a name lookup for (say) "x" would always be a simple:
> 
> function.__scopes__["x"]
> 
> Of course this would be harder to optimize, and hence probably slower, 
> than the current arrangement, but I think it would allow some 
> interesting experiments with scoping rules:
> 
> ChainMap(locals, enclosing, globals, application_globals, builtins)
> 
> 
> You could implement dynamic scoping by inserting the caller's __scopes__ 
> ChainMap into the front of the called function's ChainMap.

I'd think you'd normally want to dynamically scope on a per-variable basis, not a per-scope basis--but maybe that's because most languages that have optional dynamic scoping do it that way, so I never imagined what I could do with the other? Any cool ideas?
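
For concreteness, the per-scope version Steven describes can already be played with today using collections.ChainMap; this is just a sketch of his idea, not anything Python actually does for name lookup:

```python
from collections import ChainMap

builtin_scope = {"len": len}
global_scope = {"x": "global"}
local_scope = {"y": "local"}

scopes = ChainMap(local_scope, global_scope, builtin_scope)
assert scopes["y"] == "local"
assert scopes["x"] == "global"
assert scopes["len"] is len

# Per-scope dynamic scoping: splice the caller's maps in front, so
# the caller's bindings shadow everything in the callee's chain.
caller_scopes = ChainMap({"x": "caller"})
dynamic = ChainMap(*(caller_scopes.maps + scopes.maps))
assert dynamic["x"] == "caller"
assert dynamic["y"] == "local"
```
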

> And attribute 
> lookups would be something like this simplified scope:
> 
> ChainMap(self.__dict__, type(self).__dict__)

What about inheritance? You still need to get (base.__dict__ for base in type(self).__mro__) in there (and, for class attributes, the same for self.__mro__). But that would make things less dynamic (if a type's bases change at runtime, its subtypes aren't affected). You could link to your bases' ChainMaps instead of your entire MRO's dicts, but then I'm not sure how you linearize multiple inheritance. (Can you write a "C3ChainMap" that relies on a flag on each dict that says "if you see this flag you have to recompute the chain from your original inputs", which you set on __bases__ change?)
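
The MRO-dict version is easy to approximate (ignoring descriptors and the dynamism caveat above) by chaining the instance dict with every class dict in MRO order:

```python
from collections import ChainMap

class A:
    greeting = "hello from A"

class B(A):
    pass

obj = B()
obj.x = 3

# Approximate attribute lookup as a ChainMap over the instance dict
# plus each class dict along the MRO (including object's):
attrs = ChainMap(obj.__dict__, *(c.__dict__ for c in type(obj).__mro__))
assert attrs["x"] == 3
assert attrs["greeting"] == "hello from A"
```

Note this snapshot is built from the MRO at construction time, which is exactly the loss of dynamism mentioned above: a later change to B.__bases__ wouldn't be reflected in an already-built chain.
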

> to say nothing of combinations of the two.

You mean implicit self, where you chain the self and its type in there ahead of locals? The hard part there isn't lookups, but bindings. Unless you want a "self x" declaration akin to "nonlocal x" (or, if that's the default for methods, a "local x"--maybe implicit declarations are a less important feature than a single lookup chain?).

> So I think there's something interesting here, even if we don't want to 
> use it in production code, it would make for some nice experiments.

Most of this seems like it would be easier to experiment with by building a new Python-like language than by hacking on Python. (Also, either way, it seems more like a thread for -ideas than -dev...)

