
The problem of the 'default argument hack' and its use for early binding and shared state in function definitions is one that has been bugging me for years. You could rightly say that the degree to which it irritates me is all out of proportion to the significance of the use case and the frequency with which it arises, and I'd agree with you. That, at least in part, is what has made it such an interesting problem for me: the status quo is suboptimal and confusing (there's a reason the term 'default argument hack' gets thrown around, including by me), but most proposed solutions have involved fairly significant changes to the semantics and syntax of the language that cannot be justified by such a niche use case. The proposed cures (including my own suggestions) have all been worse than the disease.

I finally have a possible answer I actually *like* ("nonlocal VAR from EXPR"), but it requires a somewhat novel way of thinking about closures and lexical scopes for it to make sense. This post is an attempt to explain that thought process. The history outlined here will be familiar to many folks on the list, but I hope to make the case that this approach actually simplifies and unifies a few aspects of the language rather than adding anything fundamentally new. The novel aspect lies in recognising, and exposing to developers as a coherent feature, something that is already implicit in the operation of the language as a whole.

== Default Arguments ==

Default arguments have been a part of Python function definitions for a very long time (since the beginning, even?), so it makes sense to start with those. At function call time, if the relevant parameters are not supplied as arguments, they're populated on the frame object based on the values stored on the function object. Their behaviour is actually quite like a closure: they define shared state that is common to all invocations of the function.
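For anyone that hasn't run into them, these are the kinds of uses I mean by the 'default argument hack' (purely illustrative):

    # Early binding: capture the loop variable at definition time
    callbacks = []
    for i in range(3):
        def callback(i=i):  # without the default, every callback would see i == 2
            return i
        callbacks.append(callback)
    print([cb() for cb in callbacks])  # [0, 1, 2]

    # Shared state: a mutable default shared by every invocation
    def counter(_counts=[0]):
        _counts[0] += 1
        return _counts[0]
    print(counter(), counter(), counter())  # 1 2 3

In both cases the parameter is a lie: it isn't part of the intended call signature at all, it's just the only convenient place to stash definition-time state.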
== Lexical Scoping ==

The second step in this journey is the original introduction of lexical scoping by PEP 227, back in Python 2.1 (or 2.2 without a __future__ statement). This changed Python from its original locals->globals->builtins lookup mechanism (still used in class scope to this day) to the closure semantics for nested functions that we're familiar with. However, at this stage there was no ability to rebind names in outer scopes - they were read-only, so you needed to use other techniques (like 'boxing' the value in a list) to update immutable values.

== Writing to Outer Scopes ==

PEP 3104 added the ability to write to outer scopes via the 'nonlocal' statement, which declares that a particular variable is not a local in the current frame, but rather a local in an outer frame which is alive at the time the inner function definition statement is executed. It expects the variable to already exist in an outer lexically nested scope and complains if it can't find one.
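To make the difference concrete, here's the same toy counter written both ways (purely illustrative):

    # Pre-PEP 3104 style: 'box' the value in a mutable container, since the
    # outer name itself can't be rebound from the inner function
    def make_counter_boxed():
        count = [0]
        def increment():
            count[0] += 1
            return count[0]
        return increment

    # PEP 3104 style: nonlocal lets the inner function rebind the outer name directly
    def make_counter():
        count = 0
        def increment():
            nonlocal count
            count += 1
            return count
        return increment

    c = make_counter()
    print(c(), c(), c())  # 1 2 3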
== The "__class__" cell reference ==

The final entrant in this game, the "__class__" cell reference, was added to the language by PEP 3135 in order to implement the 3.x super() shorthand. For functions defined within a class body, this effectively lets the class definition play a role in lexical scoping, as the compiler and eval loop cooperate to give the function an indirect reference to the class being defined, even though the function definition completes first.
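A quick illustration of that implicit reference (the closure introspection shown is a CPython detail, included only to make the cell visible):

    class A:
        def who(self):
            return "A"

    class B(A):
        def who(self):
            # Zero-argument super() relies on the implicit __class__ reference
            return "B -> " + super().who()

    print(B().who())                                # B -> A
    print(B.who.__code__.co_freevars)               # ('__class__',)
    print(B.who.__closure__[0].cell_contents is B)  # True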
== The Status Quo ==

If you go look up the definition of 'closure', you'll find that it doesn't actually say anything about nested functions. Instead, it will talk about 'free variables' in the algorithm definition, without placing any restrictions on how those variables are later hooked up to the appropriate values. In current Python, ordinary named references can refer to one of four namespaces:

- locals (stored on the currently executing frame object)
- closure references (stored in a cell object by the function that defined it, kept alive after the frame is recycled by references from still-living inner functions that need it)
- globals (stored on the module object)
- builtins (also stored on a module object, specifically the one for the builtin namespace)

PEP 3135 also creates a closure reference for "__class__", but in a slightly unusual way. Whereas most targets for closure references are created by the code in the outer function when it runs [1], this closure reference is populated implicitly by the type machinery [2]. The important aspect from my point of view is that this implementation technique starts to break down Python's historical correlation between "function closure" and "lexically nested scope".

== Conceptual Unification ==

The moment of clarity for me came when I realised that default arguments, lexically nested scopes and the new super() implementation can all be seen as just special cases of the broader concept of free variables and function closures.

Lexically nested scopes have always been talked about in those terms, so that aspect shouldn't surprise anyone. The new super() implementation is also fairly obviously a closure, since it uses the closure machinery to work its magic. The only difference is in the way the value gets populated in the first place (i.e. by the type machinery rather than by the execution of an outer function).

Due to history, default argument *values* aren't often thought of as closure references, but they really are anonymous closures. Instead of using cells, the references are stored in dedicated attributes that are known to the argument parsing machinery, but you could quite easily dispense with that and store everything as cells in the function closure. (You wouldn't, since it would be a waste of time and energy - I'm just pointing out the conceptual equivalence. A *new* Python implementation, though, could choose to go down that path.)

After I had that realisation, the natural follow-up question seemed to be: if I wanted to explicitly declare a closure variable, and provide it with an initial value, without introducing a nested function purely for that purpose, how should I spell that? Well, I think PEP 3104 has already given us the answer: by declaring the variable name as explicitly 'nonlocal', but also providing an initial value so the compiler knows it is a *new* closure variable, rather than one from an outer lexically nested scope. This is a far more useful and meaningful addition than the trivial syntactic sugar mentioned in the PEP (but ultimately not implemented).

The other question is what scope the initialisation operation should be executed in, and I think default arguments have the answer there: the containing scope, before the function has been defined.
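For instance, in current CPython the two kinds of reference simply end up stored in different places on the function object (a quick illustrative sketch):

    def outer():
        shared = 10
        def inner(arg=42):
            return arg + shared
        return inner

    f = outer()
    print(f.__defaults__)                  # (42,) - default stored directly on the function
    print(f.__code__.co_freevars)          # ('shared',)
    print(f.__closure__[0].cell_contents)  # 10 - closure reference stored in a cell

Conceptually, both are values resolved once and then shared by every subsequent invocation; only the storage mechanism differs.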
== Precise Syntax ==

By reusing 'nonlocal', we would make it clear that we're not adding a new concept to the language, but rather generalising an existing one (i.e. closure references) to provide additional flexibility in the way they're used. So I *really* want to use that keyword rather than adding a new one just for this task. However, I'm less certain about the spelling of the rest of the statement. There are at least a few possible alternative spellings:

    nonlocal VAR = EXPR     # My initial suggestion
    nonlocal VAR from EXPR  # Strongly indicates there's more than a simple assignment going on here
    nonlocal EXPR as VAR    # Parser may struggle with this one

Of the three, 'nonlocal VAR from EXPR' may be the best bet: it's easy for the compiler to parse, PEP 380 set the precedent for a 'from EXPR' clause introducing a subexpression, and 'nonlocal VAR = EXPR' may be too close to 'nonlocal VAR; VAR = EXPR'.
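To show the kind of rewrite I'm aiming to enable, here's today's hack alongside the intended reading of the new statement (the second spelling obviously isn't valid syntax today, so it only appears in a comment):

    # Status quo: shared state via the default argument hack
    def tally(item, _seen=set()):
        _seen.add(item)
        return len(_seen)

    # With the proposed statement, the shared state would become an ordinary
    # (but explicitly declared) closure reference, with the initialiser run
    # in the containing scope before the function is defined:
    #
    # def tally(item):
    #     nonlocal _seen from set()
    #     _seen.add(item)
    #     return len(_seen)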
Regards,
Nick.

[1] Some dis module details regarding the different kinds of name reference. Most notable for my point is the correspondence between the 'cell variable' in the outer function and the 'free variable' in the inner function:

>>> def outer():
...     closure_ref = 1
...     def inner():
...         local_ref = 2
...         print(local_ref, closure_ref, global_ref, len)
...
>>> global_ref = 3
>>> import dis
>>> dis.show_code(outer)
Name:              outer
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS
Constants:
   0: None
   1: 1
   2: <code object inner at 0xee78b0, file "<stdin>", line 3>
Variable names:
   0: inner
Cell variables:
   0: closure_ref
>>> dis.show_code(outer.__code__.co_consts[2])
Name:              inner
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        5
Flags:             OPTIMIZED, NEWLOCALS, NESTED
Constants:
   0: None
   1: 2
Names:
   0: print
   1: global_ref
   2: len
Variable names:
   0: local_ref
Free variables:
   0: closure_ref

[2] Some dis module output showing that the '__class__' closure entry doesn't come from an ordinary enclosing function scope: outer2 has no corresponding cell variable at all. Instead, the compiler places the cell on the class body's code object, and the new super() machinery populates it implicitly:

>>> def outer2():
...     class C:
...         def inner():
...             print(__class__)
...
>>> dis.show_code(outer2)
Name:              outer2
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: <code object C at 0x1275c68, file "<stdin>", line 2>
   2: 'C'
Variable names:
   0: C
>>> dis.show_code(outer2.__code__.co_consts[1])
Name:              C
Filename:          <stdin>
Argument count:    1
Kw-only arguments: 0
Number of locals:  1
Stack size:        2
Flags:             NEWLOCALS
Constants:
   0: <code object inner at 0x1275608, file "<stdin>", line 3>
Names:
   0: __name__
   1: __module__
   2: inner
Variable names:
   0: __locals__
Cell variables:
   0: __class__
>>> dis.show_code(outer2.__code__.co_consts[1].co_consts[0])
Name:              inner
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NESTED
Constants:
   0: None
Names:
   0: print
Free variables:
   0: __class__
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia