
The problem of the 'default argument hack' and its use for early binding and shared state in function definitions is one that has been bugging me for years. You could rightly say that the degree to which it irritates me is all out of proportion to the significance of the use case and the frequency with which it arises, and I'd agree with you. That, at least in part, is what has made it such an interesting problem for me: the status quo is suboptimal and confusing (there's a reason the term 'default argument hack' gets thrown around, including by me), but most proposed solutions have involved fairly significant changes to the semantics and syntax of the language that cannot be justified by such a niche use case. The proposed cures (including my own suggestions) have all been worse than the disease.

I finally have a possible answer I actually *like* ("nonlocal VAR from EXPR"), but it requires a somewhat novel way of thinking about closures and lexical scopes for it to make sense. This post is an attempt to explain that thought process. The history outlined here will be familiar to many folks on the list, but I hope to make the case that this approach actually simplifies and unifies a few aspects of the language rather than adding anything fundamentally new. The novel aspect lies in recognising, and exposing to developers as a coherent feature, something that is already implicit in the operation of the language as a whole.

== Default Arguments ==

Default arguments have been a part of Python function definitions for a very long time (since the beginning, even?), so it makes sense to start with those. At function call time, if the relevant parameters are not supplied as arguments, they're populated on the frame object based on the values stored on the function object. Their behaviour is actually quite like a closure: they define shared state that is common to all invocations of the function.
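For anyone that hasn't run into them, these are the kinds of uses I mean by the 'default argument hack' (purely illustrative):

    # Early binding: capture the loop variable at definition time
    callbacks = []
    for i in range(3):
        def callback(i=i):  # without the default, every callback would see i == 2
            return i
        callbacks.append(callback)
    print([cb() for cb in callbacks])  # [0, 1, 2]

    # Shared state: a mutable default shared by every invocation
    def counter(_counts=[0]):
        _counts[0] += 1
        return _counts[0]
    print(counter(), counter(), counter())  # 1 2 3

In both cases the parameter is a lie: it isn't part of the intended call signature at all, it's just the only convenient place to stash definition-time state.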
== Lexical Scoping ==

The second step in this journey is the original introduction of lexical scoping by PEP 227, back in Python 2.1 (or 2.2 without a __future__ statement). This changed Python from its original locals->globals->builtins lookup mechanism (still used in class scope to this day) to the closure semantics for nested functions that we're familiar with. However, at this stage there was no ability to rebind names in outer scopes - they were read-only, so you needed to use other techniques (like 'boxing' the value in a list) to update immutable values.

== Writing to Outer Scopes ==

PEP 3104 added the ability to write to outer scopes via the 'nonlocal' statement, which declares that a particular variable is not a local in the current frame, but rather a local in an outer frame which is alive at the time the inner function definition statement is executed. It expects the variable to already exist in an outer lexically nested scope and complains if it can't find one.
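To make the difference concrete, here's the same toy counter written both ways (purely illustrative):

    # Pre-PEP 3104 style: 'box' the value in a mutable container, since the
    # outer name itself can't be rebound from the inner function
    def make_counter_boxed():
        count = [0]
        def increment():
            count[0] += 1
            return count[0]
        return increment

    # PEP 3104 style: nonlocal lets the inner function rebind the outer name directly
    def make_counter():
        count = 0
        def increment():
            nonlocal count
            count += 1
            return count
        return increment

    c = make_counter()
    print(c(), c(), c())  # 1 2 3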
== The "__class__" cell reference ==

The final entrant in this game, the "__class__" cell reference, was added to the language by PEP 3135 in order to implement the 3.x super() shorthand. For functions defined within a class body, this effectively lets the class definition play a role in lexical scoping, as the compiler and eval loop cooperate to give the function an indirect reference to the class being defined, even though the function definition completes first.
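A quick illustration of that implicit reference (the closure introspection shown is a CPython detail, included only to make the cell visible):

    class A:
        def who(self):
            return "A"

    class B(A):
        def who(self):
            # Zero-argument super() relies on the implicit __class__ reference
            return "B -> " + super().who()

    print(B().who())                                # B -> A
    print(B.who.__code__.co_freevars)               # ('__class__',)
    print(B.who.__closure__[0].cell_contents is B)  # True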
== The Status Quo ==

If you go look up the definition of 'closure', you'll find that it doesn't actually say anything about nested functions. Instead, it will talk about 'free variables' in the algorithm definition, without placing any restrictions on how those variables are later hooked up to the appropriate values. In current Python, ordinary named references can refer to one of four namespaces:

- locals (stored on the currently executing frame object)
- closure references (stored in a cell object by the function that defined it, kept alive after the frame is recycled by references from still-living inner functions that need it)
- globals (stored on the module object)
- builtins (also stored on a module object, specifically the one for the builtin namespace)

PEP 3135 also creates a closure reference for "__class__", but in a slightly unusual way. Whereas most targets for closure references are created by the code in the outer function when it runs [1], this closure reference is populated implicitly by the type machinery [2]. The important aspect from my point of view is that this implementation technique starts to break down Python's historical correlation between "function closure" and "lexically nested scope".

== Conceptual Unification ==

The moment of clarity for me came when I realised that default arguments, lexically nested scopes and the new super() implementation can all be seen as just special cases of the broader concept of free variables and function closures.

Lexically nested scopes have always been talked about in those terms, so that aspect shouldn't surprise anyone. The new super() implementation is also fairly obviously a closure, since it uses the closure machinery to work its magic. The only difference is in the way the value gets populated in the first place (i.e. by the type machinery rather than by the execution of an outer function).

Due to history, default argument *values* aren't often thought of as closure references, but they really are anonymous closures. Instead of using cells, the references are stored in dedicated attributes that are known to the argument parsing machinery, but you could quite easily dispense with that and store everything as cells in the function closure. (You wouldn't, since it would be a waste of time and energy - I'm just pointing out the conceptual equivalence. A *new* Python implementation, though, could choose to go down that path.)

After I had that realisation, the natural follow-up question seemed to be: if I wanted to explicitly declare a closure variable, and provide it with an initial value, without introducing a nested function purely for that purpose, how should I spell that? Well, I think PEP 3104 has already given us the answer: by declaring the variable name as explicitly 'nonlocal', but also providing an initial value so the compiler knows it is a *new* closure variable, rather than one from an outer lexically nested scope. This is a far more useful and meaningful addition than the trivial syntactic sugar mentioned in the PEP (but ultimately not implemented).

The other question is what scope the initialisation operation should be executed in, and I think default arguments have the answer there: the containing scope, before the function has been defined.
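For instance, in current CPython the two kinds of reference simply end up stored in different places on the function object (a quick illustrative sketch):

    def outer():
        shared = 10
        def inner(arg=42):
            return arg + shared
        return inner

    f = outer()
    print(f.__defaults__)                  # (42,) - default stored directly on the function
    print(f.__code__.co_freevars)          # ('shared',)
    print(f.__closure__[0].cell_contents)  # 10 - closure reference stored in a cell

Conceptually, both are values resolved once and then shared by every subsequent invocation; only the storage mechanism differs.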
== Precise Syntax ==

By reusing 'nonlocal', we would make it clear that we're not adding a new concept to the language, but rather generalising an existing one (i.e. closure references) to provide additional flexibility in the way they're used. So I *really* want to use that keyword rather than adding a new one just for this task. However, I'm less certain about the spelling of the rest of the statement. There are at least a few possible alternative spellings:

    nonlocal VAR = EXPR     # My initial suggestion
    nonlocal VAR from EXPR  # Strongly indicates there's more than a simple assignment going on here
    nonlocal EXPR as VAR    # Parser may struggle with this one

Of the three, 'nonlocal VAR from EXPR' may be the best bet: it's easy for the compiler to parse, PEP 380 set the precedent for a 'from EXPR' clause introducing a subexpression, and 'nonlocal VAR = EXPR' may be too close to 'nonlocal VAR; VAR = EXPR'.
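To show the kind of rewrite I'm aiming to enable, here's today's hack alongside the intended reading of the new statement (the second spelling obviously isn't valid syntax today, so it only appears in a comment):

    # Status quo: shared state via the default argument hack
    def tally(item, _seen=set()):
        _seen.add(item)
        return len(_seen)

    # With the proposed statement, the shared state would become an ordinary
    # (but explicitly declared) closure reference, with the initialiser run
    # in the containing scope before the function is defined:
    #
    # def tally(item):
    #     nonlocal _seen from set()
    #     _seen.add(item)
    #     return len(_seen)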
Regards,
Nick.

[1] Some dis module details regarding the different kinds of name reference. Most notable for my point is the correspondence between the 'cell variable' in the outer function and the 'free variable' in the inner function:

>>> def outer():
...     closure_ref = 1
...     def inner():
...         local_ref = 2
...         print(local_ref, closure_ref, global_ref, len)
...
>>> global_ref = 3
>>> import dis
>>> dis.show_code(outer)
Name:              outer
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS
Constants:
   0: None
   1: 1
   2: <code object inner at 0xee78b0, file "<stdin>", line 3>
Variable names:
   0: inner
Cell variables:
   0: closure_ref
>>> dis.show_code(outer.__code__.co_consts[2])
Name:              inner
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        5
Flags:             OPTIMIZED, NEWLOCALS, NESTED
Constants:
   0: None
   1: 2
Names:
   0: print
   1: global_ref
   2: len
Variable names:
   0: local_ref
Free variables:
   0: closure_ref

[2] Some dis module output showing that the '__class__' closure entry doesn't come from an ordinary enclosing function scope: outer2 has no corresponding cell variable at all. Instead, the compiler places the cell on the class body's code object, and the new super() machinery populates it implicitly:

>>> def outer2():
...     class C:
...         def inner():
...             print(__class__)
...
>>> dis.show_code(outer2)
Name:              outer2
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  1
Stack size:        3
Flags:             OPTIMIZED, NEWLOCALS, NOFREE
Constants:
   0: None
   1: <code object C at 0x1275c68, file "<stdin>", line 2>
   2: 'C'
Variable names:
   0: C
>>> dis.show_code(outer2.__code__.co_consts[1])
Name:              C
Filename:          <stdin>
Argument count:    1
Kw-only arguments: 0
Number of locals:  1
Stack size:        2
Flags:             NEWLOCALS
Constants:
   0: <code object inner at 0x1275608, file "<stdin>", line 3>
Names:
   0: __name__
   1: __module__
   2: inner
Variable names:
   0: __locals__
Cell variables:
   0: __class__
>>> dis.show_code(outer2.__code__.co_consts[1].co_consts[0])
Name:              inner
Filename:          <stdin>
Argument count:    0
Kw-only arguments: 0
Number of locals:  0
Stack size:        2
Flags:             OPTIMIZED, NEWLOCALS, NESTED
Constants:
   0: None
Names:
   0: print
Free variables:
   0: __class__
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia