[Python-Dev] PEP 227 (was Re: Nested scopes resolution -- you can breathe again!)

Jeremy Hylton jeremy@alum.mit.edu
Wed, 28 Feb 2001 12:58:58 -0500 (EST)


Last week Guido sent a message about our decision to make nested
scopes an optional feature for 2.1 in advance of their mandatory
introduction in Python 2.2.

I've included an ndiff of the PEP for reference.  The beta release on
Friday will contain the features as described in the PEP.

Jeremy

-: old-pep-0227.txt
+: pep-0227.txt
  PEP: 227
  Title: Statically Nested Scopes
- Version: $Revision: 1.6 $
?                       ^   
+ Version: $Revision: 1.7 $
?                       ^   
  Author: jeremy@digicool.com (Jeremy Hylton)
  Status: Draft
  Type: Standards Track
  Python-Version: 2.1
  Created: 01-Nov-2000
  Post-History:
  
  Abstract
  
      This PEP proposes the addition of statically nested scoping
      (lexical scoping) for Python 2.1.  The current language definition
      defines exactly three namespaces that are used to resolve names --
      the local, global, and built-in namespaces.  The addition of
      nested scopes would allow resolution of unbound local names in
      enclosing functions' namespaces.
  
      One consequence of this change that will be most visible to Python
      programs is that lambda statements could reference variables in
      the namespaces where the lambda is defined.  Currently, a lambda
      statement uses default arguments to explicitly create bindings
      in the lambda's namespace.
  
  Introduction
  
      This proposal changes the rules for resolving free variables in
-     Python functions.  The Python 2.0 definition specifies exactly
-     three namespaces to check for each name -- the local namespace,
-     the global namespace, and the builtin namespace.  According to
-     this defintion, if a function A is defined within a function B,
-     the names bound in B are not visible in A.  The proposal changes
-     the rules so that names bound in B are visible in A (unless A
+     Python functions.  The new name resolution semantics will take
+     effect with Python 2.2.  These semantics will also be available in
+     Python 2.1 by adding "from __future__ import nested_scopes" to the
+     top of a module.
+ 
+     The Python 2.0 definition specifies exactly three namespaces to
+     check for each name -- the local namespace, the global namespace,
+     and the builtin namespace.  According to this definition, if a
+     function A is defined within a function B, the names bound in B
+     are not visible in A.  The proposal changes the rules so that
+     names bound in B are visible in A (unless A contains a name
-     contains a name binding that hides the binding in B).
?    ----------------                                       
+     binding that hides the binding in B).
  
      The specification introduces rules for lexical scoping that are
      common in Algol-like languages.  The combination of lexical
      scoping and existing support for first-class functions is
      reminiscent of Scheme.
  
      The changed scoping rules address two problems -- the limited
      utility of lambda statements and the frequent confusion of new
      users familiar with other languages that support lexical scoping,
      e.g. the inability to define recursive functions except at the
      module level.
+ 
+ XXX Konrad Hinsen suggests that this section be expanded
  
      The lambda statement introduces an unnamed function that contains
      a single statement.  It is often used for callback functions.  In
      the example below (written using the Python 2.0 rules), any name
      used in the body of the lambda must be explicitly passed as a
      default argument to the lambda.
  
      from Tkinter import *
      root = Tk()
      Button(root, text="Click here",
             command=lambda root=root: root.test.configure(text="..."))
  
      This approach is cumbersome, particularly when there are several
      names used in the body of the lambda.  The long list of default
      arguments obscures the purpose of the code.  The proposed solution,
      in crude terms, implements the default argument approach
      automatically.  The "root=root" argument can be omitted.
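
      A minimal sketch of the difference, written in current Python
      syntax rather than against Tkinter.  The Label class and the
      make_callbacks() helper are hypothetical stand-ins introduced
      only for illustration; they are not part of the PEP.

```python
# Hypothetical stand-in for a GUI widget; not part of the PEP.
class Label:
    def __init__(self):
        self.text = ""

    def configure(self, text):
        self.text = text


def make_callbacks(label):
    # Python 2.0 style: the name is passed in as a default argument.
    old_style = lambda label=label: label.configure(text="old")
    # Nested-scope style: the lambda finds `label` in the enclosing function.
    new_style = lambda: label.configure(text="new")
    return old_style, new_style


label = Label()
old_style, new_style = make_callbacks(label)
old_style()
text_after_old = label.text
new_style()
text_after_new = label.text
```

      Both callbacks reach the same object; the second simply omits
      the default argument.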
  
+     The new name resolution semantics will cause some programs to
+     behave differently than they did under Python 2.0.  In some cases,
+     programs will fail to compile.  In other cases, names that were
+     previously resolved using the global namespace will be resolved
+     using the local namespace of an enclosing function.  In Python
+     2.1, warnings will be issued for all program statements that will
+     behave differently.
+ 
  Specification
  
      Python is a statically scoped language with block structure, in
      the tradition of Algol.  A code block or region, such as a
-     module, class defintion, or function body, is the basic unit of a
+     module, class definition, or function body, is the basic unit of a
?                        +                                               
      program.
  
      Names refer to objects.  Names are introduced by name binding
      operations.  Each occurrence of a name in the program text refers
      to the binding of that name established in the innermost function
      block containing the use.
  
      The name binding operations are assignment, class and function
      definition, and import statements.  Each assignment or import
      statement occurs within a block defined by a class or function
      definition or at the module level (the top-level code block).
  
      If a name binding operation occurs anywhere within a code block,
      all uses of the name within the block are treated as references to
      the current block.  (Note: This can lead to errors when a name is
      used within a block before it is bound.)
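
      The note above can be illustrated with a short sketch in current
      Python syntax.  The function name use_before_binding() is made up
      for the example.  Current CPython reports the error as
      UnboundLocalError, a subclass of NameError.

```python
count = 10  # module-level binding; never consulted inside the function


def use_before_binding():
    try:
        # `count` is local to this block because of the assignment below,
        # so the global binding is not used here.
        value = count + 1
    except NameError as exc:  # UnboundLocalError subclasses NameError
        return type(exc).__name__
    count = 0
    return value


result = use_before_binding()
```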
  
      If the global statement occurs within a block, all uses of the
      name specified in the statement refer to the binding of that name
      in the top-level namespace.  Names are resolved in the top-level
      namespace by searching the global namespace, the namespace of the
      module containing the code block, and the builtin namespace, the
      namespace of the module __builtin__.  The global namespace is
      searched first.  If the name is not found there, the builtin
-     namespace is searched.
+     namespace is searched.  The global statement must precede all uses
+     of the name.
  
      If a name is used within a code block, but it is not bound there
      and is not declared global, the use is treated as a reference to
      the nearest enclosing function region.  (Note: If a region is
      contained within a class definition, the name bindings that occur
      in the class block are not visible to enclosed functions.)
  
      A class definition is an executable statement that may contain uses
      and definitions of names.  These references follow the normal rules
      for name resolution.  The namespace of the class definition
      becomes the attribute dictionary of the class.
  
      The following operations are name binding operations.  If they
      occur within a block, they introduce new local names in the
      current block unless there is also a global declaration.
  
-     Function defintion: def name ...
+     Function definition: def name ...
?                   +                   
      Class definition: class name ...
      Assignment statement: name = ...    
      Import statement: import name, import module as name,
          from module import name
      Implicit assignment: names are bound by for statements and except
          clauses
  
      The arguments of a function are also local.
  
      There are several cases where Python statements are illegal when
      used in conjunction with nested scopes that contain free
      variables.
  
      If a variable is referenced in an enclosing scope, it is an error
      to delete the name.  The compiler will raise a SyntaxError for
      'del name'.
  
-     If the wildcard form of import (import *) is used in a function
+     If the wild card form of import (import *) is used in a function
?                +                                                     
      and the function contains a nested block with free variables, the
      compiler will raise a SyntaxError.
  
      If exec is used in a function and the function contains a nested
      block with free variables, the compiler will raise a SyntaxError
-     unless the exec explicit specifies the local namespace for the
+     unless the exec explicitly specifies the local namespace for the
?                             ++                                       
      exec.  (In other words, "exec obj" would be illegal, but 
      "exec obj in ns" would be legal.)
  
+     If a name bound in a function scope is also the name of a module
+     global name or a standard builtin name and the function contains a
+     nested function scope that references the name, the compiler will
+     issue a warning.  The name resolution rules will result in
+     different bindings under Python 2.0 than under Python 2.2.  The
+     warning indicates that the program may not run correctly with all
+     versions of Python.
+ 
  Discussion
  
      The specified rules allow names defined in a function to be
      referenced in any nested function defined within that function.  The
      name resolution rules are typical for statically scoped languages,
      with three primary exceptions:
  
          - Names in class scope are not accessible.
          - The global statement short-circuits the normal rules.
          - Variables are not declared.
  
      Names in class scope are not accessible.  Names are resolved in
-     the innermost enclosing function scope.  If a class defintion
+     the innermost enclosing function scope.  If a class definition
?                                                              +     
      occurs in a chain of nested scopes, the resolution process skips
      class definitions.  This rule prevents odd interactions between
      class attributes and local variable access.  If a name binding
-     operation occurs in a class defintion, it creates an attribute on
+     operation occurs in a class definition, it creates an attribute on
?                                      +                                 
      the resulting class object.  To access this variable in a method,
      or in a function nested within a method, an attribute reference
      must be used, either via self or via the class name.
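
      A small sketch of the class-scope rule, in current Python
      syntax.  The names make_counter_class, Counter and step are
      invented for the example.

```python
def make_counter_class():
    step = 1  # bound in the enclosing function

    class Counter:
        step = 10  # becomes the class attribute Counter.step

        def by_simple_name(self):
            # Resolution skips the class block and finds the function's step.
            return step

        def by_attribute(self):
            # The class attribute is reachable only via attribute reference.
            return self.step

    return Counter


Counter = make_counter_class()
c = Counter()
simple = c.by_simple_name()
attr = c.by_attribute()
```

      Here the simple name yields 1 and the attribute reference
      yields 10.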
  
      An alternative would have been to allow name binding in class
      scope to behave exactly like name binding in function scope.  This
      rule would allow class attributes to be referenced either via
      attribute reference or simple name.  This option was ruled out
      because it would have been inconsistent with all other forms of
      class and instance attribute access, which always use attribute
      references.  Code that used simple names would have been obscure.
  
      The global statement short-circuits the normal rules.  Under the
      proposal, the global statement has exactly the same effect that it
-     does for Python 2.0.  It's behavior is preserved for backwards
?                             -                                      
+     does for Python 2.0.  Its behavior is preserved for backwards
      compatibility.  It is also noteworthy because it allows name
      binding operations performed in one block to change bindings in
      another block (the module).
  
      Variables are not declared.  If a name binding operation occurs
      anywhere in a function, then that name is treated as local to the
      function and all references refer to the local binding.  If a
      reference occurs before the name is bound, a NameError is raised.
      The only kind of declaration is the global statement, which allows
      programs to be written using mutable global variables.  As a
      consequence, it is not possible to rebind a name defined in an
      enclosing scope.  An assignment operation can only bind a name in
      the current scope or in the global scope.  The lack of
      declarations and the inability to rebind names in enclosing scopes
      are unusual for lexically scoped languages; there is typically a
      mechanism to create name bindings (e.g. lambda and let in Scheme)
      and a mechanism to change the bindings (set! in Scheme).
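
      The consequence can be sketched as follows, using only the rules
      described in this section and current Python syntax.  The names
      make_counter, increment and current are hypothetical.

```python
def make_counter():
    count = 0

    def increment():
        # This assignment binds a new local `count`; it does not rebind
        # the name in make_counter.
        count = 1
        return count

    def current():
        # No binding here, so this reads the binding in make_counter.
        return count

    return increment, current


increment, current = make_counter()
inner_value = increment()
outer_value = current()
```

      After the call to increment(), the binding in the enclosing
      scope is unchanged.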
  
      XXX Alex Martelli suggests comparison with Java, which does not
      allow name bindings to hide earlier bindings.  
  
  Examples
  
      A few examples are included to illustrate the way the rules work.
  
      XXX Explain the examples
  
      >>> def make_adder(base):
      ...     def adder(x):
      ...         return base + x
      ...     return adder
      >>> add5 = make_adder(5)
      >>> add5(6)
      11
  
      >>> def make_fact():
      ...     def fact(n):
      ...         if n == 1:
      ...             return 1L
      ...         else:
      ...             return n * fact(n - 1)
      ...     return fact
      >>> fact = make_fact()
      >>> fact(7)    
      5040L
  
      >>> def make_wrapper(obj):
      ...     class Wrapper:
      ...         def __getattr__(self, attr):
      ...             if attr[0] != '_':
      ...                 return getattr(obj, attr)
      ...             else:
      ...                 raise AttributeError, attr
      ...     return Wrapper()
      >>> class Test:
      ...     public = 2
      ...     _private = 3
      >>> w = make_wrapper(Test())
      >>> w.public
      2
      >>> w._private
      Traceback (most recent call last):
        File "<stdin>", line 1, in ?
      AttributeError: _private
  
-     An example from Tim Peters of the potential pitfalls of nested scopes
?                                 ^                          -------------- 
+     An example from Tim Peters demonstrates the potential pitfalls of
?                                +++ ^^^^^^^^                           
-     in the absence of declarations:
+     nested scopes in the absence of declarations:
?    ++++++++++++++                                 
  
      i = 6
      def f(x):
          def g():
              print i
          # ...
          # skip to the next page
          # ...
          for i in x:  # ah, i *is* local to f, so this is what g sees
              pass
          g()
  
      The call to g() will refer to the variable i bound in f() by the for
      loop.  If g() is called before the loop is executed, a NameError will
      be raised.
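
      A runnable variant of the same pitfall in current Python syntax.
      The call_early parameter is an addition made for this sketch so
      that both outcomes can be observed; it is not part of Tim
      Peters's example.

```python
i = 6


def f(x, call_early):
    def g():
        return i  # refers to f's i, never to the global i

    if call_early:
        return g()  # f's i has not been bound yet
    for i in x:
        pass
    return g()


after_loop = f([1, 2, 3], call_early=False)

try:
    f([1, 2, 3], call_early=True)
    early_call_failed = False
except NameError:
    early_call_failed = True
```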
  
      XXX need some counterexamples
  
  Backwards compatibility
  
      There are two kinds of compatibility problems caused by nested
      scopes.  In one case, code that behaved one way in earlier
-     versions, behaves differently because of nested scopes.  In the
?             -                                                       
+     versions behaves differently because of nested scopes.  In the
      other cases, certain constructs interact badly with nested scopes
      and will trigger SyntaxErrors at compile time.
  
      The following example from Skip Montanaro illustrates the first
      kind of problem:
  
      x = 1
      def f1():
          x = 2
          def inner():
              print x
          inner()
  
      Under the Python 2.0 rules, the print statement inside inner()
      refers to the global variable x and will print 1 if f1() is
      called.  Under the new rules, it refers to f1()'s namespace,
      the nearest enclosing scope with a binding.
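
      The new behavior can be checked with a sketch in current Python
      syntax, where nested scopes are always in effect.  The function
      f2() is added here for contrast and is not part of Skip
      Montanaro's example.

```python
x = 1


def f1():
    x = 2

    def inner():
        return x  # resolved in f1 under the nested-scope rules

    return inner()


def f2():
    def inner():
        return x  # f2 has no binding for x, so the global is used

    return inner()


nested_result = f1()
global_result = f2()
```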
  
      The problem occurs only when a global variable and a local
      variable share the same name and a nested function uses that name
      to refer to the global variable.  This is poor programming
      practice, because readers will easily confuse the two different
      variables.  One example of this problem was found in the Python
      standard library during the implementation of nested scopes.
  
      To address this problem, which is unlikely to occur often, a
      static analysis tool that detects affected code will be written.
-     The detection problem is straightfoward.
+     The detection problem is straightforward.
?                                        +      
  
-     The other compatibility problem is casued by the use of 'import *'
?                                           -                            
+     The other compatibility problem is caused by the use of 'import *'
?                                          +                             
      and 'exec' in a function body, when that function contains a
      nested scope and the contained scope has free variables.  For
      example:
  
      y = 1
      def f():
          exec "y = 'gotcha'" # or from module import *
          def g():
              return y
          ...
  
      At compile-time, the compiler cannot tell whether an exec that
-     operators on the local namespace or an import * will introduce
?           ^^                                                       
+     operates on the local namespace or an import * will introduce
?           ^                                                       
      name bindings that shadow the global y.  Thus, it is not possible
      to tell whether the reference to y in g() should refer to the
      global or to a local name in f().
  
      In discussion on the python-list, people argued for both possible
      interpretations.  On the one hand, some thought that the reference
      in g() should be bound to a local y if one exists.  One problem
      with this interpretation is that it is impossible for a human
      reader of the code to determine the binding of y by local
      inspection.  It seems likely to introduce subtle bugs.  The other
      interpretation is to treat exec and import * as dynamic features
      that do not affect static scoping.  Under this interpretation, the
      exec and import * would introduce local names, but those names
      would never be visible to nested scopes.  In the specific example
      above, the code would behave exactly as it did in earlier versions
      of Python.
  
-     Since each interpretation is problemtatic and the exact meaning
?                                         -                           
+     Since each interpretation is problematic and the exact meaning
      ambiguous, the compiler raises an exception.
  
      A brief review of three Python projects (the standard library,
      Zope, and a beta version of PyXPCOM) found four backwards
      compatibility issues in approximately 200,000 lines of code.
      There was one example of case #1 (subtle behavior change) and two
      examples of import * problems in the standard library.
  
      (The interpretation of the import * and exec restriction that was
      implemented in Python 2.1a2 was much more restrictive, based on
      language in the reference manual that had never been
      enforced.  These restrictions were relaxed following the release.)
  
+ Compatibility of C API
+ 
+     The implementation causes several Python C API functions to
+     change, including PyCode_New().  As a result, C extensions may
+     need to be updated to work correctly with Python 2.1.  
+ 
  locals() / vars()
  
      These functions return a dictionary containing the current scope's
      local variables.  Modifications to the dictionary do not affect
      the values of variables.  Under the current rules, the use of
      locals() and globals() allows the program to gain access to all
      the namespaces in which names are resolved.
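
      A brief sketch of the statement that modifying the dictionary
      does not affect variables, as observed inside a function in
      current CPython.  The name mutate_locals() is invented for the
      example.

```python
def mutate_locals():
    value = 1
    snapshot = locals()
    snapshot["value"] = 2  # changes the dictionary only
    return value, snapshot["value"]


variable, dict_entry = mutate_locals()
```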
  
      An analogous function will not be provided for nested scopes.
      Under this proposal, it will not be possible to gain
      dictionary-style access to all visible scopes.
  
+ Warnings and Errors
+ 
+     The compiler will issue warnings in Python 2.1 to help identify
+     programs that may not compile or run correctly under future
+     versions of Python.  Under Python 2.2, or under Python 2.1 when the
+     nested_scopes future statement is used (collectively referred to as
+     "future semantics" in this section), the compiler will issue
+     SyntaxErrors in some cases.
+ 
+     The warnings typically apply when a function contains a
+     nested function that has free variables.  For example, if function
+     F contains a function G and G uses the builtin len(), then F is a
+     function that contains a nested function (G) with a free variable
+     (len).  The label "free-in-nested" will be used to describe these
+     functions. 
+ 
+     import * used in function scope
+ 
+         The language reference specifies that import * may only occur
+         in a module scope.  (Sec. 6.11)  The implementation of C
+         Python has supported import * at the function scope.
+ 
+         If import * is used in the body of a free-in-nested function,
+         the compiler will issue a warning.  Under future semantics,
+         the compiler will raise a SyntaxError.
+ 
+     bare exec in function scope
+ 
+         The exec statement allows two optional expressions following
+         the keyword "in" that specify the namespaces used for locals
+         and globals.  An exec statement that omits both of these
+         namespaces is a bare exec.
+ 
+         If a bare exec is used in the body of a free-in-nested
+         function, the compiler will issue a warning.  Under future
+         semantics, the compiler will raise a SyntaxError.
+ 
+     local shadows global
+ 
+         If a free-in-nested function has a binding for a local
+         variable that (1) is used in a nested function and (2) is the
+         same as a global variable, the compiler will issue a warning.
+ 
  Rebinding names in enclosing scopes
  
      There are technical issues that make it difficult to support
      rebinding of names in enclosing scopes, but the primary reason
      that it is not allowed in the current proposal is that Guido is
      opposed to it.  It is difficult to support, because it would
      require a new mechanism that would allow the programmer to specify
      that an assignment in a block is supposed to rebind the name in an
      enclosing block; presumably a keyword or special syntax (x := 3)
      would make this possible.
  
      The proposed rules allow programmers to achieve the effect of
      rebinding, albeit awkwardly.  The name that will be effectively
      rebound by enclosed functions is bound to a container object.  In
      place of assignment, the program uses modification of the
      container to achieve the desired effect:
  
      def bank_account(initial_balance):
          balance = [initial_balance]
          def deposit(amount):
              balance[0] = balance[0] + amount
              return balance
          def withdraw(amount):
              balance[0] = balance[0] - amount
              return balance
          return deposit, withdraw
  
      Support for rebinding in nested scopes would make this code
      clearer.  A class that defines deposit() and withdraw() methods
      and the balance as an instance variable would be clearer still.
      Since classes seem to achieve the same effect in a more
      straightforward manner, they are preferred.
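
      For comparison, a sketch of the class-based alternative in
      current Python syntax.  The class name BankAccount is an
      assumption.  Unlike the functions above, which return the
      container, these methods return the balance itself.

```python
class BankAccount:
    def __init__(self, initial_balance):
        self.balance = initial_balance

    def deposit(self, amount):
        self.balance = self.balance + amount
        return self.balance

    def withdraw(self, amount):
        self.balance = self.balance - amount
        return self.balance


account = BankAccount(100)
after_deposit = account.deposit(50)
after_withdraw = account.withdraw(30)
```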
  
  Implementation
  
      The implementation for C Python uses flat closures [1].  Each def
      or lambda statement that is executed will create a closure if the
      body of the function or any contained function has free
      variables.  Using flat closures, the creation of closures is
      somewhat expensive but lookup is cheap.
  
      The implementation adds several new opcodes and two new kinds of
      names in code objects.  A variable can be either a cell variable
      or a free variable for a particular code object.  A cell variable
      is referenced by containing scopes; as a result, the function
      where it is defined must allocate separate storage for it on each
-     invocation.  A free variable is reference via a function's closure.
?                                                               --------- 
+     invocation.  A free variable is referenced via a function's
?                                              +                  
+     closure. 
+ 
+     The choice of flat closures was made based on three factors.
+     First, nested functions are presumed to be used infrequently, and
+     deeply nested functions (several levels of nesting) still less
+     frequently.  Second, lookup of names in a nested scope should be
+     fast.  Third, the use of nested scopes, particularly where a
+     function that accesses an enclosing scope is returned, should not
+     prevent unreferenced objects from being reclaimed by the garbage
+     collector.
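
      The distinction between cell variables and free variables can be
      inspected from Python.  The sketch below uses the attribute names
      exposed by current CPython (__code__, co_cellvars, co_freevars,
      __closure__); treat it as an illustration rather than a
      description of the 2.1 implementation.

```python
def make_adder(base):
    def adder(x):
        return base + x
    return adder


add5 = make_adder(5)

# `base` is a cell variable of make_adder: a nested function refers to it.
cell_names = make_adder.__code__.co_cellvars
# `base` is a free variable of adder: it is reached through the closure.
free_names = add5.__code__.co_freevars
# The closure holds one cell containing the captured value.
captured = add5.__closure__[0].cell_contents
```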
  
      XXX Much more to say here
  
  References
  
      [1] Luca Cardelli.  Compiling a functional language.  In Proc. of
      the 1984 ACM Conference on Lisp and Functional Programming,
      pp. 208-217, Aug. 1984
          http://citeseer.nj.nec.com/cardelli84compiling.html