[Python-Dev] Switch statement

Sat Jun 24 12:31:45 CEST 2006

The current train of thought seems to be to handle a switch statement as follows:

   1. Define switch explicitly as a hash table lookup, with the hash table 
built at function definition time

   2. Allow expressions to be flagged as 'static' to request evaluation at 
def-time

   3. Have the expressions in a case clause be implicitly flagged as static

   4. Allow 'case in' to be used to indicate that a case argument is to be 
iterated and all its values added to the current case

   5. Static names are not needed - static expressions must refer solely to 
literals and non-local names

An issue with Point 4 is a syntactic nit that Eric Sumner pointed out. Since 
it involves iteration over x to populate the jump table rather than doing a 
containment test on x, using 'case in x' is misleading. It would be better 
written as 'case *x'.

Then:
   'case 1:'      ==> a switch value of 1 will jump to this case
   'case 1, 2:'   ==> a switch value of 1 or 2 will jump to this case
   'case *x:'     ==> any switch value in x will jump to this case
   'case *x, *y:' ==> any switch value in x or y will jump to this case
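
To make the iteration point concrete, here is a minimal sketch in present-day Python of how such a jump table could be populated. The helper name build_jump_table, the (label, values, starred) triple layout and the string labels are all invented for illustration and are not part of the proposal. The proposal does not say what happens when two cases share a value; this sketch simply lets the later case win.

```python
def build_jump_table(cases):
    """Map every switch value to the label of the case that handles it.

    `cases` is a list of (label, values, starred) triples (a hypothetical
    layout): `values` are the plain case arguments, `starred` holds the
    iterables that would be written as '*x' in the proposed syntax.
    """
    table = {}
    for label, values, starred in cases:
        keys = list(values)
        for iterable in starred:
            # '*x' means "iterate over x and add each item as a key",
            # not "test whether the switch value is contained in x".
            keys.extend(iterable)
        for key in keys:
            table[key] = label  # assumption: a later case overwrites an earlier one
    return table


x = (3, 4)
y = [5]
table = build_jump_table([
    ("A", [1], []),     # case 1:
    ("B", [2], [x]),    # case 2, *x:
    ("C", [], [x, y]),  # case *x, *y:
])
```

Every key has to be hashable, which matches the restriction to hashable objects discussed below.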

For the remaining points, I share Jim Jewett's concern that 'function 
definition time' is well defined for function scopes only - a better 
definition of the evaluation time is needed so that it works for other code as 
well. (Unlike Jim, I have no problems with restricting switch statements to 
hashable objects and building the entire jump table at once - if what you want 
is an arbitrary if-elif chain, then write one!)

I'd also like to avoid confusing the code execution order too much. People 
objected to the out-of-order evaluation in statement local namespaces - what's 
being proposed for static expressions is significantly worse.

So here's a fleshed-out proposal for 'once expressions' that are evaluated the 
first time they are encountered and cached thereafter.

Once expressions
----------------
   An expression of the form 'once EXPR' is evaluated exactly once for a given 
scope. Precedence rules are as for yield expressions.
   Evaluation occurs the first time the expression is executed. On all 
subsequent executions, the expression will return the same result as was 
returned the first time.
   Referencing a function local variable name from a once expression is a 
syntax error. References to module globals, to closure variables and to names 
not bound in the module at all are fine.
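
The evaluate-on-first-execution rule can be approximated in present-day Python with an explicit cache object. This is only a sketch of the semantics: the OnceCell class, the _EMPTY sentinel and the compute function are hypothetical names, and a real 'once' expression would need no explicit cell or callable wrapper.

```python
_EMPTY = object()  # sentinel meaning "this expression has not been evaluated yet"


class OnceCell:
    """Cache for the result of a single 'once' expression (hypothetical helper)."""

    def __init__(self):
        self.value = _EMPTY

    def get(self, thunk):
        # First execution: evaluate the expression and remember the result.
        if self.value is _EMPTY:
            self.value = thunk()
        # Every later execution: return the remembered result unchanged.
        return self.value


calls = []


def compute():
    calls.append("evaluated")
    return len(calls)


cell = OnceCell()           # one cell per 'once' expression per scope
first = cell.get(compute)   # evaluates compute()
second = cell.get(compute)  # reuses the cached result, compute() is not called
```

A sentinel is used instead of None so that an expression whose value is None is still cached.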

Justifying evaluation at first execution time
---------------------------------------------
   With evaluation at first execution time, the semantics are essentially the 
same in all kinds of scope (module, function, class, exec). When the 
evaluation time is defined in terms of function definition time, it is very 
unclear what happens when there is no function definition involved.
   With the once-per-scope definition above, the potentially confusing cases 
that concerned Guido would have the behaviour he desired.

 >>> def foo(c):
...   print once c
...
SyntaxError: Cannot use local variable 'c' in once expression

   The rationale for disallowing function local variables in a once expression 
is that next time the function is executed, the local variables are expected 
to contain different values, so it is unlikely that any expression depending 
on them would give the same answer. Builtins, module globals and closure 
variables, on the other hand, will typically remain the same across 
invocations of a once expression. So the rationale for the syntactic 
restriction against using local variables is still there, even though the 
local variables may actually contain valid data at the time the once 
expression is executed. This syntactic restriction only applies to function 
locals so that a module level once expression is still useful.

 >>> def foo(c):
...   def bar():
...     print once c
...   return bar
...
 >>> b1 = foo(1)
 >>> b2 = foo(2)
 >>> b1()
1
 >>> b2()
2

   For this case, the important point is that execution of the once expression 
is once per scope, not once per program. Since running the function definition 
again creates a different function object, the once expression gets executed 
again the first time that function is called.
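
The once-per-scope behaviour can be imitated today with one cache per function object. In this sketch the list named cell stands in for the hidden cell, and the rebind helper is a hypothetical addition that exists only to show that evaluation happens at first execution rather than at definition time.

```python
def foo(c):
    cell = []  # stands in for the hidden cell belonging to this particular 'bar'

    def bar():
        if not cell:
            cell.append(c)  # first execution of the emulated 'once c'
        return cell[0]

    def rebind(value):
        # Hypothetical helper: rebinds the closure variable 'c'.
        nonlocal c
        c = value

    return bar, rebind


b1, rebind1 = foo(1)
r1_first = b1()   # evaluates 'c' now, while it is still 1
rebind1(10)
r1_second = b1()  # cached, so the rebinding is not seen

b2, rebind2 = foo(2)
rebind2(20)       # rebinding happens before the first execution
r2_first = b2()   # first execution sees the current value of 'c'
```

Each call to foo creates a fresh bar with its own cache, so b1 and b2 never share a result.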

   An advantage of first time execution for functions is that it can be used 
to defer calculation of expensive default values to the first time they're needed.

 >>> def foo(c=None):
...   if c is None:
...     c = once calculate_expensive_default()
...   # etc
...

   With function definition time evaluation, the expensive default would 
always be calculated even if the specific application always provided an 
argument to the function and hence never actually needed the default.
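
The difference between the two evaluation times can be observed with existing Python features. The sketch below is an emulation under stated assumptions: the module-level list _cache plays the role of the hidden cell, and the names eager and lazy are invented for the comparison.

```python
calls = []


def calculate_expensive_default():
    calls.append("computed")
    return 42


# Definition-time evaluation: the default is computed as soon as the
# 'def' statement runs, whether or not any caller ever needs it.
def eager(c=calculate_expensive_default()):
    return c


calls_after_def = list(calls)

_cache = []  # stands in for the hidden cell of the 'once' expression


def lazy(c=None):
    if c is None:
        # Emulates: c = once calculate_expensive_default()
        if not _cache:
            _cache.append(calculate_expensive_default())
        c = _cache[0]
    return c


with_argument = lazy(5)
calls_after_argument = list(calls)
without_argument = lazy()
lazy()  # a second call that needs the default reuses the cached value
```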

   The one downside to this first time execution approach is that it means 
'once' is NOT a solution to the early-binding vs late-binding problem for 
closure variables. Forcing early binding would still require abuse of function 
defaults, or a compiler directive along the lines of the current 'global'. I 
consider that a reasonable price to pay for the more consistent expression 
semantics.
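
For reference, the binding problem and the function-default workaround mentioned above look like this in current Python. The variable names are illustrative only.

```python
# Late binding: every lambda looks up 'i' when it is called, by which
# time the loop has finished and 'i' holds its final value.
late = [lambda: i for i in range(3)]

# Early binding through a default argument: the current value of 'i' is
# captured when each lambda is created.
early = [lambda i=i: i for i in range(3)]

late_results = [f() for f in late]
early_results = [f() for f in early]
```

Here late_results is [2, 2, 2] while early_results is [0, 1, 2]. A once expression evaluated at first execution would not change the first result, which is the limitation described above.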

CPython implementation strategy
-------------------------------
   A once expression results in the compiler creating a cell object as a 
hidden variable in the current namespace. When the once expression is 
executed, it checks if the relevant cell object is empty. If it is, then the 
expression code is evaluated in the current namespace and the result stored in 
the cell object. If the cell object is not empty, then the stored value is 
used directly as the result of the expression.
   Code objects will acquire a new attribute, co_oncevars. This is a tuple 
containing the hidden variable names assigned by the compiler. It is similar 
to the existing co_freevars used to identify the names of closure variables.
   For any code executed using exec (including module level and class level 
code), the cell objects needed to satisfy co_oncevars are created in the 
relevant namespace before the code is executed, and deleted at the end of 
execution. That way we don't have junk attributes showing up on the module and 
class objects.
   For function code (including generator functions), the cells are stored in 
a new attribute (e.g. 'func_once') on the function object so that they persist 
across calls to the function. On each call to the function, the cell objects 
are inserted into the local namespace before the function code is executed. 
This is similar to the existing func_closure attribute (just as co_oncevars is 
similar to co_freevars).
   As an alternative to using new attributes, the hidden variable names could 
be appended to co_freevars, and the necessary cells appended to func_closure. 
The problem with that approach is that it may confuse existing introspection 
tools, whereas such tools would simply ignore the new attributes.
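
A rough pure-Python model of the function case is sketched below. It is not how the compiler would do it: the decorator with_once_cells, the helper once, the attribute name once_cells and the hidden name "_once_0" are all made up for illustration, with once_cells loosely playing the role of func_once and the tuple of names playing the role of co_oncevars.

```python
_EMPTY = object()  # marks a cell that has not been filled yet


def with_once_cells(*names):
    """Attach one empty cell per hidden name to the function (hypothetical)."""
    def decorate(func):
        func.once_names = names  # loosely analogous to co_oncevars
        func.once_cells = {name: [_EMPTY] for name in names}  # like func_once
        return func
    return decorate


def once(func, name, thunk):
    """Return the cached value for `name`, evaluating `thunk` if the cell is empty."""
    cell = func.once_cells[name]
    if cell[0] is _EMPTY:
        cell[0] = thunk()
    return cell[0]


@with_once_cells("_once_0")
def lookup_table():
    # Emulates: return once {"built": True}
    return once(lookup_table, "_once_0", lambda: {"built": True})


first = lookup_table()
second = lookup_table()  # the very same dict object, not a rebuilt copy
```

Because the cells live on the function object rather than in the code object, a second function built from the same code would get its own independent cells.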

Definition of the switch statement using once
---------------------------------------------
(I deliberately omitted the trailing colon on the 'switch' to avoid the empty 
suite problem, similar to the fact that there is no colon at the end of a 
@-decorator line.)

   switch value
   case 1:
       CASE_EQUALS_1
   case *x:
       CASE_IN_X
   else:
       CASE_ELSE

would be semantically equivalent to

    _jump_dict = once dict([(1, goto_CASE_EQUALS_1)] +
                           [(item, goto_CASE_IN_X) for item in x])
    try:
        _offset = _jump_dict[value]
    except KeyError:
        _offset = goto_CASE_ELSE
    _goto _offset

(Where _goto is a compiler internal operation to jump to a different point 
within the current code object)

This would entail updating the lnotab format to permit bytecode order that 
doesn't match source code order (since the case expressions would all be up 
with the evaluation of the jump dict).
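
Since Python has no goto, the closest runnable approximation of this expansion is a dict of callables. The sketch below makes several assumptions: the jump table is built when make_switch is called rather than at first execution, each case body is represented by a small function, and the names make_switch and switch are hypothetical.

```python
def make_switch(x):
    # Each case suite is modelled as a function returning a marker string.
    def case_equals_1():
        return "CASE_EQUALS_1"

    def case_in_x():
        return "CASE_IN_X"

    def case_else():
        return "CASE_ELSE"

    # Built once, up front; keys must be hashable.
    jump_dict = {1: case_equals_1}
    jump_dict.update((item, case_in_x) for item in x)

    def switch(value):
        try:
            target = jump_dict[value]
        except KeyError:
            target = case_else  # the 'else' clause
        return target()

    return switch


switch = make_switch((2, 3))
results = [switch(1), switch(3), switch(99)]
```

As in the expansion above, dispatch is a single hash lookup regardless of how many cases there are.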

Why 'once'?
-----------
   I picked once for the keyword because I consider the most important 
semantic point about the affected expression to be the fact that it is 
evaluated at most once per scope.
   static, const, final, etc. are only contenders because of the other 
languages that use them as keywords. The words, in and of themselves, don't 
really have the right meaning.
   The once keyword is used by Eiffel to indicate a 0-argument function that 
is executed the first time it is called, and thereafter returns the result of 
that first call. That is pretty close to what I'm proposing it for here (only 
I'm proposing once-per-scope for expressions rather than Eiffel's 
once-per-program for functions).
   Additionally, a quick search for "once =" in the standard lib and its tests 
didn't find any occurrences (aside from a 'nonce =' in urllib2 :). Java's 
final (which is the only other option I really considered for a keyword) 
turned up 3 genuine hits (two in the compiler module, one in test_generators).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org