[Python-Dev] Switch statement
Nick Coghlan
ncoghlan at gmail.com
Sat Jun 24 12:31:45 CEST 2006
The current train of thought seems to be to handle a switch statement as follows:
1. Define switch explicitly as a hash table lookup, with the hash table
built at function definition time
2. Allow expressions to be flagged as 'static' to request evaluation at
def-time
3. Have the expressions in a case clause be implicitly flagged as static
4. Allow 'case in' to be used to indicate that a case argument is to be
iterated and all its values added to the current case
5. Static names are not needed - static expressions must refer solely to
literals and non-local names
An issue with Point 4 is a syntactic nit that Eric Sumner pointed out. Since
it involves iteration over x to populate the jump table rather than doing a
containment test on x, using 'case in x' is misleading. It would be better
written as 'case *x'.
Then:
'case 1:' ==> a switch value of 1 will jump to this case
'case 1, 2:' ==> a switch value of 1 or 2 will jump to this case
'case *x' ==> any switch value in x will jump to this case
'case *x, *y' ==> any switch value in x or y will jump to this case
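As a rough sketch of how these four forms could populate a jump table, the following uses a plain dict. The function name, the (label, singles, iterables) encoding of a case clause, and the choice to let the first matching case win on duplicates are all assumptions made for illustration; none of them is part of the proposal.

```python
def build_jump_table(cases):
    """Map every switch value to the label of the case that handles it.

    `cases` is a list of (label, singles, iterables) tuples, a hypothetical
    encoding of a clause such as 'case 1, 2, *x, *y'.
    """
    table = {}
    for label, singles, iterables in cases:
        # 'case 1, 2': each listed value is added directly.
        for value in singles:
            table.setdefault(value, label)
        # 'case *x, *y': each iterable is iterated, not tested for containment.
        for iterable in iterables:
            for value in iterable:
                table.setdefault(value, label)  # assumption: first case wins
    return table
```

For example, build_jump_table([("A", [1, 2], []), ("B", [], [[3, 4], (5,)])]) maps 1 and 2 to "A" and 3, 4 and 5 to "B".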
For the remaining points, I share Jim Jewett's concern that 'function
definition time' is well defined for function scopes only - a better
definition of the evaluation time is needed so that it works for other code as
well. (Unlike Jim, I have no problems with restricting switch statements to
hashable objects and building the entire jump table at once - if what you want
is an arbitrary if-elif chain, then write one!)
I'd also like to avoid confusing the code execution order too much. People
objected to the out-of-order evaluation in statement local namespaces - what's
being proposed for static expressions is significantly worse.
So here's a fleshed out proposal for 'once expressions' that are evaluated the
first time they are encountered and cached thereafter.
Once expressions
----------------
An expression of the form 'once EXPR' is evaluated at most once for a given
scope. Precedence rules are as for yield expressions.
Evaluation occurs the first time the expression is executed. On all
subsequent executions, the expression will return the same result as was
returned the first time.
Referencing a function local variable name from a once expression is a
syntax error. References to module globals, to closure variables and to names
not bound in the module at all are fine.
Justifying evaluation at first execution time
---------------------------------------------
With evaluation at first execution time, the semantics are essentially the
same in all kinds of scope (module, function, class, exec). When the
evaluation time is defined in terms of function definition time, it is very
unclear what happens when there is no function definition involved.
With the once-per-scope definition above, the potentially confusing cases
that concerned Guido would have the behaviour he desired.
>>> def foo(c):
...     print once c
...
SyntaxError: Cannot use local variable 'c' in once expression
The rationale for disallowing function local variables in a once expression
is that next time the function is executed, the local variables are expected
to contain different values, so it is unlikely that any expression depending
on them would give the same answer. Builtins, module globals and closure
variables, on the other hand, will typically remain the same across
invocations of a once expression. So the rationale for the syntactic
restriction against using local variables is still there, even though the
local variables may actually contain valid data at the time the once
expression is executed. This syntactic restriction only applies to function
locals so that a module level once expression is still useful.
>>> def foo(c):
...     def bar():
...         print once c
...     return bar
...
>>> b1 = foo(1)
>>> b2 = foo(2)
>>> b1()
1
>>> b2()
2
For this case, the important point is that execution of the once expression
is once per scope, not once per program. Since running the function definition
again creates a different function object, the once expression gets executed
again the first time that function is called.
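The once-per-scope behaviour can be approximated in plain Python by giving each inner function its own one-slot cache. This is only an emulation under stated assumptions: the list named cell stands in for the hidden storage, and the evaluations list exists purely to record how often the "expression" really runs.

```python
evaluations = []  # illustration only: records each real evaluation

def foo(c):
    cell = []  # stands in for the hidden once storage; a new one per 'def bar'
    def bar():
        if not cell:
            # First call of this particular function object: evaluate and cache.
            evaluations.append(c)
            cell.append(c)
        return cell[0]
    return bar

b1 = foo(1)
b2 = foo(2)
results = [b1(), b1(), b2(), b2()]
```

Here results ends up as [1, 1, 2, 2] while evaluations holds only [1, 2]: one evaluation per function object, not one per program and not one per call.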
An advantage of first time execution for functions is that it can be used
to defer calculation of expensive default values to the first time they're needed.
>>> def foo(c=None):
...     if c is None:
...         c = once calculate_expensive_default()
...     # etc
...
With function definition time evaluation, the expensive default would
always be calculated even if the specific application always provided an
argument to the function and hence never actually needed the default.
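The deferred default can be emulated without new syntax by caching on the function object. The body of calculate_expensive_default, the _MISSING sentinel and the _default attribute are assumptions invented for this sketch.

```python
_MISSING = object()  # sentinel: the default has not been computed yet

def calculate_expensive_default():
    # Hypothetical stand-in for a costly computation; counts its own calls.
    calculate_expensive_default.calls += 1
    return [0] * 1000
calculate_expensive_default.calls = 0

def foo(c=None):
    if c is None:
        # Emulates 'c = once calculate_expensive_default()'.
        if foo._default is _MISSING:
            foo._default = calculate_expensive_default()
        c = foo._default
    return len(c)
foo._default = _MISSING
```

Calling foo([1, 2]) never triggers the expensive calculation, and any number of foo() calls trigger it exactly once.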
The one downside to this first time execution approach is that it means
'once' is NOT a solution to the early-binding vs late-binding problem for
closure variables. Forcing early binding would still require abuse of function
defaults, or a compiler directive along the lines of the current 'global'. I
consider that a reasonable price to pay for the more consistent expression
semantics.
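For reference, the early-binding vs late-binding problem and the function-default workaround look like this in ordinary Python (make_callbacks is a name chosen only for this illustration):

```python
def make_callbacks():
    late, early = [], []
    for i in range(3):
        # Late binding: the lambda looks up 'i' when it is called,
        # by which time the loop has finished.
        late.append(lambda: i)
        # Early binding via a default argument: the current value of 'i'
        # is captured when the lambda is defined.
        early.append(lambda i=i: i)
    return late, early

late, early = make_callbacks()
late_results = [f() for f in late]
early_results = [f() for f in early]
```

late_results is [2, 2, 2] and early_results is [0, 1, 2]. A once expression inside the lambda would not help, since it would still be evaluated at first call rather than at definition.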
CPython implementation strategy
-------------------------------
A once expression results in the compiler creating a cell object as a
hidden variable in the current namespace. When the once expression is
executed, it checks if the relevant cell object is empty. If it is, then the
expression code is evaluated in the current namespace and the result stored in
the cell object. If the cell object is not empty, then the stored value is
used directly as the result of the expression.
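The check-then-fill logic for a single cell might be sketched as follows. OnceCell and its get method are hypothetical names; real cell objects are created by the compiler and are not used this way from Python code.

```python
_EMPTY = object()  # marks a cell that has not been filled yet

class OnceCell(object):
    """Hypothetical stand-in for the hidden cell behind one once expression."""

    def __init__(self):
        self.value = _EMPTY

    def get(self, compute):
        # Empty cell: evaluate the expression and store the result.
        if self.value is _EMPTY:
            self.value = compute()
        # Filled cell: use the stored value directly.
        return self.value
```

A sentinel is used rather than None so that an expression whose value is None is still evaluated only once.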
Code objects will acquire a new attribute, co_oncevars. This is a tuple
containing the hidden variable names assigned by the compiler. It is similar
to the existing co_freevars used to identify the names of closure variables.
For any code executed using exec (including module level and class level
code), the cell objects needed to satisfy co_oncevars are created in the
relevant namespace before the code is executed, and deleted at the end of
execution. That way we don't have junk attributes showing up on the module and
class objects.
For function code (including generator functions), the cells are stored in
a new attribute (e.g. 'func_once') on the function object so that they persist
across calls to the function. On each call to the function, the cell objects
are inserted into the local namespace before the function code is executed.
This is similar to the existing func_closure attribute (just as co_oncevars is
similar to co_freevars).
As an alternative to using new attributes, the hidden variable names could
be appended to co_freevars, and the necessary cells appended to func_closure.
The problem with that approach is that it may confuse existing introspection
tools, whereas such tools would simply ignore the new attributes.
Definition of the switch statement using once
---------------------------------------------
(I deliberately omitted the trailing colon on the 'switch' to avoid the empty
suite problem, similar to the fact that there is no colon at the end of a
@-decorator line.)
switch value
case 1:
    CASE_EQUALS_1
case *x:
    CASE_IN_X
else:
    CASE_ELSE
would be semantically equivalent to
_jump_dict = once dict([(1, goto_CASE_EQUALS_1)] +
                       [(item, goto_CASE_IN_X) for item in x])
try:
    _offset = _jump_dict[value]
except KeyError:
    _offset = goto_CASE_ELSE
_goto _offset
(Where _goto is a compiler internal operation to jump to a different point
within the current code object)
This would entail updating the lnotab format to permit bytecode order that
doesn't match source code order (since the case expressions would all be up
with the evaluation of the jump dict).
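Since Python has no goto, a runnable approximation of this expansion has to replace the jump offsets with callables. In the sketch below, make_switch and the three case functions are invented for illustration, and the jump dict is built when make_switch runs rather than on first execution of the switch, which is a simplification of the once semantics.

```python
def make_switch(x):
    # Each case body becomes a function; the return values are placeholders.
    def case_equals_1():
        return "CASE_EQUALS_1"

    def case_in_x():
        return "CASE_IN_X"

    def case_else():
        return "CASE_ELSE"

    # Stands in for '_jump_dict = once dict(...)': built a single time.
    jump_dict = dict([(1, case_equals_1)] +
                     [(item, case_in_x) for item in x])

    def switch(value):
        try:
            target = jump_dict[value]
        except KeyError:
            target = case_else  # no matching case: fall through to 'else'
        return target()

    return switch
```

With switch = make_switch([2, 3]), the calls switch(1), switch(3) and switch(99) return "CASE_EQUALS_1", "CASE_IN_X" and "CASE_ELSE" respectively.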
Why 'once'?
-----------
I picked once for the keyword because I consider the most important
semantic point about the affected expression to be the fact that it is
evaluated at most once per scope.
static, const, final, etc. are only contenders because of the other
languages that use them as keywords. The words, in and of themselves, don't
really have the right meaning.
The once keyword is used by Eiffel to indicate a 0-argument function that
is executed the first time it is called, and thereafter returns the result of
that first call. That is pretty close to what I'm proposing it for here (only
I'm proposing once-per-scope for expressions rather than Eiffel's
once-per-program for functions).
Additionally, a quick search for "once =" in the standard lib and its tests
didn't find any occurrences (aside from a 'nonce =' in urllib2 :). Java's
final (which is the only other option I really considered for a keyword)
turned up 3 genuine hits (two in the compiler module, one in test_generators).
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org