[Python-Dev] PEP 3103: A Switch/Case Statement
ncoghlan at gmail.com
Tue Jun 27 15:36:33 CEST 2006
Guido van Rossum wrote:
> I've written a new PEP, summarizing (my reaction to) the recent
> discussion on adding a switch statement. While I have my preferences,
> I'm trying to do various alternatives justice in the descriptions. The
> PEP also introduces some standard terminology that may be helpful in
> future discussions. I'm putting this in the Py3k series to give us
> extra time to decide; it's too important to rush it.
A generally nice summary, but as one of the advocates of Option 2 when it
comes to freezing the jump table, I'd like to see it given some better press :)
> Feedback (also about misrepresentation of alternatives I don't favor)
> is most welcome, either to me directly or as a followup to this post.
My preferred variant of Option 2 (calculation of the jump table on first use)
disallows function locals in the switch cases just like Option 3. The
rationale is that the locals can't be expected to remain the same across
different invocations of the function, so caching an expression that depends
on them is just as nonsensical for Option 2 as it is for Option 3 (and hence
should trigger a SyntaxError either way).
Given that variant, my reasons for preferring Option 2 over Option 3 are:
- the semantics are the same at module, class and function level
- the order of execution roughly matches the order of the source code
- it does not cause any surprises when switches are inside conditional logic
As an example of the latter kind of surprise, consider this:
    def f(x):
        do_switch = False
        if do_switch:
            switch x:
                case sys.stderr.write("Not reachable!\n"):
                    pass
Option 2 won't print anything, since the switch statement is never executed,
so the jump table is never built. Option 3 (def-time calculation of the jump
table), however, will print "Not reachable!" to stderr when the function is
defined.
Now consider this small change, where the behaviour of Option 3 is not only
surprising but outright undefined:
    def f(x):
        if 0:
            switch x:
                case sys.stderr.write("Not reachable!\n"):
                    pass
The optimiser is allowed to throw away the contents of an if 0: block. This
makes no difference for Option 2 (since it never executed the case expression
in the first place), but what happens under Option 3? Is "Not reachable!"
written to stderr or not?
When it comes to the question of "where do we store the result?" for the
first-execution calculation of the jump table, my proposal is "a hidden cell
in the current namespace".
The first time the switch statement is executed, the cell object is empty, so
the jump table creation code is executed and the result stored in the cell. On
subsequent executions of the switch statement, the jump table is retrieved
directly from the cell.
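As a rough illustration of the first-use behaviour just described, the hidden cell can be modelled as a small object that starts empty and is filled the first time it is consulted. JumpTableCell, build_table and dispatch are hypothetical names for this sketch only; a real implementation would live in the compiler and interpreter rather than in user code.

```python
class JumpTableCell:
    """Hypothetical stand-in for the hidden cell holding a jump table."""

    _EMPTY = object()  # sentinel, so that any real value can be stored

    def __init__(self):
        self.contents = self._EMPTY

    def get(self, build):
        # First execution: the cell is empty, so build and store the table.
        if self.contents is self._EMPTY:
            self.contents = build()
        # Later executions: return the cached table directly.
        return self.contents


build_count = 0

def build_table():
    # Counts how often the case expressions would be evaluated.
    global build_count
    build_count += 1
    return {1: "one", 2: "two"}

cell = JumpTableCell()

def dispatch(x):
    table = cell.get(build_table)
    return table.get(x, "default")

results = [dispatch(1), dispatch(2), dispatch(3)]
```

However many times dispatch is called, build_table runs only once, which is the property the cell is meant to provide.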
For functions, the cell objects for any switch tables would be created
internally by the function object constructor based on the attributes of the
code object. So the cells would be created anew each time the function
definition is executed. These would be saved on the function object and
inserted into the local namespace under the appropriate names before the code
is executed (this is roughly the same thing that is done for closure
variables). Deleting from the namespace afterwards isn't necessary, since the
function local namespace gets thrown away anyway.
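The "created anew each time the function definition is executed" behaviour resembles what closures already do today. The sketch below uses a closure variable as an assumed analogue of the hidden cell; make_dispatcher and the builds list are illustrative names, not part of the proposal.

```python
def make_dispatcher(builds):
    # A fresh, empty "cell" is created each time this body runs,
    # i.e. each time the inner def statement is executed.
    cell = []

    def dispatch(x):
        if not cell:
            builds.append("built")    # record a jump table construction
            cell.append({1: "one"})   # store the table in the cell
        return cell[0].get(x, "default")

    return dispatch


builds = []
first = make_dispatcher(builds)
second = make_dispatcher(builds)

first(1)
first(1)    # reuses the table held by first's cell
second(1)   # second has its own cell, so it builds its own table
```

Each function object gets its own table, but repeated calls to the same function object share one.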
For module and class code, code execution (i.e. the exec statement) is
modified so that when a code object is flagged as requiring these hidden
cells, they are created and inserted into the namespace before the code is
executed and removed from the namespace when execution of the code is
complete. Doing it this way prevents the hidden cells from leaking into the
attribute namespace of the class or module without requiring implicit
insertion of a try-finally into the generated bytecode. This means that switch
statements will work correctly in all code executed via an exec statement.
The hidden variables would simply use the normal format for temp names
assigned by the compiler: "_[%d]". Such temporary names are already used by
the with statement and by list comprehensions.
To deal with the threading problem mentioned in the PEP, I believe it would
indeed be necessary to use double-checked locking. Fortunately Python's
execution order is well enough defined that this works as intended, and the
optimiser won't screw it up the way it can in C++. Each of the hidden cell
objects created by a function would have to contain a synchronisation lock
that was acquired before the jump table was calculated (the module level cell
objects created by exec wouldn't need the synchronisation lock). Pseudo-code
for the cell initialisation process:
    if the cell is empty:
        acquire the cell's lock
        if the cell is still empty:
            build the jump table and store it in the cell
        release the cell's lock
    retrieve the jump table from the cell
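The pseudo-code above translates fairly directly into Python using threading.Lock. This is only a sketch under assumptions: LockedCell and build_table are hypothetical names, and the unlocked first check relies on CPython's attribute reads and writes being atomic under the interpreter lock.

```python
import threading

_EMPTY = object()  # sentinel marking a cell that has not been filled yet


class LockedCell:
    """Hypothetical hidden cell with double-checked locking."""

    def __init__(self):
        self._value = _EMPTY
        self._lock = threading.Lock()

    def get(self, build):
        # First check, without the lock: the common fast path.
        if self._value is _EMPTY:
            with self._lock:
                # Second check, with the lock held: another thread may
                # have filled the cell while this one was waiting.
                if self._value is _EMPTY:
                    self._value = build()
        return self._value


build_calls = []

def build_table():
    # Records each construction so that duplicates would be visible.
    build_calls.append(threading.get_ident())
    return {"a": 1}

cell = LockedCell()
threads = [threading.Thread(target=cell.get, args=(build_table,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The with statement releases the lock even if build raises, which the pseudo-code's explicit release step would not do on its own.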
No, it's not a coincidence that my proposal for 'once' expressions is simply a
matter of taking the above semantics for evaluating the jump table and
allowing them to be applied to an arbitrary expression. I actually had the
idea for the jump table semantics before I thought of generalising it :)
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia