[Python-Dev] PEP 3103: A Switch/Case Statement

Tue Jun 27 15:36:33 CEST 2006

Guido van Rossum wrote:
> I've written a new PEP, summarizing (my reaction to) the recent
> discussion on adding a switch statement. While I have my preferences,
> I'm trying to do various alternatives justice in the descriptions. The
> PEP also introduces some standard terminology that may be helpful in
> future discussions. I'm putting this in the Py3k series to gives us
> extra time to decide; it's too important to rush it.
> 
>   http://www.python.org/dev/peps/pep-3103/

A generally nice summary, but as one of the advocates of Option 2 when it 
comes to freezing the jump table, I'd like to see it given some better press :)

> Feedback (also about misrepresentation of alternatives I don't favor)
> is most welcome, either to me directly or as a followup to this post.

My preferred variant of Option 2 (calculation of the jump table on first use) 
disallows function locals in the switch cases just like Option 3. The 
rationale is that the locals can't be expected to remain the same across 
different invocations of the function, so caching an expression that depends 
on them is just as nonsensical for Option 2 as it is for Option 3 (and hence 
should trigger a Syntax Error either way).

Given that variant, my reasons for preferring Option 2 over Option 3 are:
  - the semantics are the same at module, class and function level
  - the order of execution roughly matches the order of the source code
  - it does not cause any surprises when switches are inside conditional logic

As an example of the latter kind of surprise, consider this:

   def surprise(x):
      do_switch = False
      if do_switch:
          switch x:
              case sys.stderr.write("Not reachable!\n"):
                  pass

Option 2 won't print anything, since the switch statement is never executed, 
so the jump table is never built. Option 3 (def-time calculation of the jump 
table), however, will print "Not reachable!" to stderr when the function is 
defined.

Now consider this small change, where the behaviour of Option 3 is not only 
surprising but outright undefined:

   def surprise(x):
      if 0:
          switch x:
              case sys.stderr.write("Not reachable!\n"):
                  pass

The optimiser is allowed to throw away the contents of an if 0: block. This 
makes no difference for Option 2 (since it never executed the case expression 
in the first place), but what happens under Option 3? Is "Not reachable!" 
written to stderr or not?

When it comes to the question of "where do we store the result?" for the 
first-execution calculation of the jump table, my proposal is "a hidden cell 
in the current namespace".

The first time the switch statement is executed, the cell object is empty, so 
the jump table creation code is executed and the result stored in the cell. On 
subsequent executions of the switch statement, the jump table is retrieved 
directly from the cell.

For functions, the cell objects for any switch tables would be created 
internally by the function object constructor based on the attributes of the 
code object. So the cells would be created anew each time the function 
definition is executed. These would be saved on the function object and 
inserted into the local namespace under the appropriate names before the code 
is executed (this is roughly the same thing that is done for closure 
variables). Deleting from the namespace afterwards isn't necessary, since the 
function local namespace gets thrown away anyway.

For module and class code, code execution (i.e. the exec statement) is 
modified so that when a code object is flagged as requiring these hidden 
cells, they are created and inserted into the namespace before the code is 
executed and removed from the namespace when execution of the code is 
complete. Doing it this way prevents the hidden cells from leaking into the 
attribute namespace of the class or module without requiring implicit 
insertion of a try-finally into the generated bytecode. This means that switch 
statements will work correctly in all code executed via an exec statement.

The hidden variables would simply use the normal format for temp names 
assigned by the compiler: "_[%d]". Such temporary names are already used by 
the with statement and by list comprehensions.

To deal with the threading problem mentioned in the PEP, I believe it would 
indeed be necessary to use double-checked locking. Fortunately Python's 
execution order is well enough defined that this works as intended, and the 
optimiser won't screw it up the way it can in C++. Each of the hidden cell 
objects created by a function would have to contain a synchronisation lock 
that was acquired before the jump table was calculated (the module level cell 
objects created by exec wouldn't need the synchronisation lock). Pseudo-code 
for the cell initialisation process:

   if the cell is empty:
       acquire the cell's lock
       try:
           if the cell is still empty:
               build the jump table and store it in the cell
       finally:
           release the cell's lock
    retrieve the jump table from the cell

No, it's not a coincidence that my proposal for 'once' expressions is simply a 
matter of taking the above semantics for evaluating the jump table and 
allowing them to be applied to an arbitrary expression. I actually had the 
idea for the jump table semantics before I thought of generalising it :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org