[Python-Dev] PEP 3103: A Switch/Case Statement

Tue Jun 27 17:35:42 CEST 2006

On 6/27/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Guido van Rossum wrote:
> > I've written a new PEP, summarizing (my reaction to) the recent
> > discussion on adding a switch statement. While I have my preferences,
> > I'm trying to do various alternatives justice in the descriptions. The
> > PEP also introduces some standard terminology that may be helpful in
> > future discussions. I'm putting this in the Py3k series to give us
> > extra time to decide; it's too important to rush it.
> >
> >   http://www.python.org/dev/peps/pep-3103/
>
> A generally nice summary, but as one of the advocates of Option 2 when it
> comes to freezing the jump table, I'd like to see it given some better press :)

Sure. Feel free to edit the PEP directly if you want.

> > Feedback (also about misrepresentation of alternatives I don't favor)
> > is most welcome, either to me directly or as a followup to this post.
>
> My preferred variant of Option 2 (calculation of the jump table on first use)
> disallows function locals in the switch cases just like Option 3. The
> rationale is that the locals can't be expected to remain the same across
> different invocations of the function, so caching an expression that depends
> on them is just as nonsensical for Option 2 as it is for Option 3 (and hence
> should trigger a Syntax Error either way).

OK, but the explanation of Option 2 becomes more cumbersome then:
instead of "first time executed" it now is "first time executed and
you cannot use any locals (but you can use locals if you're executing
globally, and you can use locals of outer functions) (oh, and whether
locals in a class are okay is anybody's guess)."

> Given that variant, my reasons for preferring Option 2 over Option 3 are:
>   - the semantics are the same at module, class and function level

No they're not. At the global level, this is okay but at the function
level it's not:

  C = 1
  switch x:
  case C: print 42

Unless I misunderstand you and you want to disallow locals at the
global level too, in which case I see this as okay at the function level
but not at the global level:

  switch x:
  case re.IGNORECASE: print 42

So I don't see how this is really true.

>   - the order of execution roughly matches the order of the source code

Only roughly though. One can still create obfuscated examples.

>   - it does not cause any surprises when switches are inside conditional logic
>
> As an example of the latter kind of surprise, consider this:
>
>    def surprise(x):
>       do_switch = False
>       if do_switch:
>           switch x:
>               case sys.stderr.write("Not reachable!\n"):
>                   pass
>
> Option 2 won't print anything, since the switch statement is never executed,
> so the jump table is never built. Option 3 (def-time calculation of the jump
> table), however, will print "Not reachable!" to stderr when the function is
> defined.

That's a pretty crooked example if you ask me. I think we all agree
that side effects of case expressions are one way we can deduce the
compiler's behind-the-scenes tricks (even School Ib is okay with
this). So I don't accept this as proof that Option 2 is better.
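
To make the timing difference concrete, here is a rough sketch in ordinary Python 3 (the switch statement itself does not exist, so nothing here is the proposed implementation). It assumes Option 3 behaves like a default argument, evaluated when the def runs, and Option 2 like a lazily filled cache. The names noisy_case, option2, option3 and _cache are made up for illustration.

```python
import io

log = io.StringIO()  # stands in for sys.stderr so the side effect is easy to inspect


def noisy_case():
    """Hypothetical case expression with a side effect."""
    log.write("evaluated\n")
    return 1


# Option 3 analogue (assumption): a default argument is evaluated when 'def' runs.
def option3(x, _table={noisy_case(): "one"}):
    return _table.get(x, "default")


after_def = log.getvalue()  # the side effect has happened although nothing was called

# Option 2 analogue (assumption): the table is built on first execution, then cached.
_cache = {}


def option2(x):
    if "table" not in _cache:
        _cache["table"] = {noisy_case(): "one"}
    return _cache["table"].get(x, "default")


before_first_call = log.getvalue()  # defining option2 triggered nothing new
option2(1)
option2(1)
after_calls = log.getvalue()  # exactly one more evaluation, on the first call only
```

Either way the side effect is what exposes the timing; a side-effect-free case expression would behave the same in both variants.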

> Now consider this small change, where the behaviour of Option 3 is not only
> surprising but outright undefined:
>
>    def surprise(x):
>       if 0:
>           switch x:
>               case sys.stderr.write("Not reachable!\n"):
>                   pass
>
> The optimiser is allowed to throw away the contents of an if 0: block. This
> makes no difference for Option 2 (since it never executed the case expression
> in the first place), but what happens under Option 3? Is "Not reachable!"
> written to stderr or not?

This is a good question. I think both behaviors are acceptable. Again,
the problem is with the side-effect-full case expression, not with
Option 3.

> When it comes to the question of "where do we store the result?" for the
> first-execution calculation of the jump table, my proposal is "a hidden cell
> in the current namespace".

Um, what do you mean by the current namespace? You can't mean the
locals of the function containing the switch. There aren't always
outer functions so I must conclude you mean the module globals. But
I've never seen those referred to as "the current namespace".

> The first time the switch statement is executed, the cell object is empty, so
> the jump table creation code is executed and the result stored in the cell. On
> subsequent executions of the switch statement, the jump table is retrieved
> directly from the cell.

OK.

> For functions, the cell objects for any switch tables would be created
> internally by the function object constructor based on the attributes of the
> code object. So the cells would be created anew each time the function
> definition is executed. These would be saved on the function object and
> inserted into the local namespace under the appropriate names before the code
> is executed (this is roughly the same thing that is done for closure
> variables). Deleting from the namespace afterwards isn't necessary, since the
> function local namespace gets thrown away anyway.

So do I understand that the switch gets re-initialized whenever a new
function object is created? That seems a violation of the "first time
executed" rule, or at least a modification ("first time executed per
defined function"). Or am I misunderstanding?
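
If I read the proposal correctly, "first time executed per defined function" would behave roughly like the closure-based sketch below, in ordinary Python 3. The list named cell stands in for the hidden cell; define_function and build_table are hypothetical names, not part of any proposal.

```python
builds = []  # records each time a jump table is built


def build_table():
    builds.append(1)
    return {1: "one", 2: "two"}


def define_function():
    """Analogue of executing a 'def': a fresh, empty cell is created each time."""
    cell = []  # hypothetical hidden cell, empty until first use

    def f(x):
        if not cell:  # first execution for this particular function object
            cell.append(build_table())
        return cell[0].get(x, "default")

    return f


f1 = define_function()
f1(1)
f1(2)  # the table was built once for f1 and reused here
f2 = define_function()
f2(1)  # a new function object means a new cell, so the table is built again
```

So the table is built once per function object, not once per piece of source code.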

> For module and class code, code execution (i.e. the exec statement) is
> modified so that when a code object is flagged as requiring these hidden
> cells, they are created and inserted into the namespace before the code is
> executed and removed from the namespace when execution of the code is
> complete. Doing it this way prevents the hidden cells from leaking into the
> attribute namespace of the class or module without requiring implicit
> insertion of a try-finally into the generated bytecode. This means that switch
> statements will work correctly in all code executed via an exec statement.

But if I have a code object c containing a switch statement (not
inside a def) with a side effect in one of its cases, the side effect
is activated each time through the following loop, IIUC:

  d = {}
  for i in range(10):
    exec c in d

> The hidden variables would simply use the normal format for temp names
> assigned by the compiler: "_[%d]". Such temporary names are already used by
> the with statement and by list comprehensions.

Fine.

> To deal with the threading problem mentioned in the PEP, I believe it would
> indeed be necessary to use double-checked locking. Fortunately Python's
> execution order is well enough defined that this works as intended, and the
> optimiser won't screw it up the way it can in C++. Each of the hidden cell
> objects created by a function would have to contain a synchronisation lock
> that was acquired before the jump table was calculated (the module level cell
> objects created by exec wouldn't need the synchronisation lock). Pseudo-code
> for the cell initialisation process:
>
>    if the cell is empty:
>        acquire the cell's lock
>        try:
>            if the cell is still empty:
>                build the jump table and store it in the cell
>        finally:
>            release the cell's lock
>    retrieve the jump table from the cell
>
> No, it's not a coincidence that my proposal for 'once' expressions is simply a
> matter of taking the above semantics for evaluating the jump table and
> allowing them to be applied to an arbitrary expression. I actually had the
> idea for the jump table semantics before I thought of generalising it :)

I'm confused how you can first argue that tying things to the function
definition is one of the main drawbacks of Option 3, and then proceed
to tie Option 2 to the function definition as well. This sounds like
by far the most convoluted specification I have seen so far. I hope
I'm misunderstanding what you mean by namespace.
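
For reference, the cell-initialisation pseudo-code quoted above can be sketched in present-day Python roughly as follows. OnceCell and build_table are hypothetical names, and this is only an illustration of double-checked locking, not a claim about how a switch would actually be implemented.

```python
import threading


class OnceCell:
    """Hypothetical hidden cell holding a lazily built jump table."""

    _EMPTY = object()  # sentinel meaning "nothing stored yet"

    def __init__(self, build):
        self._build = build
        self._value = self._EMPTY
        self._lock = threading.Lock()

    def get(self):
        if self._value is self._EMPTY:  # first check, without the lock
            with self._lock:
                if self._value is self._EMPTY:  # second check, under the lock
                    self._value = self._build()
        return self._value


builds = []  # records each time the table is built


def build_table():
    builds.append(1)
    return {1: "one"}


cell = OnceCell(build_table)
threads = [threading.Thread(target=cell.get) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

However many threads race for the cell, the second check under the lock ensures the table is built only once.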

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)