[Python-3000] Switch and static, redux

Sat Jul 8 05:34:19 CEST 2006

On 7/7/06, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > The
> > combination of the first and second suggests a functional programming
> > mindset which somehow makes the third less likely.
>
> You seem to be conflating "nested functions" and
> "functional programming", which doesn't make sense
> to me.

Not even a smidgen? Small nested functions (or lambdas) are pretty
common in a functional style. UI callbacks are more typically
implemented as methods in my experience. We just talked in another
thread about I/O callbacks a la JavaScript but I hope we have better
ways to do that too. Nested functions used strictly as a code
structuring device are not so great IMO; they tend to make the outer
function really large, and the possibility of accidentally sharing
variables in the outer scope worries me.

Code that *returns* a function is often also indicative of a
functional style (unless it's a decorator :-).

> > Your suggestion also makes it hard to find a good place to store the
> > dispatch dict between the time the outer function is defined and the
> > time the the inner function is defined.
>
> Not at all -- you store it in a cell in the environment
> of f(). In other words, you treat it like
>
>    _g_cases = {...} # build the case dict
>
>    def f():
>      ...
>      def g():
>        ...
>        # switch using _g_cases

(I knew that argument was going to be considered a puzzle to solve. :-)

I don't like storing it as a global; and that's probably incorrect
too, e.g. if multiple definitions of f are being made in a loop. The
only safe place I can think of is on the function object created by
'def f' but this still begs the question of where the definition of g
finds it.

> > Having a rule that prescribes optimization depending on
> > the scopes of variables involved in a set of expressions sounds like a
> > recipe for hard to debug bugs (since these rules are so subtle).
>
> I don't mean that optimisation would be forbidden, just
> that it wouldn't be guaranteed if you use local names.

The kind of optimization that's within reach for the CPython compiler
is pretty much limited to compile-time constant expressions, and I
don't expect that to change (it would require way too much static
analysis of multiple modules). So in practice your rule would imply
that. And that's what's causing the debugging problem for me.
Questions like "why did my code suddenly change from O(log N) to O(N)
performance" are really hard to answer if it's the compiler (not)
taking a shortcut in one case.

> I don't see that as being any worse than the situation
> now where, e.g. a loop written at the module level tends
> to be slower because it's using globals for its variables.

No, that's different, because there is a very obvious syntactic clue:
the presence of a 'def' around the for loop. In the case considered
here, we can have two nearly identical cases involving an outer and an
inner def, the latter containing a switch; yet one is fast and the
other is slow, and the clue is that the slow one uses a local variable
of the outer def in the switch.

> I don't think there's anything particularly subtle about
> this, either. If the cases are based on the values of
> local variables, it seems pretty obvious that they're
> going to be recomputed every time you enter the function.

Understood. Once you see that you've solved the debugging problem. The
subtlety that I fear is that, when reading the switch code casually
(as you tend to do when first debugging something :-), you'll miss the
fact that one of the cases references an outer local variable. And
remember, the hypothetical case we're debugging is "why did this
suddenly become slower" -- the switch isn't yet the focus of the
investigation.

> > The changes to the rules that you are proposing make me think that we
> > are making the rules too "clever", which will making it that much
> > harder to debug issues around this.
>
> As you've said yourself, the use cases for all this are
> ones where the case values are effectively *constant*.
> Given that, it doesn't matter a damn how early or late
> they're evaluated, except insofar as it affects execution
> speed. What I'm proposing is a rule that says, "We don't
> guarantee to make it fast, but we'll try, and the more global
> your case values are, the faster we're likely to be able to
> make it".

All I'm arguing is that that's a much more subtle rule to have to
remember when debugging the speed thing than "switch cases are frozen
when the closest outer def is executed".

> > The much simpler rule that I
> > propose is simpler to debug because you don't have to know the scope
> > of the variables involved to know when the switch is frozen,
>
> But you shouldn't *have* to know exactly when it's frozen
> if your case values are constant, as they should be.

Of course. But I'm talking about debugging the case where you
*accidentally* violated that rule.

I guess the problem (with this continued argument) is that I'm
constantly switching between two points of view, and need both to
decide on an optimal solution. One POV is that of the casual user of
the switch statement. They should stick to the "use only things that
are (conceptually) constants". The other POV is that of the expert
user trying to understand (accidentally) obfuscated code. I still like
there to be simple rules to rememver for the latter. The difference is
that the rules don't have to be obvious unless you're somewhat
familiar with the implementation (in this case, storing the dispatch
table on the nearest function object).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)