[Python-Dev] Alternatives to switch?
Talin
talin at acm.org
Sun Jun 25 06:31:57 CEST 2006
At first I was pretty excited about the switch proposals, but having
read the various discussions I have to say that my enthusiasm has cooled
quite a bit. The set of proposals that are currently being put forward
have enough caveats and restrictions as to make the statement far less
useful than I had originally hoped.
In fact, I'd like to point out something that hasn't been brought up,
which is that in many cases having a closure rebind the switch cases
defeats the purpose of the thing. For example:
def outer():
def inner(x):
switch(x):
case 1: ...
case 2: ...
case 3: ...
return inner
If the switch cases are bound at the time that 'inner' is defined, it
means that the hash table will be rebuilt each time 'outer' is called.
But what if 'inner' is only intended to be used once? It means that the
performance advantage of switch is completely negated. On the other
hand, if 'inner' is intended to be used many times, then 'switch' is a
win. But the compiler has no idea which of these two cases is true.
I want to try and think out of the box here, and ask the question what
exactly we wanted a switch statement for, and if there are any other
ways to approach the problem.
Switch statements are used to efficiently dispatch based on a single
value to a large number of alternative code paths. The primary uses that
have been put forward are in interpreting parse trees (the 'sre' example
is a special case of this) and pickling. In fact, I would say that these
two use cases alone justify the need for some improvement to the
language, since both are popular design patterns, and both are somewhat
ill-served by the current limits of the language.
Parse trees and pickling can, in fact, be considered as two examples of
a broader category of "external interpretation of an object graph". By
"external interpretation" I mean that the inspection and transformation
of the data is not done via method calls on the individual objects, but
rather by code outside of the object graph that recognizes individual
object types or values and takes action based on that.
This type of architectural pattern manifests in nearly every sub-branch
of software engineering, and usually appears when you have a complex
graph of objects where it is inadvisable, for one reason or another, to
put the behavior in the objects themselves. There are several possible
reasons why this might be so. Perhaps the operations involve the
relationships between the objects rather than the objects themselves
(this is the parse tree case.) Or perhaps for reasons of modularity, it
is desired that the objects not have built-in knowledge of the type of
operation being performed - so that, for example, you can write several
different kinds of serializers for an object graph without having to
have the individual objects have special understanding of each different
serializer type. This is the pickle case.
I'd like to propose that we consider this class of problems (external
interpretation of an object graph) as the 'reference' use case for
dicussions of the merits of the switch statement, and that evaluation of
the merits of language changes be compared against this reference. Here
are my reasons for suggesting this:
-- The class is broad and encompasses a large set of practical,
real-world applications.
-- The class is not well-served by 'if-elif-else' dispatching styles.
-- There have been few, if any, use cases in support of a switch
statement that don't fall within this class.
So how does a switch statement help with this problem? Currently in
Python, there are a limited number of ways to do N-way dispatching:
-- if-elif-else chains
-- method overloading
-- dicts/arrays of function or method pointers
-- exotic and weird solutions such as using try/except as a
dispatching mechanism.
(I can't think of any others, but I am sure I missed something.)
We've already discussed the issues with if-elif-else chains, in
particular the fact that they have O(N) performance instead of O(1).
The next two options both have in common the fact that they require the
dispatch to go through a function call. This means that you are paying
for the (allegedly expensive) Python function dispatch overhead, plus
you no longer have access to any local variables which happened to be in
scope when the dispatch occured.
It seems to me that the desire for a switch statement is a desire to get
around those two limitations - in other words, if function calls were
cheap, and there was an easy way to do dynamic scoping so that called
functions could access their caller's variables, then there wouldn't be
nearly as much of a desire for a switch statement.
For example, one might do a pickling function along these lines:
dispatch_table = None
def marshall( data ):
type_code = None
object_data = None
def int_case():
type_code = TYPE_INT
object_data = str(data)
def str_case():
type_code = TYPE_STR
object_data = str(data)
# (and so on)
# Build the dispatch table once only
if dispatch_table is None:
dispatch_table = dict(
int, int_case,
str, str_case,
...
)
dispatch_table[ type( data ) ]()
However, you probably wouldn't want to write the code like this in
current-day Python -- even a fairly long if-elif-else chain would be
more efficient, and the code isn't as neatly expressive of what you are
trying to do. But the advantage is that the construction of the dispatch
table is explicit rather than implicit, which avoids all of the
arguments about when the dispatch should occur.
Another way to deal with the explicit construction of the switch table
is to contsruct it outside of the function body. So for example, if the
values to be switched on are meant to be evaluated at module load time,
then the user can define the dispatch table outside of any function. The
problem is, however, that the language requires any code that accesses
the local variables of a function to be textually embedded within that
function, and you can't build a dispatch table outside of a function
that refers to code sections within a function.
In the interest of brevity, I'm going to cut it off here before I ramble
on too much longer. I don't have an "answer", so much as I am trying to
raise the right questions.
-- Talin
More information about the Python-Dev
mailing list