[Python-checkins] r47120 - peps/trunk/pep-3103.txt
guido.van.rossum
python-checkins at python.org
Tue Jun 27 07:06:59 CEST 2006
Author: guido.van.rossum
Date: Tue Jun 27 07:06:58 2006
New Revision: 47120
Modified:
peps/trunk/pep-3103.txt
Log:
Near-total rewrite of the section describing the different semantic
schools, inspired by discussion on python-dev with Ka-Ping. Hopefully
the different schools and their relative advantages and disadvantages
are represented more correctly now.
Modified: peps/trunk/pep-3103.txt
==============================================================================
--- peps/trunk/pep-3103.txt (original)
+++ peps/trunk/pep-3103.txt Tue Jun 27 07:06:58 2006
@@ -26,7 +26,7 @@
This PEP introduces canonical names for the many variants that have
been discussed for different aspects of the syntax and semantics, such
-as "alternative 2", "school II", "Option 3" and so on. Hopefully
+as "alternative 1", "school II", "option 3" and so on. Hopefully
these names will help the discussion.
@@ -223,8 +223,8 @@
The `*` notation is similar to the use of prefix `*` already in use for
variable-length parameter lists and for passing computed argument
-lists, and often proposed for value-unpacking (e.g. "a, b, *c = X" as
-an alternative to "(a, b), c = X[:2], X[2:]").
+lists, and often proposed for value-unpacking (e.g. ``a, b, *c = X`` as
+an alternative to ``(a, b), c = X[:2], X[2:]``).
Alternative D
-------------
@@ -270,8 +270,8 @@
based on dict lookup is chosen as the semantics, large ranges could be
inefficient (consider range(1, sys.maxint)).
-All in all my preferences are (in descending preference) B, A, D', C
-where D' is D without the third possibility.
+All in all my preferences are (from most to least favorite) B, A, D',
+C, where D' is D without the third possibility.
Semantics
@@ -283,66 +283,117 @@
If/Elif Chain vs. Dict-based Dispatch
-------------------------------------
-There are two main schools of thought about the switch statement's
-semantics. School I wants to define the switch statement in term of
-an equivalent if/elif chain. School II prefers to think of it as a
-dispatch on a precomputed dictionary.
-
-The difference is mainly important when either the switch expression
-or one of the case expressions is not hashable; school I wants this to
-be handled as it would be by an if/elif chain (i.e. hashability of the
-expressions involved doesn't matter) while school II is willing to say
-that the switch expression and all the case expressions must be
-hashable if a switch is to be used; otherwise the user should have
-written an if/elif chain.
-
-There's also a difference of opinion regarding the treatment of
-duplicate cases (i.e. two or more cases with the same match
-expression). School I wants to treat this the same is an if/elif
-chain would treat it (i.e. the first match wins and the code for the
-second match is silently unreachable); school II generally wants this
-to be an error at the time the switch is frozen.
-
-There's also a school III which states that the definition of a switch
-statement should be in terms of an equivalent if/elif chain, with the
-exception that all the expressions must be hashable.
-
-School I believes that the if/elif chain is the only reasonable,
-surprise-free way of defining switch semantics, and that optimizations
-as suggested by PEP 275's Solution 1 are sufficient to make most
-common uses fast. School I sees trouble in the approach of
-pre-freezing a dispatch dictionary because it places a new and unusual
-burden on programmers to understand exactly what kinds of case values
-are allowed to be frozen and when the case values will be frozen, or
-they might be surprised by the switch statement's behavior.
-
-School II sees trouble in trying to achieve semantics that match
-those of an if/elif chain while optimizing the switch statement into
-a hash lookup in a dispatch dictionary. In an if/elif chain, the
-test "x == y" might well be comparing two unhashable values
-(e.g. two lists); even "x == 1" could be comparing a user-defined
-class instance that is not hashable but happens to define equality to
-integers. Worse, the hash function might have a bug or a side effect;
-if we generate code that believes the hash, a buggy hash might
-generate an incorrect match, and if we generate code that catches
-errors in the hash to fall back on an if/elif chain, we might hide
-genuine bugs. In addition, school II sees little value in allowing
-cases involving unhashable values; after all if the user expects such
-values, they can just as easily write an if/elif chain. School II
-also doesn't believe that it's fair to allow dead code due to
-overlapping cases to occur unflagged, when the dict-based dispatch
-implementation makes it so easy to trap this.
-
-School III admits the problems with making hash() optional, but still
-believes that the true semantics should be defined by an if/elif chain
-even if the implementation should be allowed to use dict-based
-dispatch as an optimization. This means that duplicate cases must be
-resolved by always choosing the first case, making the second case
-undiagnosed dead code.
+There are several main schools of thought about the switch statement's
+semantics:
+
+- School I wants to define the switch statement in term of an
+ equivalent if/elif chain (possibly with some optimization thrown
+ in).
+
+- School II prefers to think of it as a dispatch on a precomputed
+ dict. There are different choices for when the precomputation
+ happens.
+
+- There's also school III, which agrees with School I that the
+ definition of a switch statement should be in terms of an equivalent
+ if/elif chain, but concedes to the optimization camp that all
+ expressions involved must be hashable.
+
+We need to further separate School I into School Ia and School Ib:
+
+- School Ia has a simple position: a switch statement is translated to
+ an equivalent if/elif chain, and that's that. It should not be
+ linked to optimization at all. That is also my main objection
+ against this school: without any hint of optimization, the switch
+ statement isn't attractive enough to warrant new syntax.
+
+- School Ib has a more complex position: it agrees with School II that
+ optimization is important, and is willing to concede the compiler
+ certain liberties to allow this. (For example, PEP 275 Solution 1.)
+ In particular, hash() of the switch and case expressions may or may
+ not be called (so it should be side-effect-free); and the case
+ expressions may not be evaluated each time as expected by the
+ if/elif chain behavior, so the case expressions should also be
+ side-effect free. My objection to this (elaborated below) is that
+ if either the hash() or the case expressions aren't
+ side-effect-free, optimized and unoptimized code may behave
+ differently.
+
+School II grew out of the realization that optimization of commonly
+found cases isn't so easy, and that it's better to face this head on.
+This will become clear below.
+
+The differences between School I (mostly School Ib) and School II are
+threefold:
+
+- When optimizing using a dispatch dict, if either the switch
+ expression or the case expressions are unhashable (in which case
+ hash() raises an exception), School Ib requires catching the hash()
+ failure and falling back to an if/elif chain. School II simply lets
+ the exception happen. The problem with catching an exception in
+ hash() as required by School Ib, is that this may hide a genuine
+ bug. A possible way out is to only use a dispatch dict if all case
+ expressions are ints, strings or other built-ins with known good
+ hash behavior, and to only attempt to hash the switch expression if
+ it is also one of those types. Type objects should probably also be
+ supported here. This is the (only) problem that School III
+ addresses.
+
+- When optimizing using a dispatch dict, if the hash() function of any
+ expression involved returns an incorrect value, under school Ib,
+ optimized code will not behave the same as unoptimized code. This
+ is a well-known problem with optimization-related bugs, and waste
+ lots of developer time. Under School II, in this situation
+ incorrect results are produced at least consistently, which should
+ make debugging a bit easier. The way out proposed for the previous
+ bullet would also help here.
+
+- School Ib doesn't have a good optimization strategy if the case
+ expressions are named constants. The compiler cannot know their
+ values for sure, and it cannot know whether they are truly constant.
+ As a way out, it has been proposed to re-evaluate the expression
+ corresponding to the case once the dict has identified which case
+ should be taken, to verify that the value of the expression didn't
+ change. But strictly speaking, all the case expressions occurring
+ before that case would also have to be checked, in order to preserve
+ the true if/elif chain semantics, thereby completely killing the
+ optimization. Another proposed solution is to have callbacks
+ notifying the dispatch dict of changes in the value of variables or
+ attributes involved in the case expressions. But this is not likely
+ implementable in the general case, and would require many namespaces
+ to bear the burden of supporting such callbacks, which currently
+ don't exist at all.
+
+- Finally, there's a difference of opinion regarding the treatment of
+ duplicate cases (i.e. two or more cases with match expressions that
+ evaluates to the same value). School I wants to treat this the same
+ is an if/elif chain would treat it (i.e. the first match wins and
+ the code for the second match is silently unreachable); School II
+ wants this to be an error at the time the dispatch dict is frozen
+ (so dead code doesn't go undiagnosed).
+
+School I sees trouble in School II's approach of pre-freezing a
+dispatch dict because it places a new and unusual burden on
+programmers to understand exactly what kinds of case values are
+allowed to be frozen and when the case values will be frozen, or they
+might be surprised by the switch statement's behavior.
+
+School II doesn't believe that School Ia's unoptimized switch is worth
+the effort, and it sees trouble in School Ib's proposal for
+optimization, which can cause optimized and unoptimized code to behave
+differently.
+
+In addition, school II sees little value in allowing cases involving
+unhashable values; after all if the user expects such values, they can
+just as easily write an if/elif chain. School II also doesn't believe
+that it's right to allow dead code due to overlapping cases to occur
+unflagged, when the dict-based dispatch implementation makes it so
+easy to trap this.
Personally, I'm in school II: I believe that the dict-based dispatch
is the one true implementation for switch statements and that we
-should face the limitiations and benefits up front.
+should face the limitiations up front, so that we can reap maximal
+benefits.
When to Freeze the Dispatch Dict
--------------------------------
More information about the Python-checkins
mailing list