[Python-Dev] PEP 3103: A Switch/Case Statement

Mon Jun 26 23:57:28 CEST 2006

On 6/26/06, Ka-Ping Yee <python-dev at zesty.ca> wrote:
> On Mon, 26 Jun 2006, Guido van Rossum wrote:
> > Can you please edit the PEP yourself to add this? That will be most efficient.
>
> I've done so, and tried to clarify the next line to match (see below).
>
> > With school I, if you want to
> > optimize using a hash table (as in PEP 275 Solution 1) you have to
> > catch and discard exceptions in hash(), and a bug in hash() can still
> > lead this optimization astray
>
> Right.  As written, the problem "a buggy hash might generate an
> incorrect match" is not specific to School I; it's a problem with
> any approach that is implemented by a hash lookup.  School II is
> necessarily implemented this way; School I might or might not be.
> So i think the part that says:
>
>     the hash function might have a bug or a side effect; if we
>     generate code that believes the hash, a buggy hash might
>     generate an incorrect match
>
> doesn't belong there, and i'd like your consent to remove it.

I'd rather keep it, but clarify that school II considers the outcome
of the hash() the official semantics, while school I + dict-based
optimization would create optimized code that doesn't match the
language specs. My point being that not following your own specs is a
much more severe sin than trusting a buggy hash(). If hash() is buggy,
school II's official spec means that the bug affects the outcome, and
that's that; but because school I specifies the semantics as based on
an if/elif chain, using a buggy hash() means not following the spec.
If we choose school I, a user may assume that a buggy hash() doesn't
affect the outcome because it's defined in terms of == tests only.

> On the other hand, this criticism:
>
>     if we generate code that catches errors in the hash to
>     fall back on an if/elif chain, we might hide genuine bugs
>
> is indeed specific to School I + hashing.

Right.

> > Right. School I appears just as keen as school II to use hashing to
> > optimize things, but isn't prepared to pay the price in semantics;
>
> Ok.  Then there's an inconsistency with the definition of School I:
>
>     School I wants to define the switch statement in term of
>     an equivalent if/elif chain
>
> To clear this up, i've edited the first line of the School II
> paragraph, which previously said:
>
>     School II sees nothing but trouble in that approach
>
> It seems clear that by "that approach" you meant "trying to achieve
> if/elif semantics while using hash optimization" rather than the
> more general definition of School I that was given.

Right. Thanks for the clarification; indeed, the only problem I have
with the "clean" school I approach (no hash-based optimization) is
that there's no optimization, and we end up once more having to tweak
the ordering of the cases based on our expectation of their frequency
(which may not match reality).

Most school I proponents (perhaps you're the only exception) have
claimed that optimization is desirable, but added that it would be
easy to add hash-based optimization. IMO it's not so easy in the light
of various failure modes of hash(). (A possible solution would be to
only use hashing if the expression's type is one of a small set of
trusted builtins, and not a subclass; we can trust int.__hash__,
str.__hash__ and a few others.)

> I believe
> there are a few voices here (and i count myself among them) that
> consider the semantics more important than the speed and are in
> School I but aren't treating hash optimization as the quintessence
> of 'switch', and we shouldn't leave them out.

This is an important distinction; thanks for pointing it out. Perhaps
we can introduce school Ia and Ib, where Ia is "clean but unoptimized"
and Ib is "if/elif with hash-based optimization desirable when
possible".

Another distinction between the two schools is that school Ib will
have a hard time optimizing switches based on named constants. I don't
believe that the "callback if variable affecting expression value is
ever assigned to" approach will work.

School II is definitely more pragmatist; I really don't see much wrong
with defining that it works a certain way, which is not exactly what
you would expect but has the same effect in *most* common cases, and
then explaining odd behavior in various corner cases out of that way,
as long as I don't care about those corner cases.

This is pretty much how I defend the semantics of default parameter
values -- it doesn't matter that they are computed at function
definition time instead of call time as long as you only use immutable
constants (which could be named constants as long as they're
immutable), which is the only use case I care about.

There are many other examples of odd Python semantics that favor a
relatively simple implementation and glorify that implementation as
the official semantics. I find this much preferable over weasel words
a la traditional language standards which give optimizers a lot of
leeway at the cost of difference in behavior between different
compilers or optimization levels.

For a real example from C++, read "Double Checked Locking is Broken":
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
(I first heard about this from Scott Meyers at the '06 ACCU conference
in Oxford; an earlier version of his talk is also online:
http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)