[Python-ideas] a in x or in y

Thu Feb 13 23:42:52 CET 2014

On 14 February 2014 06:55, Nick Coghlan <ncoghlan at gmail.com> wrote:
> A suggestion like this, which would require defining two or three word
> tokens to meet the letter of the guideline while still breaking its spirit,
> simply isn't going to happen (especially when it doesn't provide a
> significant increase in expressiveness).

OK, I realised this proposal is actually closer to chained comparison
operators than I initially thought, and multiword tokens do indeed
make it technically feasible. However, that's still just a hack that
meets the letter of the guideline while still failing to abide by the
spirit of it.

(Disclaimer: all parsing descriptions below are intended to be
illustrative, and do not necessarily reflect how the CPython compiler
actually works. In particular, "compiler" is used as a shorthand to
refer to arbitrary parts of the toolchain.)

First, let's look at the existing multi-word tokens.

"is" vs "is not" is relatively simple: they're both binary operators.
In the following expressions:

    LHS is RHS
    LHS is not RHS

the extra "not" after "is" changes the comparison *operator*, but it
doesn't need to reach back and alter the interpretation of the LHS
expression itself.

"not" vs "not in" is a little different: that relies on the fact that
failing to follow a "not" in this position with "in" is a SyntaxError:

    LHS not RHS # Illegal
    LHS not in RHS

So again, the addition of the "in" doesn't affect the interpretation
of the LHS in any way.
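
As a quick illustration of the two cases above, the standard library's
ast module shows that "is not" and "not in" each end up as a single
comparison operator, and that a bare "not" in that position is
rejected. The helper name comparison_ops is made up for this sketch;
it says nothing about how the tokenizer or parser gets there
internally.

```python
import ast

def comparison_ops(source):
    """Return the names of the comparison operator nodes in an expression."""
    tree = ast.parse(source, mode="eval")
    # tree.body is an ast.Compare node; .ops holds one node per operator.
    return [type(op).__name__ for op in tree.body.ops]

print(comparison_ops("LHS is RHS"))      # ['Is']
print(comparison_ops("LHS is not RHS"))  # ['IsNot']
print(comparison_ops("LHS not in RHS"))  # ['NotIn']

try:
    ast.parse("LHS not RHS", mode="eval")
except SyntaxError:
    print("'LHS not RHS' is a SyntaxError")
```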

Contrast that with the proposal in this thread:

    LHS or in RHS
    LHS and in RHS
    LHS or not in RHS
    LHS and not in RHS

In all of these cases, the "or op"/"and op" alters the way the *LHS*
is processed.

The closest parallel we have to that is chained comparison operators,
where the presence of "op2" alters the processing of "X op1 Y":

    X op1 Y op2 Z

All comparison operators are designed such that when the compiler is
about to resolve "X op1 Y", it looks ahead at the next token, and if
it sees another comparison operator, starts building an "and"
construct instead:

    X op1 Y and Y op2 Z
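
A small sketch of that behaviour in current Python, with a made-up
helper y() that records each call. The one difference from the
literal "and" expansion is that the chained form evaluates the middle
operand only once. The ast check at the end just shows that the chain
is kept as a single Compare node with two operators.

```python
import ast

calls = []

def y():
    """Hypothetical middle operand that records each evaluation."""
    calls.append("y")
    return 5

# Chained form: the middle operand is evaluated a single time.
chained = 1 < y() < 10
chained_calls = len(calls)

# Spelled-out "and" form: the middle operand is evaluated twice.
calls.clear()
expanded = 1 < y() and y() < 10
expanded_calls = len(calls)

print(chained, chained_calls)    # True 1
print(expanded, expanded_calls)  # True 2

# The chain is one Compare node carrying both operators.
node = ast.parse("X < Y < Z", mode="eval").body
op_names = [type(op).__name__ for op in node.ops]
print(op_names)                  # ['Lt', 'Lt']
```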

A *generalisation* of the current proposal, to work with arbitrary
comparison operators, clearly requires two token look ahead in order
to see "op2" after the logical operator:

    X op1 Y or op2 Z
    X op1 Y and op2 Z

To be reparsed as:

    X op1 Y or X op2 Z
    X op1 Y and X op2 Z

And allowing constructs like:

    if x == 1 or == 2 or in range(10, 20):
        # Do stuff
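
For comparison, here is a sketch of what that condition would mean
under the reparsing described above, written in today's syntax. The
function name matches is made up for the example, and the compile()
call only confirms that the shorthand spelling is currently rejected.

```python
def matches(x):
    """Current-syntax spelling of: x == 1 or == 2 or in range(10, 20)."""
    return x == 1 or x == 2 or x in range(10, 20)

print([n for n in range(25) if matches(n)])

# The proposed shorthand is not valid Python today.
try:
    compile("x == 1 or == 2 or in range(10, 20)", "<proposal>", "eval")
    shorthand_accepted = True
except SyntaxError:
    shorthand_accepted = False

print(shorthand_accepted)  # False
```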

This is why making "or in" (etc) multiword tokens would still break
the spirit of the "only one token lookahead" guideline - the proposal
put forward is actually perfectly coherent for arbitrary comparison
operators, it just requires two token lookahead in order to see both
the logical operator *and* the comparison operator after it before
deciding how to resolve the processing of the LHS expression.

I'm actually more amenable to the generalised proposal than I was to
the original more limited idea, but anyone that wants to pursue it
further really needs to appreciate how deeply ingrained that "only one
token look ahead" guideline is for most of the core development team.
I don't actually believe even the more general proposal adds enough
expressiveness to convince Guido to change the lookahead restriction,
but I'm only -0 on that, whereas I was -1 on the
containment-tests-only special cased version.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia