[Python-ideas] a in x or in y

Nick Coghlan ncoghlan at gmail.com
Thu Feb 13 12:55:28 CET 2014


On 13 February 2014 20:57, Steven D'Aprano <steve at pearwood.info> wrote:
> - given the restrictions on the parser, is this even possible? and

I haven't worked through an actual candidate grammar (so I could
potentially be wrong), but I'm pretty sure it requires more lookahead
than Guido is willing to allow for the language definition.

At the point we hit:

    X in Y and

we're going to want to process "X in Y" as an expression, and then
start looking at compiling the RHS of the "and".

This proposal would mean that when the "in" shows up after the "and",
we need to backtrack, lift "X" out as its own expression, and then
reinsert it as the LHS of *multiple* containment tests.
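
As a rough illustration of that point, here is a minimal sketch using
the standard library's ast module. X, Y and Z are placeholder names,
and the spelling "X in Y and in Z" (meaning "X in Y and X in Z") is an
assumption about how the proposal would be written. The sketch only
shows how the current parser groups the tokens; it is not a candidate
grammar.

```python
import ast

# Current grammar: by the time "and" is seen, "X in Y" is already a
# complete comparison, and it becomes the first operand of the BoolOp.
current = ast.parse("X in Y and Z", mode="eval").body
left, right = current.values

# Assumed meaning of the proposal, written out in today's syntax:
# X appears as the left-hand side of *two* containment tests.
expanded = ast.parse("X in Y and X in Z", mode="eval").body
first, second = expanded.values

# Both trees start with the same "X in Y" comparison, so nothing seen
# before the "and" tells the parser which of the two it is building.
same_prefix = ast.dump(left) == ast.dump(first)

# The assumed proposed spelling is rejected by the current grammar.
try:
    ast.parse("X in Y and in Z", mode="eval")
    proposed_parses = True
except SyntaxError:
    proposed_parses = False

print(type(current).__name__, type(left).__name__, type(right).__name__)
print(same_prefix, proposed_parses)
```

The current parser never has to revisit "X in Y" once it has been
read; supporting the shortened form would require exactly that.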

It's worth noting that this deliberate "no more than one token
lookahead" design constraint in the language syntax isn't just for the
benefit of compiler writers: it's there for the benefit of human
readers as well.

In linguistics, there's a concept called "garden path" sentences -
these are sentences where the first part is a grammatically coherent
sentence, but by *adding more words to the end*, you change the
meaning of words that appeared earlier. This is a jarring experience
for readers. This is one of the classic examples:

    The horse raced past the barn fell.

That's a grammatical English sentence. If you're yelling at your
computer that there's no way something that awkward can be
grammatically correct, you're not alone in feeling that way, but the
awkwardness isn't due to bad grammar. It feels bad because
the first six words form a sentence in their own right:

    The horse raced past the barn.

This is the sentence our brains typically start constructing as we
read the seven-word version, but then we get to the extra word "fell",
and the original grammatical structure falls apart - the previous
sentence was already complete, and has no room for the extra word. So
our brains have to backtrack and come up with this alternate parsing:

    The horse [that was] raced past the barn fell.

The extra word at the end reaches back to change the entire structure
of the sentence, effectively changing the meaning of "raced" as well.

The "only one token lookahead" rule in Python's syntax design helps to
make it more difficult to write "garden path expressions" where
information presented late in the expression forces you to go back and
reevaluate information presented earlier in the expression. (You can
still write them - there'll just be some marker earlier on in the
expression to suggest that trickery may be afoot.)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

