[Python-ideas] Unpack of sequences

Nick Coghlan ncoghlan at gmail.com
Fri Aug 31 09:42:44 CEST 2012


On Fri, Aug 31, 2012 at 1:14 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 30/08/12 11:07, Nick Coghlan wrote:
>> The obvious form for such a statement is "LHS OP RHS", however
>> syntactic ambiguity in the evaluation of both the LHS and RHS
>> (relative to normal assignment) would likely prevent that.
>
>
> I think you need to explain that in further detail. Suppose we
> used (making something up here) "::=" as the operator. Then e.g.:
>
> a, b, c=DEFAULT, d=None ::= 1, 2, c=3
>
> seems unambiguous to me.

You're looking too far ahead - the (deliberately dumb) parser would
choke because:

a, b, c = DEFAULT, d = None

is a valid assignment statement. It's one that can't work (it's a
chained assignment that tries to unpack None into two tuple targets),
but it's syntactically valid.
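
A quick way to check this claim, as a minimal sketch using the standard
"ast" module on a current CPython (the variable names are only for
illustration): the statement parses as one chained assignment with two
tuple targets, and only fails once it is actually executed.

```python
import ast

source = "a, b, c = DEFAULT, d = None"

# The grammar accepts this as a chained assignment:
#   (a, b, c) = (DEFAULT, d) = None
stmt = ast.parse(source).body[0]
is_assign = isinstance(stmt, ast.Assign)
target_kinds = [type(target).__name__ for target in stmt.targets]

# Running it is another matter: None cannot be unpacked into a tuple target.
try:
    exec(source, {})
    runtime_error = None
except TypeError as exc:
    runtime_error = exc
```

Note that DEFAULT is an assignment target here, not a name being looked
up, which is why no NameError gets in the way.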

Thus, by the time it reached the "::=", the parser wouldn't be able to
backtrack and revise its opinion from "ordinary assignment statement"
to "parameter binding operation". Instead, it would bail out, whinging
about an unexpected token (in reality, you'd hit problems long before
that stage - the parser generator would have quit on you, complaining
about ambiguous syntax, probably with an entirely unhelpful error
message).
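
As a rough illustration (assuming a current CPython, whose parser is not
the one under discussion here), feeding the quoted example to the
standard "ast" module shows the whole line being rejected as a syntax
error rather than being reinterpreted as some other kind of statement:

```python
import ast

# The example from the quoted message, using the made-up "::=" operator.
source = "a, b, c=DEFAULT, d=None ::= 1, 2, c=3"

try:
    ast.parse(source)
    rejected = False
except SyntaxError:
    # The parser cannot make sense of the tokens after the assignment prefix.
    rejected = True
```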

In some cases (such as augmented assignment) we can deal with that
kind of ambiguity by making the parser permissive, and catching syntax
errors at a later stage in the compilation pipeline (generally either
symbol analysis or bytecode generation - I can't think of any reason
error detection would ever be delayed until the peephole optimisation
step).
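
The same "permissive parser, stricter later stage" split can be observed
from Python itself. The sketch below deliberately uses a different
construct from augmented assignment (a module-level "return"), since
that is a case where I'm confident of the behaviour on current CPython:
the grammar accepts it, and the error only appears when the code is
compiled in full.

```python
import ast

source = "return 1"

# Parsing alone succeeds: the grammar does not know whether this
# statement is inside a function.
tree = ast.parse(source)
parsed_ok = isinstance(tree.body[0], ast.Return)

# Full compilation runs the later stages, which reject it.
try:
    compile(source, "<sketch>", "exec")
    late_error = None
except SyntaxError as exc:
    late_error = exc
```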

However, delayed validation would be a very poor approach in this
case, since there are valid parameter specifications that are also
completely valid assignment statements (as shown above). You'd have to
maintain a complex set of "maybe this, maybe that" constructs in order
to pull it off, which would be very, very ugly (and make a mess of
the AST). Furthermore, if something is hard for the *computer* to
parse, odds are pretty good that humans are also going to struggle
with it (natural language notwithstanding - in that case, computers
are playing catch-up with millions of years of evolutionary
development).

Fortunately, the entire ambiguity problem can go away *if* you can
find a suitable prefixed or delimited syntax. Once you manage that,
then the prefix or opening delimiter serves as a marker for both the
compiler and the human reader that something different is happening.
That's why I stole the |parameter-spec| notation directly from Ruby's
block syntax for my syntactic sketch - it's a delimited syntax that
doesn't conflict with other Python syntax ("|" can't start an
expression - it can only appear as part of an operator or an augmented
assignment statement). The choice of operator for the name binding
operation itself would then be fairly arbitrary, since the parser
would already know it was dealing with a parameter binding operation.
"|param-spec| ::= arglist", "|param-spec| <- arglist", "|param-spec|
def= arglist", "|param-spec| ()= arglist", "|param-spec| (=) arglist",
"|param-spec|(arglist)", "|param-spec| from arglist" would all be
viable options from a syntactic point of view (YMMV wildly from a
readability point of view, of course).
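
To make the "marker" idea concrete, here is a minimal sketch. The
classify() helper is purely hypothetical (it stands in for a parser's
one-token lookahead and is not part of any real tool), and
is_valid_today() just asks the current parser, via the standard "ast"
module, whether a statement is already legal Python:

```python
import ast


def classify(statement):
    """Hypothetical dispatch: a leading '|' marks a parameter binding."""
    if statement.lstrip().startswith("|"):
        return "parameter-binding"
    return "ordinary-statement"


def is_valid_today(statement):
    """Return True if the current Python grammar accepts the statement."""
    try:
        ast.parse(statement)
    except SyntaxError:
        return False
    return True


# A leading "|" is not valid Python today, so the prefix is free to use...
leading_pipe_is_free = not is_valid_today("|a, b| = 1, 2")
# ...while "|" as a binary operator in the middle of an expression is fine.
infix_pipe_is_valid = is_valid_today("x = a | b")
```

With that, classify("|a, b, c=DEFAULT| ::= 1, 2") and
classify("a, b = 1, 2") can be told apart from the first character
alone, without any backtracking.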

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


