[Python-Dev] Python Grammar Ambiguity

Brett Cannon brett at python.org
Mon Apr 24 21:36:10 CEST 2006


On 4/24/06, Michael Foord <michael.foord at resolversystems.com> wrote:
> Hello all,
>
> I'm working on a parser for part of the Python language (expressions but
> not statements basically). I'm using PLY to generate the parser and it's
> mostly done.
>
> I've hit on what looks like a fundamental ambiguity in the Python
> grammar which is difficult to get round with PLY; and I'm wondering
> *why* the grammar is defined in this way. It's possible there is a
> reason that I've missed, which means I need to rethink my workaround.
>
> List displays (list comprehensions) are defined as (from
> http://docs.python.org/ref/lists.html ) :
>

Have you checked Grammar/Grammar to make sure it is the same?  The
grammar in the language ref is not always the most up-to-date version
of the grammar.  For instance, the grammar rule for 'test' is actually
``test: or_test ['if' or_test 'else' test] | lambdef``.

-Brett

>
> test      ::=      and_test ( "or" and_test )* | lambda_form
> testlist     ::=     test ( "," test )* [ "," ]
> list_display     ::=     "[" [listmaker] "]"
> listmaker     ::=     expression ( list_for | ( "," expression )* [","] )
> list_iter     ::=     list_for | list_if
> list_for     ::=     "for" expression_list "in" testlist [list_iter]
> list_if     ::=     "if" test [list_iter]
>
> The problem is that list_for is defined as :
>
>     "for" expression_list "in" testlist
>
> This allows arbitrary expressions in the 'assignment' part of a list
> comprehension.
>
> As a result, the following is valid syntax according to the grammar :
>
>     [x for x + 1 in y]
>
> Obviously it isn't valid ! This parses to an ast, but the syntax error
> is thrown when you compile the resulting ast.
>
> The problem is that for the basic case of a list comprehension ( ``[x
> for x in y]``), ``x in y`` is a valid expression. That makes it
> extremely hard to disambiguate the grammar so that the ``in`` is treated
> correctly, and not part of an expression.
>
> My question is, why are arbitrary expressions allowed here in the
> grammar ? As far as I can tell, only identifiers (nested in parentheses
> or brackets) are valid here. I've got round the problem by creating a
> new node 'identifier_list' and just having that return the expected
> syntax tree (actually an expression list). This gets round the ambiguity
> [#]_.
>
> It worries me that there might be a valid expression allowed here that I
> haven't thought of. My current rules allow anything that looks like
> ``(a, [b, c, (d, e)], f)`` - any  nested identifier list. Would anything
> else be allowed ?
>
> If not, why not modify the grammar so that the compiler has less
> possible invalid syntax trees to work with ?
>
> (Also the grammar definition of string conversion is wrong as it states
> that a trailing comma is valid, which isn't the case. As far as I can
> tell that is necessary to allow nesting string conversions.)
>
> Fuzzyman
> http://www.voidspace.org.uk/python/index.shtml
>
>
> .. [#] If I could make precedence work in PLY I could also solve it I
> guess. However I can't. :-)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/brett%40python.org
>


More information about the Python-Dev mailing list