[Python-Dev] Python Grammar Ambiguity

Michael Foord michael.foord at resolversystems.com
Mon Apr 24 15:28:50 CEST 2006


Hello all,

I'm working on a parser for part of the Python language (expressions but 
not statements basically). I'm using PLY to generate the parser and it's 
mostly done.

I've hit on what looks like a fundamental ambiguity in the Python 
grammar which is difficult to get round with PLY; and I'm wondering 
*why* the grammar is defined in this way. It's possible there is a 
reason that I've missed, which means I need to rethink my workaround.

List displays (list comprehensions) are defined as (from 
http://docs.python.org/ref/lists.html ) :


test      ::=      and_test ( "or" and_test )* | lambda_form
testlist     ::=     test ( "," test )* [ "," ]
list_display     ::=     "[" [listmaker] "]"
listmaker     ::=     expression ( list_for | ( "," expression )* [","] )
list_iter     ::=     list_for | list_if
list_for     ::=     "for" expression_list "in" testlist [list_iter]
list_if     ::=     "if" test [list_iter]

The problem is that list_for is defined as :

    "for" expression_list "in" testlist

This allows arbitrary expressions in the 'assignment' part of a list 
comprehension.

As a result, the following is valid syntax according to the grammar :

    [x for x + 1 in y]

Obviously it isn't valid ! This parses to an ast, but the syntax error 
is thrown when you compile the resulting ast.

The problem is that for the basic case of a list comprehension ( ``[x 
for x in y]``), ``x in y`` is a valid expression. That makes it 
extremely hard to disambiguate the grammar so that the ``in`` is treated 
correctly, and not part of an expression.

My question is, why are arbitrary expressions allowed here in the 
grammar ? As far as I can tell, only identifiers (nested in parentheses 
or brackets) are valid here. I've got round the problem by creating a 
new node 'identifier_list' and just having that return the expected 
syntax tree (actually an expression list). This gets round the ambiguity 
[#]_.

It worries me that there might be a valid expression allowed here that I 
haven't thought of. My current rules allow anything that looks like 
``(a, [b, c, (d, e)], f)`` - any  nested identifier list. Would anything 
else be allowed ?

If not, why not modify the grammar so that the compiler has less 
possible invalid syntax trees to work with ?

(Also the grammar definition of string conversion is wrong as it states 
that a trailing comma is valid, which isn't the case. As far as I can 
tell that is necessary to allow nesting string conversions.)

Fuzzyman
http://www.voidspace.org.uk/python/index.shtml


.. [#] If I could make precedence work in PLY I could also solve it I 
guess. However I can't. :-)


More information about the Python-Dev mailing list