Le ven. 3 avr. 2020 à 02:58, Greg Ewing <greg.ewing@canterbury.ac.nz> a écrit :
> On 3/04/20 10:33 am, Victor Stinner wrote:
> > I also like the fact that PEG is deterministic, whereas
> > LL(1) parsers are not.
>
> Where do you get that LL(1) parsers are not deterministic?
> That's news to me!

On Thu, Apr 2, 2020 at 6:15 PM Victor Stinner <vstinner@python.org> wrote:
Sorry, I was referring to *ambiguous* grammar rules. Extract of the PEP:

"Unlike LL(1) parsers PEG-based parsers cannot be ambiguous: if a
string parses, it has exactly one valid parse tree. This means that a
PEG-based parser cannot suffer from the ambiguity problems described
in the previous section."

Maybe we need to rephrase this a bit. It's more that the LL(1) and PEG formalisms deal very different with ambiguous *grammars*. An example of an ambiguous grammar would be:

start: X | Y
X: expr
Y: expr
expr: NAME | NAME '+' NAME

There are probably better examples of ambiguous grammars (see https://en.wikipedia.org/wiki/Ambiguous_grammar) but I think this will do to explain the problem.

This is a fine context-free grammar (it accepts strings like "a" and "a+b") but the LL(1) formalism will reject it because it sees an overlap in FIRST sets between X and Y -- not surprising because they have the same RHS. Also, even a more powerful formalism would have to make a choice whether to choose X or Y, which may matter if the derivation is used to build a parse tree (like Python's pgen does).

OTOH a PEG parser generator will always take the X alternative -- it doesn't care that there's more than one derivation, since its '|' operator is not symmetrical: X|Y and Y|X are not the same, as they are in LL(1) and most other formalisms. (In fact, the common notation for PEG uses '/' to emphasize this, but it looks ugly to me so I changed it to '|'.)

That PEG (by definition) always uses the first matching alternative is actually a blessing as well as a curse. The downside is that PEG can't tell you when have a real ambiguity in your grammar. But the upside is that it works like a programmer would write a (recursive descent) parser. Thus it "solves" the problem of ambiguous grammars by choosing the first alternative. This allows more freedom in designing a grammar. For example, it would let a language designer solve the "dangling else" problem from the Wikipedia page, by writing the form including the "else" clause first . (Python doesn't have that problem due to the use of indentation, but it might appear in another disguise.)

I should probably refine this argument and include it in the PEP as one of the reasons to prefer PEG over LR or LALR (but I need to think more about that -- it was a very early choice).

--
--Guido van Rossum (python.org/~guido)