The PEP gives a good exposition of the problem and proposed solution, thanks.
If I understand correctly, the proposal is that the PEG grammar should become the definitive grammar for Python at some point, probably for Python 3.10, so it may evolve without the LL(1) restrictions. I'd like to raise some points with respect to that, which perhaps the migration section could answer.
When definitive, the grammar would not then just be for CPython, and would also appear as user documentation of the language. Whether that change leaves Python with a more useful (readable) grammar seems an important test of the idea. I'm looking at https://github.com/we-like-parsers/cpython/blob/pegen/Grammar/python.gram , and assuming that is indicative of a future definitive grammar. That may be incorrect, as it has these issues in my view:
1. It is decorated with actions in C. If a decorated grammar is offered as definitive, one with Python actions (operations on the AST) is preferable, as implementation neutral, although still hostage to AST changes that are not language changes. Maybe one stripped of actions is best.
2. It's quite long, and not at first glance more readable than the LL(1) grammar. I had understood ugliness in the LL(1) grammar to result from skirting limitations that PEG eliminates. The PEG one is twice as long; recognising that about half of it is actions, let's just say that as a grammar it's no shorter.
3. There is some manual guidance by means of &-guards, only necessary (I think) as a speed-up or to force out meaningful syntax errors. That would be noise to the reader. (This goes away if the PEG parser generator generates guards from the first set, given a simple "no backtracking" marker.)
4. In some places, expansive alternatives seem to be motivated by the difference between actions: for a start, wherever async pops up. Maybe that is also why the definition of lambda is so long. That could go away with different support code (e.g. is_async as an argument), but if improvements to the support code change grammar rules when the language has not changed, that's a danger sign too.
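To make points 3 and 4 concrete, here is a minimal sketch in Python of what I have in mind. Everything in it is hypothetical (the FIRST table, guard, make_for and the toy rules are my own illustrative names, not pegen's API or the contents of python.gram): the guard is derived from a first-set table rather than written by hand, and a single for_stmt rule passes is_async to its action instead of needing a second alternative.

```python
# Hypothetical sketch over a list of token strings; none of these names
# come from pegen or from python.gram.

# First sets that a generator could compute from the grammar itself.
FIRST = {
    "for_stmt": {"for", "async"},
    "while_stmt": {"while"},
}

def guard(rule_name, tokens, pos):
    """Generated stand-in for a hand-written &-guard: peek, never consume."""
    return pos < len(tokens) and tokens[pos] in FIRST[rule_name]

def make_for(is_async):
    # Support code takes is_async as an argument, so one rule serves both forms.
    return ("AsyncFor",) if is_async else ("For",)

def for_stmt(tokens, pos):
    is_async = pos < len(tokens) and tokens[pos] == "async"
    if is_async:
        pos += 1
    if pos < len(tokens) and tokens[pos] == "for":
        return make_for(is_async), pos + 1
    return None

def while_stmt(tokens, pos):
    if pos < len(tokens) and tokens[pos] == "while":
        return ("While",), pos + 1
    return None

def compound_stmt(tokens, pos=0):
    # Ordered choice; the guard skips alternatives that cannot start here.
    for name, rule in (("for_stmt", for_stmt), ("while_stmt", while_stmt)):
        if guard(name, tokens, pos):
            result = rule(tokens, pos)
            if result is not None:
                return result
    return None
```

With this shape the reader of the grammar sees neither the guards nor a duplicated async alternative; both live in generated or support code.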
All that, I think, means that the "operational" grammar from which you build the parser is going to be quite unlike the one with which you communicate the language. At present ~/Grammar/Grammar both generates the parser (I thought) and appears as documentation. I take it to be the ideal that we use a single, human-readable definition. For example, ANTLR 4 has worked hard to facilitate a grammar in which actions are implicit, and the generation of an AST from the parse tree/events can be elsewhere. (I'm not plugging ANTLR specifically as a solution.)
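One cheap way to get a documentation grammar from the operational one would be to strip the actions mechanically. The following is only a rough sketch under my own assumptions: that actions are delimited by braces, that quoted tokens such as '{' must be preserved, and that escaped quotes inside literals can be ignored. strip_actions is a hypothetical helper, and the sample rules in the usage note are illustrative rather than copied from the file.

```python
def strip_actions(grammar_text):
    """Remove brace-delimited action blocks, keeping quoted tokens intact.

    Assumption: quoted literals contain no escaped quote characters.
    """
    out = []
    depth = 0     # nesting level of action braces
    quote = None  # the quote character we are inside, if any
    for ch in grammar_text:
        if quote:
            if ch == quote:
                quote = None
            if depth == 0:
                out.append(ch)   # literal belongs to the grammar, keep it
        elif ch in "'\"":
            quote = ch
            if depth == 0:
                out.append(ch)
        elif ch == "{":
            depth += 1           # entering (or nesting within) an action
        elif ch == "}" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    # Drop the trailing spaces left where an action used to be.
    return "\n".join(line.rstrip() for line in "".join(out).splitlines())
```

For example, strip_actions("pass_stmt: 'pass' { _Py_Pass(EXTRA) }") leaves "pass_stmt: 'pass'", and a rule such as "set: '{' a=expressions_list '}' { _Py_Set(a, EXTRA) }" keeps its quoted braces. The variable bindings (a=...) would remain, so this addresses only point 1, not points 3 and 4.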
On 02/04/2020 19:10, Guido van Rossum wrote:
Since last fall's core sprint in London, Pablo Galindo Salgado, Lysandros Nikolaou and myself have been working on a new parser for CPython. We are now far enough along that we present a PEP we've written:
Hopefully the PEP speaks for itself. We are hoping for a speedy resolution so we can land the code we've written before 3.9 beta 1.
If people insist I can post a copy of the entire PEP here on the list, but since a lot of it is just background information on the old LL(1) and the new PEG parsing algorithms, I figure I'd spare everyone the need of reading through that. Below is a copy of the most relevant section from the PEP. I'd also like to point out the section on performance (which you can find through the above link) -- basically performance is on a par with that of the old parser.
==============
Migration plan
==============
This section describes the migration plan when porting to the new PEG-based parser if this PEP is accepted. The migration will be executed in a series of steps that allow initially to fallback to the previous parser if needed:
1. Before Python 3.9 beta 1, include the new PEG-based parser machinery in CPython with a command-line flag and environment variable that allows switching between the new and the old parsers together with explicit APIs that allow invoking the new and the old parsers independently. At this step, all Python APIs like ``ast.parse`` and ``compile`` will use the parser set by the flags or the environment variable and the default parser will be the current parser.
2. After Python 3.9 Beta 1 the default parser will be the new parser.
3. Between Python 3.9 and Python 3.10, the old parser and related code (like the "parser" module) will be kept until a new Python release happens (Python 3.10). In the meanwhile and until the old parser is removed, **no new Python Grammar addition will be added that requires the peg parser**. This means that the grammar will be kept LL(1) until the old parser is removed.
4. In Python 3.10, remove the old parser, the command-line flag, the environment variable and the "parser" module and related code.
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here? http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/)