The PEP gives a good exposition of the problem and proposed solution, thanks.
If I understand correctly, the proposal is that the PEG grammar should become the definitive grammar for Python at some point, probably for Python 3.10, so it may evolve without the LL(1) restrictions. I'd like to raise some points with respect to that, which perhaps the migration section could answer.
When definitive, the grammar would not be just for CPython,
but would also appear as user documentation of the language.
Whether that change leaves Python with a more useful (readable)
grammar seems an important test of the idea. I'm looking at
https://github.com/we-like-parsers/cpython/blob/pegen/Grammar/python.gram,
and assuming that it is indicative of a future definitive grammar.
That may be incorrect, as it has these issues in my view:
1. It is decorated with actions in C. If a decorated grammar is
offered as definitive, one with Python actions (operations on the
AST) is preferable, as it is implementation-neutral, although it
would still be hostage to AST changes that are not language
changes. Maybe one stripped of actions is best.
2. It's quite long, and not at first glance more readable than
the LL(1) grammar. I had understood ugliness in the LL(1) grammar
to result from skirting limitations that PEG eliminates. The PEG
one is twice as long; since about half of it is actions, let's
just say that as a grammar it's no shorter.
3. There is some manual guidance by means of &-guards, only
necessary (I think) as a speed-up or to force out meaningful
syntax errors. That would be noise to the reader. (This goes away
if the PEG parser generator could generate guards from the first
set at a simple "no backtracking" marker.)
4. In some places, expansive alternatives seem to be motivated by differences between the actions: for a start, wherever async pops up. Maybe that is also why the definition of lambda is so long. This could go away with different support code (e.g. is_async as an argument), but if improvements to the support code change grammar rules when the language has not changed, that's a danger sign too.
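The suggestion in point 3, that guards could be derived from first sets rather than written by hand, can be made concrete. The sketch below is written in Python against a hypothetical toy grammar. The rule names, the GRAMMAR table and the helpers first_sets and lookahead_guard are all invented for illustration and are not taken from pegen. It computes FIRST sets by fixpoint iteration and turns one of them into a predicate equivalent to an &-guard, so an alternative is only attempted when the next token can start it. It assumes there are no empty alternatives, which keeps the computation short.

```python
# Hypothetical toy grammar, not pegen's format: each nonterminal maps to a
# list of alternatives, and any symbol that is not a key is a terminal token.
# Assumes no empty (epsilon) alternatives, so FIRST of an alternative is
# simply FIRST of its leading symbol.
GRAMMAR = {
    "stmt": [["if_stmt"], ["while_stmt"], ["expr_stmt"]],
    "if_stmt": [["'if'", "expr", "':'", "block"]],
    "while_stmt": [["'while'", "expr", "':'", "block"]],
    "expr_stmt": [["expr", "NEWLINE"]],
    "expr": [["NAME"], ["NUMBER"], ["'('", "expr", "')'"]],
    "block": [["NEWLINE"]],
}


def first_sets(grammar):
    """Compute FIRST(nonterminal) for every rule by fixpoint iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                head = alt[0]
                # A terminal contributes itself; a nonterminal its FIRST set.
                contribution = first[head] if head in grammar else {head}
                if not contribution <= first[nt]:
                    first[nt] |= contribution
                    changed = True
    return first


def lookahead_guard(first, rule):
    """Return a predicate playing the role of a generated '&' guard on rule."""
    allowed = frozenset(first[rule])
    return lambda token: token in allowed


FIRST = first_sets(GRAMMAR)
starts_if = lookahead_guard(FIRST, "if_stmt")
print(sorted(FIRST["stmt"]))
print(starts_if("'if'"), starts_if("NAME"))
```

Because the guard is derived rather than written, the reader of the grammar never sees it. A "no backtracking" marker would only need to tell the generator where such a check is safe, that is, where the alternatives' FIRST sets do not overlap.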
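Point 4's alternative is to pass is_async to the support code instead of duplicating grammar alternatives. A rough Python sketch of that idea follows. make_function_def is a hypothetical helper and is not part of pegen or CPython. It relies only on the standard ast module's FunctionDef and AsyncFunctionDef node classes, so a single grammar alternative with an optional async keyword could feed both cases.

```python
import ast


def make_function_def(is_async, name, args, body, decorators=(), returns=None):
    """Hypothetical action helper: one grammar alternative, two AST node types.

    The grammar would match an optional 'async' keyword once and pass whether
    it was present, instead of spelling out a separate async alternative.
    """
    node_class = ast.AsyncFunctionDef if is_async else ast.FunctionDef
    return node_class(
        name=name,
        args=args,
        body=list(body),
        decorator_list=list(decorators),
        returns=returns,
    )


# Borrow an empty argument list and a 'pass' body from a real parse.
template = ast.parse("def f(): pass").body[0]
sync_def = make_function_def(False, "f", template.args, template.body)
async_def = make_function_def(True, "g", template.args, template.body)
print(type(sync_def).__name__, type(async_def).__name__)
```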
All that I think means that the "operational" grammar from which
you build the parser is going to be quite unlike the one with
which you communicate the language. At present ~/Grammar/Grammar
both generates the parser (I thought) and appears as
documentation. I take it to be the ideal that we use a single,
human-readable definition. For example, ANTLR 4 has worked hard to
support a grammar in which actions are implicit, so that the
generation of an AST from the parse tree/events can happen elsewhere.
(I'm not plugging ANTLR specifically as a solution.)
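Here is a Python sketch of the separation described above, in which the grammar carries no actions and AST construction lives elsewhere. The parse-tree shape (one (kind, payload) tuple per node) and the AstBuilder class are hypothetical and chosen only for illustration. This is neither ANTLR's API nor how pegen works. The builder emits nodes from the standard ast module, so the result can be compiled and run.

```python
import ast


class AstBuilder:
    """Builds AST nodes from an action-free parse tree, outside the grammar.

    Hypothetical parse-tree shape: an interior node is (rule_name, [children])
    and a leaf is (token_type, text). Each kind is handled by a build_<kind>
    method, so changing the AST touches this class, not the grammar rules.
    """

    def build(self, node):
        kind, payload = node
        return getattr(self, "build_" + kind)(payload)

    def build_NAME(self, text):
        return ast.Name(id=text, ctx=ast.Load())

    def build_NUMBER(self, text):
        return ast.Constant(value=int(text))

    def build_sum(self, children):
        # A rule such as: sum: atom '+' atom  (no action attached)
        left, _plus, right = children
        return ast.BinOp(left=self.build(left), op=ast.Add(), right=self.build(right))


# Parse tree that an action-free grammar might produce for "x + 1".
tree = ("sum", [("NAME", "x"), ("OP", "+"), ("NUMBER", "1")])
expression = ast.Expression(body=AstBuilder().build(tree))
code = compile(ast.fix_missing_locations(expression), "<sketch>", "eval")
print(eval(code, {"x": 41}))
```

If the AST changes while the language does not, only the builder methods change and the grammar rules stay the same. That is the property point 1 asks for.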
Jeff Allen
Since last fall's core sprint in London, Pablo Galindo Salgado, Lysandros Nikolaou and myself have been working on a new parser for CPython. We are now far enough along that we present a PEP we've written:
Hopefully the PEP speaks for itself. We are hoping for a speedy resolution so we can land the code we've written before 3.9 beta 1.
If people insist I can post a copy of the entire PEP here on the list, but since a lot of it is just background information on the old LL(1) and the new PEG parsing algorithms, I figure I'd spare everyone the need of reading through that. Below is a copy of the most relevant section from the PEP. I'd also like to point out the section on performance (which you can find through the above link) -- basically performance is on a par with that of the old parser.
==============
Migration plan
==============
This section describes the migration plan when porting to the new PEG-based parser
if this PEP is accepted. The migration will be executed in a series of steps that
initially allow falling back to the previous parser if needed:
1. Before Python 3.9 beta 1, include the new PEG-based parser machinery in CPython
with a command-line flag and environment variable that allows switching between
the new and the old parsers together with explicit APIs that allow invoking the
new and the old parsers independently. At this step, all Python APIs like ``ast.parse``
and ``compile`` will use the parser set by the flags or the environment variable and
the default parser will be the current parser.
2. After Python 3.9 beta 1, the default parser will be the new parser.
3. Between Python 3.9 and Python 3.10, the old parser and related code (like the
"parser" module) will be kept until a new Python release happens (Python 3.10). In
the meantime, and until the old parser is removed, **no new grammar addition that
requires the PEG parser will be made**. This means that the grammar
will be kept LL(1) until the old parser is removed.
4. In Python 3.10, remove the old parser, the command-line flag, the environment
variable and the "parser" module and related code.
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/HOZ2RI3FXUEMAT4XAX4UHFN4PKG5J5GR/