Hi everyone,

TLDR
====

I propose to remove the current parser module and expose pgen2 as a standard library module.

Some context
============

The parser module has been "deprecated" (technically, we recommend preferring the ast module instead) since Python 2.5, but it is still in the standard library. It is a 1222-line C module that needs to be kept in sync with the grammar and the Python parser, sometimes by hand. It has also been broken for several years: I recently fixed a bug, introduced in Python 3.5, that made the parser module unable to parse if-else blocks (https://bugs.python.org/issue36256).

The interface it provides is a very raw view of the CST. For instance:

    >>> parser.sequence2st(parser.suite("def f(x,y,z): pass").totuple())

provides an object with the methods compile, isexpr, issuite, tolist and totuple. The last two produce containers with the numerical values of the grammar elements (tokens, DFAs, etc.):

    >>> parser.suite("def f(x,y,z): pass").tolist()
    [257, [269, [295, [263, [1, 'def'], [1, 'f'], [264, [7, '('], [265, [266, [1, 'x']], [12, ','], [266, [1, 'y']], [12, ','], [266, [1, 'z']]], [8, ')']], [11, ':'], [304, [270, [271, [277, [1, 'pass']]], [4, '']]]]]], [4, ''], [0, '']]
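
To illustrate how raw that output is, here is a minimal sketch that walks the list shown above and translates only the terminal codes into names using the token module. The helper name `leaves` and the local `NT_OFFSET` constant are my own, introduced for illustration; the tree literal is copied from the output above rather than produced by calling the parser module. Codes at or above 256 are grammar nonterminals, and the sketch leaves them untranslated.

```python
import token

# Assumption: codes below 256 are terminals (tokens); codes from 256 up
# are grammar nonterminals, which this sketch does not try to name.
NT_OFFSET = 256

# Copied verbatim from the tolist() output shown above.
tree = [257, [269, [295, [263, [1, 'def'], [1, 'f'], [264, [7, '('], [265, [266, [1, 'x']], [12, ','], [266, [1, 'y']], [12, ','], [266, [1, 'z']]], [8, ')']], [11, ':'], [304, [270, [271, [277, [1, 'pass']]], [4, '']]]]]], [4, ''], [0, '']]


def leaves(node):
    """Yield (token_name, text) for every terminal in a tolist()-style tree."""
    code, *rest = node
    if code < NT_OFFSET:
        # Terminal nodes have the shape [code, text].
        yield token.tok_name[code], rest[0]
    else:
        # Nonterminal nodes have the shape [code, child, child, ...].
        for child in rest:
            yield from leaves(child)


for name, text in leaves(tree):
    print(f"{name:10} {text!r}")
```

Even with the tokens named, the caller still has to know what each nonterminal number means for one specific grammar version, which is the coupling described next.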

The parser module's interface is very raw and closely tied to the particularities of CPython, with almost no abstraction.

On the other hand, there is a Python implementation of the Python parser and parser generator in lib2to3 (pgen2). This module is not documented and is usually considered an implementation detail of lib2to3, but it is extremely useful. Several third-party packages (black, fissix...) use it directly or maintain their own forks, because it can get outdated with respect to the Python 3 grammar: it was originally used only for Python 2 to Python 3 migration. It can consume LL(1) grammars in EBNF form and produce an LL(1) parser for them (by creating parser tables that the same module can consume). Many people currently use the module to support or parse supersets of Python (Python 2/3 compatible grammars, Cython-like changes, etc.).

Proposition
===========

I propose to finally remove the parser module, as it has been "deprecated" for a long time, it seems almost certain that nobody uses it, and it has very limited usability, and to replace it (maybe under a different name) with pgen2 (maybe with a more generic interface that is detached from lib2to3 particularities). This would not only help current libraries that rely on forks or similar solutions, but also help keep the shipped grammar (which is able to parse both Python 2 and Python 3 code) synchronized with the current Python one, as there would be more justification for keeping them in sync.

What do people think? Do you have any concerns? Do you think it is a good or a bad idea?

Regards from sunny London,
Pablo Galindo