[Python-Dev] Parser module in the stdlib

Pablo Galindo Salgado pablogsal at gmail.com
Thu May 16 17:12:29 EDT 2019


Hi everyone,

TLDR
=====

I propose to remove the current parser module and expose pgen2 as a
standard library module.

Some context
============

The parser module has been "deprecated" (technically, we recommend
preferring the ast module instead) since Python 2.5, but it is still in the
standard library.
It is a 1222-line C module that needs to be kept in sync with the grammar
and the Python parser, sometimes by hand. It has also been broken for
several years:
I recently fixed a bug, introduced in Python 3.5, that left the parser
module unable to parse if-else blocks (
https://bugs.python.org/issue36256).

The interface it provides is a very raw view of the CST. For instance:

>>> parser.sequence2st(parser.suite("def f(x,y,z): pass").totuple())

provides an object with the methods compile, isexpr, issuite, tolist and
totuple. The last two produce nested containers holding the numerical
values of the grammar elements (tokens, DFAs, etc.):

>>> parser.suite("def f(x,y,z): pass").tolist()
[257, [269, [295, [263, [1, 'def'], [1, 'f'], [264, [7, '('], [265, [266,
[1, 'x']], [12, ','], [266, [1, 'y']], [12, ','], [266, [1, 'z']]], [8,
')']], [11, ':'], [304, [270, [271, [277, [1, 'pass']]], [4, '']]]]]], [4,
''], [0, '']]
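
To make the numbers above a bit more readable, here is a minimal sketch that
walks the list exactly as printed and labels the leaves using the stdlib
token module. The helper name `terminals` is hypothetical, and the sketch
assumes that a leaf is always a two-element [number, string] list. Values
of 256 and above are nonterminals and are simply skipped here.

```python
import token

# The nested list printed above, copied verbatim from the message.
CST = [257, [269, [295, [263, [1, 'def'], [1, 'f'], [264, [7, '('], [265, [266,
       [1, 'x']], [12, ','], [266, [1, 'y']], [12, ','], [266, [1, 'z']]], [8,
       ')']], [11, ':'], [304, [270, [271, [277, [1, 'pass']]], [4, '']]]]]], [4,
       ''], [0, '']]


def terminals(node):
    """Yield (token_name, text) for every leaf of a parser-style nested list.

    Hypothetical helper: assumes a leaf looks like [token_number, text] and
    any other node looks like [symbol_number, child, child, ...].
    """
    code, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        # Leaf: translate the token number into its symbolic name.
        yield token.tok_name[code], rest[0]
    else:
        # Inner node: the symbol number is ignored, only children are visited.
        for child in rest:
            yield from terminals(child)


leaves = list(terminals(CST))
for name, text in leaves:
    print(name, repr(text))
```

Only the terminals are decoded: the names of the nonterminals depend on the
grammar of the specific CPython version that produced the list.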

This is a very raw interface, tightly tied to the particularities of
CPython, with almost no abstraction.
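
For comparison, this is roughly what the same source looks like through the
ast module, which is the one we recommend instead. It is only a small
sketch to illustrate the difference in abstraction level: nodes are named
classes with named fields rather than grammar numbers.

```python
import ast

# Same source string as in the parser examples above.
tree = ast.parse("def f(x,y,z): pass")

# The module body holds a single FunctionDef node.
func = tree.body[0]
assert isinstance(func, ast.FunctionDef)

func_name = func.name                              # 'f'
arg_names = [a.arg for a in func.args.args]        # ['x', 'y', 'z']
body_kinds = [type(stmt).__name__ for stmt in func.body]  # ['Pass']

print(func_name, arg_names, body_kinds)
```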

On the other hand, there is a Python implementation of the Python parser
and parser generator in lib2to3 (pgen2). This module is not documented and
is usually considered an implementation detail of lib2to3, but it is
extremely useful. Several third-party packages (black, fissix...) use it
directly or maintain their own forks, because it can become outdated with
respect to the Python 3 grammar: it was originally used only for the
Python 2 to Python 3 migration. It can consume LL(1) grammars in EBNF form
and produce an LL(1) parser for them (by creating parser tables that the
same module can consume). Many people currently use the module to support
or parse supersets of Python (such as Python 2/3 compatible grammars,
Cython-like changes, etc.).

Proposition
===========

I propose to finally remove the parser module, as it has been "deprecated"
for a long time, it seems fairly clear that nobody uses it, and it has very
limited usability, and to replace it (maybe under a different name) with
pgen2 (maybe with a more generic interface that is detached from lib2to3
particularities). This would not only be a big help to the libraries that
currently rely on forks or similar solutions, but would also help to keep
the shipped grammar (which is able to parse Python 2 and Python 3 code)
synchronized with the current Python one (as keeping them in sync would
then be better justified).

What do people think about this? Do you have any concerns? Do you think it
is a good/bad idea?

Regards from sunny London,
Pablo Galindo