[Python-Dev] Parser module in the stdlib

Guido van Rossum guido at python.org
Mon May 20 12:28:24 EDT 2019

On Thu, May 16, 2019 at 3:51 PM Pablo Galindo Salgado <pablogsal at gmail.com> wrote:

> [Nathaniel Smith]
> >Will the folks using forks be happy to switch to the stdlib version?
> >For example I can imagine that if black wants to process 3.7 input
> >code while running on 3.6, it might prefer a parser on PyPI even if
> >the stdlib version were public, since the PyPI version can be updated
> >independently of the host Python.
> The tool can parse arbitrary grammars; the one that is packed into it is
> just one of them.
> I think it would be useful, among other things, because the standard
> library currently lacks a proper CST solution. The ast module is heavily
> leveraged for things like formatters,

Actually, I think the `ast` module doesn't work very well for formatters,
because it loses comments. (Retaining comments and all details of
whitespace is the specific use case for which I created pgen2.)
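
A quick illustration of the point: round-tripping source through the stdlib `ast` module silently drops comments, since the AST simply has no node for them (this sketch uses `ast.unparse`, available since Python 3.9):

```python
import ast

src = "x = 1  # important comment\n"
tree = ast.parse(src)

# The tree contains an Assign node but nothing for the comment,
# so unparsing produces source without it.
print(ast.unparse(tree))  # the comment is gone
```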

> static code analyzers...etc but CST can be very useful as
> Łukasz describes here:
> https://bugs.python.org/issue33337
> I think this addresses an important gap in the stdlib, and the closest
> thing we have (the current parser module) is not useful for any of that.
> Also, the core for generating the hypothetical new package (with some new
> API over it, maybe) is already there, undocumented, as an implementation
> detail of lib2to3 (and some people are already using it directly).

I wonder if lib2to3 is actually something that would benefit from moving
out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in
that issue, it is outdated. Maybe if it were out of the stdlib it would
attract more contributors. Then again, I have recently started exploring
the idea of a PEG parser for Python. Maybe a revamped version of the core
of lib2to3 based on PEG instead of pgen would be interesting to some folks.

I do agree that the two versions of tokenize.py should be unified (and the
result kept in the stdlib). However, there are some issues here, because
tokenize.py is closely tied to the names and numbers of the tokens, and the
latter information is currently generated based on the contents of the
Grammar file. This may get in the way of using it to tokenize old Python
versions.
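
The coupling is visible in the stdlib `token` module, whose symbolic names and numeric values are generated from CPython's grammar metadata; a tokenizer targeting a different Python version would need a matching table. A small sketch:

```python
import io
import token
import tokenize

src = "answer = 42\n"
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    # tok.type is a bare number; token.tok_name maps it back to the
    # generated symbolic name ("NAME", "OP", "NUMBER", ...).
    print(token.tok_name[tok.type], repr(tok.string))
```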

--Guido van Rossum (python.org/~guido)
*Pronouns: he/him/his **(why is my pronoun here?)*
