> Actually, I think the `ast` module doesn't work very well for formatters, because it loses comments. (Retaining comments and all details of whitespace is the specific use case for which I created pgen2.)

Some uses I have seen include checking that the code before and after formatting has no functional changes (both produce the same AST) and augmenting the information obtained from other sources. But yeah, I agree that static code analyzers and linters are a much bigger target.
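A minimal sketch of the "same AST" check mentioned above, using only the stdlib `ast` module. The helper name `same_ast` is made up for illustration, and `ast.unparse` needs Python 3.9+. It also shows the point about comments: the comment never makes it into the tree, which is exactly why the comparison ignores it.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Hypothetical helper: do two sources parse to the same AST?

    ast.dump() leaves out line/column attributes by default, and comments
    never reach the AST, so pure layout or comment changes compare equal.
    """
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


original = "x=(1+2)   # sum\n"
formatted = "x = 1 + 2  # sum\n"

print(same_ast(original, formatted))      # True
print(same_ast(original, "x = 1 + 3\n"))  # False

# Round-tripping through the AST drops the comment (ast.unparse is 3.9+).
print(ast.unparse(ast.parse(original)))   # x = 1 + 2
```

The same property that makes this check useful (comments and whitespace are discarded) is what makes the AST a poor fit for a formatter that has to write the code back out.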

>I wonder if lib2to3 is actually something that would benefit from moving out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in that issue, it is outdated. Maybe if it was out of the stdlib it would attract more contributors. Then again, I have recently started exploring the idea of a PEG parser for Python. Maybe a revamped version of the core of lib2to3 based on PEG instead of pgen would be interesting to some folks.

I was thinking more along the lines of leveraging some parts of lib2to3 to provide a CST-related solution similar to the ast module, rather than exposing the whole functionality of lib2to3. Basically, it would be a higher-level abstraction to replace the current parser module. Technically, you should be able to reconstruct some of the primitives that lib2to3 uses on top of the output that the parser module generates (modulo some extra information from the grammar), but the raw output of the parser module is not super useful by itself, especially when you consider the maintenance costs.
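To make the "primitives" point a bit more concrete, here is a rough sketch of a lossless token layer in the spirit of lib2to3's `Leaf.prefix`, where every leaf carries the whitespace and comments that precede it. It is built on the stdlib `tokenize` module, not on the parser module or on lib2to3 itself, and `Leaf`/`to_leaves` are hypothetical names rather than an existing API. It only covers the flat token level; a real CST would also need the tree structure that comes from the grammar.

```python
import io
import tokenize
from dataclasses import dataclass


@dataclass
class Leaf:
    """Hypothetical CST leaf: a token plus the text that precedes it."""
    type: str
    value: str
    prefix: str  # whitespace, comments and blank lines before the token


def to_leaves(source: str) -> list[Leaf]:
    """Split `source` into leaves without dropping any characters."""
    # Map (row, col) token positions to string offsets. Lines are read the
    # same way tokenize reads them, so the row numbers line up.
    line_starts = [0]
    for line in io.StringIO(source).readlines():
        line_starts.append(line_starts[-1] + len(line))

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos
        return line_starts[row - 1] + col

    leaves: list[Leaf] = []
    prev_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.COMMENT, tokenize.NL):
            # Not emitted as leaves: their text lands in the next prefix.
            continue
        start, end = offset(tok.start), offset(tok.end)
        leaves.append(Leaf(tokenize.tok_name[tok.type],
                           source[start:end],
                           source[prev_end:start]))
        prev_end = end
    return leaves


source = "x = 1  # keep\nif x:\n    y = 2\n"
leaves = to_leaves(source)
# Lossless: concatenating prefix + value restores the input exactly.
assert "".join(leaf.prefix + leaf.value for leaf in leaves) == source
```

Attaching comments to the following token's prefix is one possible design; it keeps every leaf a real token while still allowing an exact round trip.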

On the other side, as you mention here:

>I am interested in switching CPython's parsing strategy to something else (what exactly remains to be seen) and any new approach is unlikely to reuse the current CST technology. (OTOH I think it would be wise to keep the current AST.)

it is true that changing the parser can greatly influence the hypothetical CST module, so it may complicate the move to a new parser if the API does not abstract enough (or it may even become impractical, depending on the new parser).

My original suggestion was based on the fact that the parser module is not super useful and has a high maintenance cost, but the "realm" of what it solves (providing access to the parse trees) could be useful for some use cases. That is why I was talking about "parser" and lib2to3 in the same email.

Perhaps we can be more productive if we focus on just deprecating the "parser" module, but I thought it was an opportunity to solve two (related) problems at once.


On Mon, 20 May 2019 at 17:28, Guido van Rossum <guido@python.org> wrote:
On Thu, May 16, 2019 at 3:51 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:
[Nathaniel Smith]
>Will the folks using forks be happy to switch to the stdlib version? 
>For example I can imagine that if black wants to process 3.7 input
>code while running on 3.6, it might prefer a parser on PyPI even if
>the stdlib version were public, since the PyPI version can be updated
>independently of the host Python.

The tool can parse arbitrary grammars; the one that is packaged with it is just one of them.

I think it would be useful, among other things because the standard library
currently lacks a proper CST solution. The ast module is heavily leveraged for
things like formatters,

Actually, I think the `ast` module doesn't work very well for formatters, because it loses comments. (Retaining comments and all details of whitespace is the specific use case for which I created pgen2.)
 
static code analyzers, etc., but a CST can be very useful, as
Łukasz describes here:


I think this is an important gap in the stdlib, and the closest thing we have
(the current parser module) is not useful for any of that. Also, the core needed to build
the hypothetical new package (perhaps with a new API on top of it) already exists, undocumented,
as an implementation detail of lib2to3 (and some people are already using it directly).

I wonder if lib2to3 is actually something that would benefit from moving out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in that issue, it is outdated. Maybe if it was out of the stdlib it would attract more contributors. Then again, I have recently started exploring the idea of a PEG parser for Python. Maybe a revamped version of the core of lib2to3 based on PEG instead of pgen would be interesting to some folks.

I do agree that the two versions of tokenize.py should be unified (and the result kept in the stdlib). However there are some issues here, because tokenize.py is closely tied to the names and numbers of the tokens, and the latter information is currently generated based on the contents of the Grammar file. This may get in the way of using it to tokenize old Python versions.
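A small illustration of that coupling with the current stdlib `tokenize` and `token` modules: each token's type is a bare integer, and only the `token` module's tables give it a name. The exact integer values differ between Python versions, which is why the sketch prints them without showing expected values. It uses only documented functions (`generate_tokens`, `untokenize`, `tok_name`, `exact_type`).

```python
import io
import token
import tokenize

source = "x = 1  # answer\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

for tok in tokens:
    # tok.type is a bare integer; token.tok_name maps it back to its name.
    print(tok.type, token.tok_name[tok.type], repr(tok.string))

# Operators all share the generic OP type; exact_type identifies the operator.
print(token.tok_name[tokens[1].exact_type])  # EQUAL

# untokenize() rebuilds the text from the token strings and positions.
assert tokenize.untokenize(tokens) == source
```

Note that the comment survives as a COMMENT token here, unlike in the AST, so a unified tokenizer is a natural foundation for any comment-preserving tool.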

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him/his (why is my pronoun here?)