[Python-Dev] Parser module in the stdlib

Pablo Galindo Salgado pablogsal at gmail.com
Mon May 20 14:05:54 EDT 2019


> Actually, I think the `ast` module doesn't work very well for formatters,
> because it loses comments. (Retaining comments and all details of
> whitespace is the specific use case for which I created pgen2.)

Some uses I have seen include using it to check that the code before and
after the formatting has no functional changes (both have the same ast) or
to augment the information obtained with other sources. But yeah, I agree
that static code analyzers and linters are a much bigger target.
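
As a minimal sketch of the first use (checking that formatting did not
change the meaning of the code), something like the following works. The
name `same_ast` and the sample strings are made up for illustration; only
`ast.parse` and `ast.dump` are actual stdlib calls. Note that the check
passes even though the comment is gone, which is exactly the limitation
mentioned above.

```python
import ast


def same_ast(before: str, after: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs."""
    # ast.dump() leaves out line/column attributes by default, so pure
    # layout changes (spacing, quote style) do not affect the comparison.
    return ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))


# Hypothetical "before" and "after" of a formatter run.
original = "x = {  'a':1 }  # a comment\n"
reformatted = 'x = {"a": 1}\n'
changed = 'x = {"a": 2}\n'

print(same_ast(original, reformatted))  # True: only layout and the comment differ
print(same_ast(original, changed))      # False: the value changed
```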

> I wonder if lib2to3 is actually something that would benefit from moving
> out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in
> that issue, it is outdated. Maybe if it was out of the stdlib it would
> attract more contributors. Then again, I have recently started exploring
> the idea of a PEG parser for Python. Maybe a revamped version of the core
> of lib2to3 based on PEG instead of pgen would be interesting to some folks.

I was thinking more along the lines of leveraging some parts of lib2to3 to
provide a CST-related solution similar to the ast module, not exposing the
whole functionality of lib2to3. Basically, it would be a higher-level
abstraction to replace the current parser module. Technically you should
be able to reconstruct some primitives that lib2to3 uses on top of the
output that the parser module generates (modulo some extra information from
the grammar), but the raw output that the parser module generates is not
super useful by itself, especially when you consider the maintenance costs.

On the other hand, as you mention here:

> I am interested in switching CPython's parsing strategy to something else
> (what exactly remains to be seen) and any new approach is unlikely to reuse
> the current CST technology. (OTOH I think it would be wise to keep the
> current AST.)

it is true that changing the parser can greatly influence the hypothetical
CST module, so it may complicate the conversion to a new parser solution if
the API does not abstract enough (or it may be close to impractical,
depending on the new parser solution).

My original suggestion was based on the fact that the parser module is not
super useful and has a high maintenance cost, but the "realm" of what
it solves (providing access to the parse trees) could be useful for some use
cases, which is why I was talking about "parser" and lib2to3 in the same
email.

Perhaps we can be more productive if we focus on just deprecating the
"parser" module, but I thought it was an opportunity to solve two (related)
problems at once.


On Mon, 20 May 2019 at 17:28, Guido van Rossum <guido at python.org> wrote:

> On Thu, May 16, 2019 at 3:51 PM Pablo Galindo Salgado <pablogsal at gmail.com>
> wrote:
>
>> [Nathaniel Smith]
>>
>> >Will the folks using forks be happy to switch to the stdlib version?
>> >For example I can imagine that if black wants to process 3.7 input
>> >code while running on 3.6, it might prefer a parser on PyPI even if
>> >the stdlib version were public, since the PyPI version can be updated
>> >independently of the host Python.
>>
>> The tool can parse arbitrary grammars; the one that is packed into it is
>> just one of them.
>>
>> I think it would be useful, among other things because the standard
>> library currently lacks a proper CST solution. The ast module is heavily
>> leveraged for things like formatters,
>>
>
> Actually, I think the `ast` module doesn't work very well for formatters,
> because it loses comments. (Retaining comments and all details of
> whitespace is the specific use case for which I created pgen2.)
>
>
>> static code analyzers...etc but CST can be very useful as
>> Łukasz describes here:
>>
>> https://bugs.python.org/issue33337
>>
>> I think this is an important gap in the stdlib, and the closest thing
>> we have (the current parser module) is not useful for any of that. Also,
>> the core for generating the hypothetical new package (with some new API
>> over it, maybe) is already available, undocumented, as an implementation
>> detail of lib2to3 (and some people are already using it directly).
>>
>
> I wonder if lib2to3 is actually something that would benefit from moving
> out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in
> that issue, it is outdated. Maybe if it was out of the stdlib it would
> attract more contributors. Then again, I have recently started exploring
> the idea of a PEG parser for Python. Maybe a revamped version of the core
> of lib2to3 based on PEG instead of pgen would be interesting to some folks.
>
> I do agree that the two versions of tokenize.py should be unified (and the
> result kept in the stdlib). However there are some issues here, because
> tokenize.py is closely tied to the names and numbers of the tokens, and the
> latter information is currently generated based on the contents of the
> Grammar file. This may get in the way of using it to tokenize old Python
> versions.
>
> --
> --Guido van Rossum (python.org/~guido)
> *Pronouns: he/him/his **(why is my pronoun here?)*
> <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
>