On Thu, May 16, 2019 at 3:51 PM Pablo Galindo Salgado <pablogsal@gmail.com> wrote:

[Nathaniel Smith]

>Will the folks using forks be happy to switch to the stdlib version?
>For example I can imagine that if black wants to process 3.7 input
>code while running on 3.6, it might prefer a parser on PyPI even if
>the stdlib version were public, since the PyPI version can be updated
>independently of the host Python.

The tool can parse arbitrary grammars; the one that is packaged with it is just one of them.

I think it would be useful, among other things because the standard library
currently lacks a proper CST solution. The ast module is heavily leveraged for
things like formatters,

Actually, I think the `ast` module doesn't work very well for formatters, because it loses comments. (Retaining comments and all details of whitespace is the specific use case for which I created pgen2.)
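
To make the comment loss concrete, here is a minimal sketch, assuming a current CPython 3. The sample source string is made up for illustration; it compares what `ast` and `tokenize` each retain:

```python
import ast
import io
import tokenize

# Hypothetical one-line input with a trailing comment.
source = "x = 1  # the answer\n"

# ast.parse keeps the assignment but has no node for the comment,
# so the comment text is absent from the dumped tree.
tree = ast.parse(source)
dumped = ast.dump(tree)

# tokenize, in contrast, reports the comment as a COMMENT token.
comments = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]

print("comment survives in AST dump:", "answer" in dumped)
print("comments seen by tokenize:", comments)
```

A formatter built only on the AST would therefore have to recover comments from some other source, such as the token stream.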

static code analyzers, etc., but a CST can be very useful, as
Łukasz describes here:

https://bugs.python.org/issue33337

I think this is an important gap in the stdlib, and the closest thing we have
(the current parser module) is not useful for any of that. Also, the core for generating
the hypothetical new package (maybe with some new API over it) already exists, undocumented,
as an implementation detail of lib2to3 (and some people are already using it directly).

I wonder if lib2to3 is actually something that would benefit from moving out of the stdlib. (Wasn't it on Amber's list?) As Łukasz points out in that issue, it is outdated. Maybe if it was out of the stdlib it would attract more contributors. Then again, I have recently started exploring the idea of a PEG parser for Python. Maybe a revamped version of the core of lib2to3 based on PEG instead of pgen would be interesting to some folks.

I do agree that the two versions of tokenize.py should be unified (and the result kept in the stdlib). However, there are some issues here, because tokenize.py is closely tied to the names and numbers of the tokens, and the latter information is currently generated based on the contents of the Grammar file. This may get in the way of using it to tokenize old Python versions.
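
The coupling to token names and numbers can be seen in a short sketch, assuming a current CPython 3 (the input string is made up). Every token that `tokenize` yields carries a numeric type taken from the running interpreter's `token` module:

```python
import io
import token
import tokenize

# Hypothetical input.
source = "x = 1\n"

# Each token carries a numeric type; token.tok_name maps it back to a name.
# The numbers come from the running interpreter's token module, so they are
# not guaranteed to match the numbering used by another Python version.
pairs = [
    (tok.type, token.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
names = [name for _, name, _ in pairs]

for number, name, text in pairs:
    print(number, name, repr(text))
```

Code that stores or compares the raw numbers is tied to one interpreter's token table; comparing by name is somewhat more portable, though it still assumes the set of token names is the same.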

--Guido van Rossum (python.org/~guido)

Pronouns: he/him/his (why is my pronoun here?)