[New-bugs-announce] [issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library
report at bugs.python.org
Sun Apr 22 21:04:07 EDT 2018
New submission from Łukasz Langa <lukasz at langa.pl>:
Python includes a set of batteries that enable parsing of Python code. This
includes its own AST (provided in the standard library under the `ast` module),
as well as a pure Python tokenizer (provided in the standard library under
`tokenize` and `token`). It also provides an undocumented CST under lib2to3,
which contains its own outdated and patched copies of `tokenize` and `token`.
This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user
code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to parse Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated,
leaving bits of new grammar not supported; new bits of grammar very often get
overlooked in lib2to3;
- lib2to3 is not documented.
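
To illustrate the first point in the list above, here is a minimal sketch using only `ast` and `tokenize` from the standard library. The sample source string and the two helper names are made up for this example. It shows that a comment does not appear in the AST, while the tokenizer still reports it:

```python
import ast
import io
import tokenize

# Hypothetical one-line program with a trailing comment.
SOURCE = "x = 1  # the answer\n"


def comment_survives_in_ast(source):
    """Return True if the comment text shows up anywhere in the AST dump."""
    tree = ast.parse(source)
    return "the answer" in ast.dump(tree)


def comments_from_tokenizer(source):
    """Collect the comment strings reported by the pure Python tokenizer."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


print(comment_survives_in_ast(SOURCE))   # False
print(comments_from_tokenizer(SOURCE))   # ['# the answer']
```

A tool that rewrites code from the AST alone would therefore lose the comment, which is why such tools tend to reach for a token stream or a CST instead.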
So if users want to write tools that manipulate Python code, the standard
library doesn't provide them with great options.
I suggest the following plan:
1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py
(leaving the bits that allow for parsing of Python 3.6 and older files).
2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now
officially supports tokenizing Python 2.7 - 3.7 code.
3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen. Document it as the
built-in CST provided by Python for use in applications which require code
modification. Make it still officially support parsing of Python 2.7 - 3.7
code.
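
As a rough illustration of the kind of "code modification" use case mentioned in step 3, the sketch below renames an identifier using positions from today's `Lib/tokenize.py`, without touching comments or whitespace. It is not the proposed Lib/pgen API; the function name `rename_identifier` and the sample source are assumptions made for this example only.

```python
import io
import tokenize


def rename_identifier(source, old, new):
    """Rename NAME tokens equal to `old`, leaving all other text untouched."""
    lines = source.splitlines(keepends=True)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Walk backwards so that earlier column offsets on the same line
    # stay valid while we splice in the replacement text.
    for tok in reversed(tokens):
        if tok.type == tokenize.NAME and tok.string == old:
            row, start_col = tok.start  # rows are 1-based, columns 0-based
            _, end_col = tok.end
            line = lines[row - 1]
            lines[row - 1] = line[:start_col] + new + line[end_col:]
    return "".join(lines)


# Hypothetical input: the word "total" also appears inside a comment.
SOURCE = "total = 1  # running total\nprint(total)\n"
print(rename_identifier(SOURCE, "total", "count"), end="")
```

Because only NAME tokens are rewritten, the word inside the comment is left alone. A token stream has no notion of scopes or nesting, though, so anything beyond this kind of flat substitution is where a documented CST would help.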
All three changes are made in a backwards-compatible fashion; existing code
should NOT break. That being said, the parser under Lib/pgen might grow some
new behavior compared to the compatibility mode for lib2to3. I specifically
seek to improve handling of comments and error recovery.
components: Library (Lib)
nosy: benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
title: Provide a supported Concrete Syntax Tree implementation in the standard library
versions: Python 3.8
Python tracker <report at bugs.python.org>