[New-bugs-announce] [issue33337] Provide a supported Concrete Syntax Tree implementation in the standard library

Łukasz Langa report at bugs.python.org
Sun Apr 22 21:04:07 EDT 2018

New submission from Łukasz Langa <lukasz at langa.pl>:

Python includes a set of batteries that enable parsing of Python code.  This
includes its own AST (provided in the standard library under the `ast` module),
as well as a pure Python tokenizer (provided in the standard library under
`tokenize` and `token`).  It also provides an undocumented CST under lib2to3,
which contains its own outdated and patched copies of `tokenize` and `token`.

This situation causes the following issues for users of Python:
- the built-in AST does not preserve comments or whitespace;
- the built-in AST increasingly modifies the tree before presenting it to user
  code (constant folding moved to the AST in Python 3.7);
- the built-in tokenize.py can only be used to parse Python 3.7+ code;
- the version in lib2to3 is partially customized and partially outdated,
  leaving parts of the newer grammar unsupported, and new grammar additions
  are very often overlooked there;
- lib2to3 is not documented.
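
To illustrate the first point, here is a minimal sketch (the sample source
string and variable names are only illustrative) showing that the comment is
absent from the tree returned by `ast.parse`, while `tokenize` still reports
it as a token:

```python
import ast
import io
import tokenize

# Illustrative one-line program with a trailing comment.
source = "x = 1  # the answer\n"

# The AST keeps the assignment but has no node for the comment.
tree = ast.parse(source)
dumped = ast.dump(tree)

# The tokenizer emits the comment as a COMMENT token.
comment_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.COMMENT
]

print("comment in AST dump:", "answer" in dumped)
print("comment tokens:", comment_tokens)
```

A tool that rewrites code from the AST alone would therefore drop the comment,
which is why code-modification tools need a CST or token-level information.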

So if users want to write tools that manipulate Python code, the standard
library doesn't provide them with great options.

I suggest the following plan:

1. Bring Lib/lib2to3/pgen2/tokenize.py to the same state as Lib/tokenize.py
   (leaving the bits that allow for parsing of Python 3.6 and older files).

2. Merge the two tokenizers in Python 3.8 so that Lib/tokenize.py now
   officially supports tokenizing Python 2.7 - 3.7 code.

3. Update Lib/lib2to3/pgen2 and move it under Lib/pgen.  Document it as the
   built-in CST provided by Python for use in applications which require code
   modification.  Make it still officially support parsing of Python 2.7 - 3.7
   code.

All three changes are to be made in a backwards-compatible fashion; existing
code should NOT break.  That being said, the parser under Lib/pgen might grow
some new behavior compared to the compatibility mode for lib2to3.
Specifically, I seek to improve handling of comments and error recovery.

components: Library (Lib)
messages: 315638
nosy: benjamin.peterson, gregory.p.smith, gvanrossum, lukasz.langa, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Provide a supported Concrete Syntax Tree implementation in the standard library
versions: Python 3.8

Python tracker <report at bugs.python.org>
