With lib2to3 going away (https://bugs.python.org/issue40360), it seems to me that some of its functionality for handling "whitespace" can be fairly easily added to the ast module. (By "whitespace", I mean things like indents, dedents, comments and backslash continuations, and also the ability to manipulate the encoded bytes of the original source.)
Off the top of my head, I know of the following projects that use lib2to3 or similar to access the "whitespace" in the parse tree and will need a new solution: yapf, black, mypy, pytype, kythe. (If they don't use lib2to3, they need to maintain a custom parser that changes with each release of Python, so my proposal would potentially help them.) Are there other projects that need access to the parse tree and "whitespace"?
I propose implementing an optional pass over the parse tree that records lib2to3's "prefix" with each leaf node. The interface would be something like:
def detect_encoding(source: bytes) -> str
def parse_with_whitespace(source: bytes, encoding: str, filename: str) -> ast.Module
def unparse_bytes(tree: ast.Module) -> bytes
def unparse_str(tree: ast.Module) -> str
# Various convenience functions/properties, similar to pytree.next_sibling etc.
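For illustration, detect_encoding() could probably be a thin wrapper around the existing tokenize.detect_encoding(), which already implements the PEP 263 rules (coding cookie, UTF-8 BOM, otherwise UTF-8). This is only a sketch of one possible implementation under that assumption, not a settled API:

```python
import io
import tokenize


def detect_encoding(source: bytes) -> str:
    """Return the source encoding per PEP 263 (coding cookie or BOM, else UTF-8)."""
    # tokenize.detect_encoding() pulls at most two lines from the readline
    # callable and also returns the raw lines it consumed, unused here.
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding
```

Note that tokenize normalizes the name it reports, so a "latin-1" cookie comes back as "iso-8859-1", and a UTF-8 BOM yields "utf-8-sig".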
parse_with_whitespace() calls ast.parse(), then does a pass over the parse tree, adding to the leaf nodes:
prefix: str  # whitespace and comments preceding token in the input
pieces: List  # see below
col_byte_offset: int  # start byte offset within line
src_byte_offset: int  # start byte offset within source
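To make the "prefix" idea concrete, here is a rough sketch that computes lib2to3-style prefixes with the stdlib tokenize module rather than from the AST (the name token_prefixes is hypothetical, and attaching each prefix to the matching ast leaf node is left out). Comments, non-logical newlines (NL) and indentation are folded into the prefix of the following token, roughly as lib2to3 does, so for typical input concatenating prefix + token text reproduces the source exactly, which is essentially what unparse_str() would have to do. Edge cases such as the f-string tokenization changes in Python 3.12 are not addressed.

```python
import io
import tokenize

# Token types whose text is folded into the *next* token's prefix.
_PREFIX_TYPES = {tokenize.COMMENT, tokenize.NL, tokenize.INDENT,
                 tokenize.DEDENT, tokenize.ENCODING}


def token_prefixes(source: str) -> list[tuple[str, str]]:
    """Return (token_text, prefix) pairs for each significant token."""
    # Absolute offset of the start of each physical line (rows are 1-based).
    line_starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            line_starts.append(i + 1)

    def offset(pos: tuple[int, int]) -> int:
        row, col = pos
        if row - 1 >= len(line_starts):  # e.g. ENDMARKER after a final line
            return len(source)
        return min(line_starts[row - 1] + col, len(source))

    result = []
    prev_end = 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _PREFIX_TYPES:
            continue
        start = offset(tok.start)
        # Everything between the previous significant token and this one.
        result.append((tok.string, source[prev_end:start]))
        prev_end = offset(tok.end)
    return result
```

For example, in "x = 1  # one\n\nif x:\n" the NEWLINE token after 1 gets the prefix "  # one", and if gets the blank line "\n" as its prefix.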
The "pieces" field is intended to handle things like:
x = 'abc' \
    "def"
y = (f'abc{x}'  # comment
     "def")
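A separate "pieces" field is needed because ast.parse() folds implicitly concatenated literals into a single ast.Constant, while the tokenizer still sees one STRING token per piece. As a rough sketch (string_pieces is a hypothetical helper, not an existing API), the pieces can be recovered by matching STRING tokens against each node's position. One of the fiddly details shows up immediately: ast column offsets are UTF-8 byte offsets, whereas tokenize reports character columns for str input. Only plain string literals are handled; f-strings are not.

```python
import ast
import io
import tokenize


def _byte_col(lines: list[str], row: int, char_col: int) -> int:
    # Convert a tokenize character column into an ast-style UTF-8 byte column.
    return len(lines[row - 1][:char_col].encode("utf-8"))


def string_pieces(source: str) -> list[tuple[ast.Constant, list[str]]]:
    """Pair each str Constant with the STRING tokens it was built from."""
    lines = io.StringIO(source).readlines()
    strings = [
        ((tok.start[0], _byte_col(lines, *tok.start)),
         (tok.end[0], _byte_col(lines, *tok.end)),
         tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.STRING
    ]
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            lo = (node.lineno, node.col_offset)
            hi = (node.end_lineno, node.end_col_offset)
            # A piece is any STRING token lying inside the node's span.
            pieces = [text for start, end, text in strings if lo <= start and end <= hi]
            result.append((node, pieces))
    return result
```

For the backslash example above, the single Constant has the value "abcdef" but two pieces, 'abc' and "def".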
Each "piece" would include:
The ast.Module class would also be extended with additional attributes that apply to the entire source, such as the encoding.
All of this is quite fiddly, but I already have some code for converting between byte and string offsets, so I don't anticipate a huge amount of work. (The design is also inherently a bit inefficient, but I don't want to get involved with the internals of compile().)
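As an example of the byte/string offset conversions involved, here is a hedged sketch of how col_byte_offset and src_byte_offset might be derived for a node. It relies on the documented fact that ast col_offset values are UTF-8 byte offsets, which only match the original bytes when the file is UTF-8. The function name node_byte_offsets is hypothetical, and the sketch assumes no BOM (so not "utf-8-sig"), a stateless encoding such as UTF-8 or Latin-1, and lines ending in "\n" or "\r\n".

```python
import ast
import io


def node_byte_offsets(source: bytes, encoding: str, node: ast.AST) -> tuple[int, int]:
    """Return (col_byte_offset, src_byte_offset) of node in the original encoding."""
    # StringIO's default newline="\n" splits lines only on "\n" and leaves
    # "\r\n" untranslated, so re-encoded line lengths match the original bytes.
    lines = io.StringIO(source.decode(encoding)).readlines()
    line = lines[node.lineno - 1]
    # ast counts UTF-8 bytes, so first recover the character column...
    char_col = len(line.encode("utf-8")[:node.col_offset].decode("utf-8"))
    # ...then measure the same text in the file's own encoding.
    col_byte_offset = len(line[:char_col].encode(encoding))
    line_start = sum(len(prev.encode(encoding)) for prev in lines[:node.lineno - 1])
    return col_byte_offset, line_start + col_byte_offset
```

With a Latin-1 file containing s = 'é'; t = 1, the name t has col_offset 10 (é is two bytes in UTF-8) but sits at byte column 9 of the original line.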
A related item: "Parser module in the stdlib": https://mail.python.org/archives/list/python-dev@python.org/thread/RHZ6JOEXJ...