ANN: leoAst.py creates two-way links between tokens and ast nodes

*Overview*

leoAst.py <https://github.com/leo-editor/leo-editor/blob/fstrings/leo/core/leoAst.py> unifies python's token-oriented and ast-oriented worlds.

leoAst.py defines classes that create two-way links between *tokens* created by python's tokenize module <https://docs.python.org/3/library/tokenize.html> and *parse tree nodes* created by python's ast module <https://docs.python.org/3/library/ast.html>.

The *Token Order Generator* (TOG) class *quickly* creates the following links:

- An *ordered* *children* array from each ast node to its children. Order matters!
- A *parent* link from each ast.node to its parent.
- Two-way links between tokens in the *token list*, a list of Token objects, and the ast nodes in the parse tree:
  - For each token, *token.node* contains the ast.node "responsible" for the token.
  - For each ast node, *node.first_i* and *node.last_i* are indices into the token list. These indices give the range of tokens that can be said to be "generated" by the ast node.

Once the TOG class has inserted parent/child links, the *Token Order Traverser* (TOT) class traverses trees annotated with parent/child links extremely quickly.

*Contents of leoAst.py*

leoAst.py is completely independent of Leo <http://leoeditor.com/>. leoAst.py contains:

- The TOG and TOT classes.
- An *AstDumper* class that shows these links in various formats.
- Unit tests that completely cover the TOG and TOT classes.
- All necessary support code.
- An entirely new implementation of *fstringify* <https://pypi.org/project/fstringify/>.
- *Orange*, an entirely new implementation of *black* <https://pypi.org/project/black/>.

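To make the links described in the overview concrete, here is a minimal sketch that uses only python's standard tokenize and ast modules. It is *not* the TOG algorithm and does not use leoAst.py's API: the helper names annotate, covers and link are hypothetical, tokens are linked to nodes through a plain dict rather than a token.node attribute, and each token is matched to the innermost node whose source span contains it (a slow, position-based stand-in that assumes Python 3.8+ and ASCII source). The attribute names parent, children, first_i and last_i are borrowed from the description above purely for illustration.

```python
import ast
import io
import tokenize

# Node classes that ast.parse may share between parents; skip them as children.
_SHARED = (ast.expr_context, ast.operator, ast.unaryop, ast.boolop, ast.cmpop)

# Token kinds that carry no node of their own in this sketch.
_SKIP = (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
         tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER)

def annotate(source):
    """Return (tokens, tree), adding parent and children links to the tree."""
    tree = ast.parse(source)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    tree.parent = None
    for node in ast.walk(tree):
        # Children are in ast field order, which is not always token order.
        node.children = [c for c in ast.iter_child_nodes(node)
                         if not isinstance(c, _SHARED)]
        for child in node.children:
            child.parent = node
    return tokens, tree

def covers(node, tok):
    """True if the node's source span contains the token's span.

    Assumption: ASCII source, so ast byte offsets equal tokenize columns.
    """
    if getattr(node, 'end_lineno', None) is None:
        return False  # Module, arguments, etc. have no position.
    start = (node.lineno, node.col_offset)
    end = (node.end_lineno, node.end_col_offset)
    return start <= tok.start and tok.end <= end

def link(tokens, tree):
    """Map token index -> innermost covering node; set first_i/last_i."""
    links = {}
    for i, tok in enumerate(tokens):
        if tok.type in _SKIP:
            continue
        best = None
        for node in ast.walk(tree):
            if covers(node, tok):
                best = node  # ast.walk is breadth-first: later means deeper.
        if best is None:
            continue
        links[i] = best
        # Widen the token range of the node and of all its ancestors.
        node = best
        while node is not None:
            node.first_i = min(getattr(node, 'first_i', i), i)
            node.last_i = max(getattr(node, 'last_i', i), i)
            node = node.parent
    return links

tokens, tree = annotate("x = f(a) + 1\n")
links = link(tokens, tree)
for i, node in sorted(links.items()):
    print(i, repr(tokens[i].string), type(node).__name__,
          node.first_i, node.last_i)
```

This sketch rescans the whole tree for every token, so it is quadratic and serves only to show what the links mean. According to the announcement, the TOG class instead creates its links in roughly the time taken by tokenizing and parsing.
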
*Usage*

usage:
    leoAst.py (--help | --pytest | --unittest)
    OR
    leoAst.py (--fstringify | --fstringify-diff | --orange | --orange-diff) PATHS

positional arguments:
  PATHS              directory or list of files

optional arguments:
  -h, --help         show this help message and exit
  --fstringify       leonine fstringify
  --fstringify-diff  show fstringify diff
  --orange           leonine Black
  --orange-diff      show orange diff
  --pytest           run pytest
  --unittest         run unittest.main()

*Applicability and importance*

Many python developers will find asttokens <https://github.com/gristlabs/asttokens> meets all their needs. asttokens is well documented and easy to use. Nevertheless, two-way links are significant additions to python's tokenize <https://docs.python.org/3/library/tokenize.html> and ast <https://docs.python.org/3/library/ast.html> modules:

- Links from tokens to nodes are assigned to the nearest possible ast node, not the nearest statement, as in asttokens. Links can easily be reassigned, if desired.
- The TOG and TOT classes are intended to be the foundation of tools such as *fstringify* <https://pypi.org/project/fstringify/> and *black* <https://pypi.org/project/black/>.
- The TOG class solves real problems, such as: how-to-get-source-corresponding-to-a-python-ast-node <https://stackoverflow.com/questions/16748029/>.

*Figures of merit*

*Simplicity*: The code consists primarily of a set of generators, one for every kind of ast node.

*Speed*: The TOG creates two-way links between tokens and ast nodes in roughly the time taken by python's tokenize.tokenize and ast.parse library methods. This is substantially faster than the asttokens <https://pypi.org/project/asttokens/>, fstringify <https://pypi.org/project/fstringify/>, and black <https://pypi.org/project/black/> tools. The TOT class traverses trees annotated with parent/child links even more quickly.

*Memory*: The TOG class makes no significant demands on python's resources. Generators add nothing to python's call stack.
TOG.node_stack is the only variable-length data. This stack resides in python's heap, so its length is unimportant. In the worst case, it might contain a few thousand entries. The TOT class uses no variable-length data at all.

*Links*

- Theory of operation <http://leoeditor.com/appendices.html#tokenorder-classes-theory-of-operation>.
- Project history <https://github.com/leo-editor/leo-editor/issues/1440#issuecomment-574145510>.
- A stack overflow discussion that mentions Leo's deprecated TokenSync class <https://stackoverflow.com/questions/7456933/>, an earlier attempt at plugging the holes in python's api.

Edward
------------------------------------------------------------------------------------------
Edward K. Ream: edreamleo@gmail.com
Leo: http://leoeditor.com/
------------------------------------------------------------------------------------------