data structure for ASTs in Python-written parsers

eliben eliben at gmail.com
Sat Feb 14 07:47:37 CET 2009


Hello,

The Python interpreter uses ASDL
(http://www.cs.princeton.edu/~danwang/Papers/dsl97/dsl97.html) to
describe the AST resulting from parsing. Earlier versions also had a
separate AST used by the compiler module, generated from ast.txt by
astgen.py (both in Tools/compiler in the Python 2.5 source). However,
as far as I understand, this has been discontinued in later Python
versions.

I'm writing parsers in pure Python (with PLY), and I'm wondering what
the best representation for the AST would be. So far I've been using a
home-grown solution similar to ast.txt/astgen.py, but I've been
considering ASDL as an alternative.
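
To make "similar to ast.txt/astgen.py" concrete, here is a minimal
sketch of a scheme in that spirit. It is illustrative only: the spec
format, Node, and make_node_classes are names made up for this post,
not the real ast.txt syntax or astgen.py code, and the classes are
built at run time instead of being generated as source.

```python
# Hypothetical mini-spec, one node class per line: "ClassName: field, field"
SPEC = """
BinOp: op, left, right
Num: value
Block: stmts
"""


class Node(object):
    """Common base class; subclasses only differ in _fields."""
    _fields = ()

    def __init__(self, *args):
        if len(args) != len(self._fields):
            raise TypeError("%s expects %d arguments, got %d"
                            % (type(self).__name__,
                               len(self._fields), len(args)))
        for name, value in zip(self._fields, args):
            setattr(self, name, value)

    def __repr__(self):
        inner = ", ".join(repr(getattr(self, f)) for f in self._fields)
        return "%s(%s)" % (type(self).__name__, inner)


def make_node_classes(spec):
    """Build one Node subclass per non-empty line of the spec."""
    classes = {}
    for line in spec.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, fields = line.partition(":")
        name = name.strip()
        field_names = tuple(f.strip() for f in fields.split(",")
                            if f.strip())
        classes[name] = type(name, (Node,), {"_fields": field_names})
    return classes


nodes = make_node_classes(SPEC)
tree = nodes["BinOp"]("+", nodes["Num"](1), nodes["Num"](2))
print(tree)
```

The parser actions (PLY rules, in my case) would then just instantiate
these classes.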

Looking at it closely, it seems to me that ASDL is much better suited
to statically-typed languages like C and Java than to duck-typed
Python. Assigning an explicit type tag to each node seems superfluous
when "real" Node classes can be used. Wherever C code would dispatch
on the type enum, Python code can dispatch on isinstance(), and
automatic AST walkers can traverse the simple representation using
reflection, so ASDL just adds another level of complexity.
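
Here is a minimal sketch of what I mean by isinstance() dispatch and
a reflection-based walker. The class names (Node, Num, BinOp) and the
_fields convention are assumptions made up for this example, not
anything taken from ASDL or from the compiler module.

```python
class Node(object):
    """Base class; subclasses list their attributes in _fields."""
    _fields = ()

    def __init__(self, *args):
        for name, value in zip(self._fields, args):
            setattr(self, name, value)

    def children(self):
        """Yield child nodes by reflecting over _fields."""
        for name in self._fields:
            value = getattr(self, name)
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, Node):
                    yield item


class Num(Node):
    _fields = ("value",)


class BinOp(Node):
    _fields = ("op", "left", "right")


def evaluate(node):
    """Dispatch on the node class instead of a type enum."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, BinOp):
        left, right = evaluate(node.left), evaluate(node.right)
        if node.op == "+":
            return left + right
        if node.op == "*":
            return left * right
        raise ValueError("unknown operator %r" % node.op)
    raise TypeError("unexpected node %r" % node)


def walk(node):
    """Generic pre-order walker; needs no per-class code."""
    yield node
    for child in node.children():
        for descendant in walk(child):
            yield descendant


tree = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
print(evaluate(tree))
print([type(n).__name__ for n in walk(tree)])
```

Adding a new node type only requires a new class with its _fields;
walk() picks it up without any changes.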

Does this make sense? I realize Python itself uses ASDL because its
parser is coded in C. But what would you prefer as a representation
for parsers written *in* Python?

Thanks in advance


