[Compiler-sig] AST observations

Jeremy Hylton jeremy@zope.com
Thu, 18 Apr 2002 11:48:46 -0400


Starting with the last stuff first...

>>>>> "ECN" == Eric C Newton <ecn@metaslash.com> writes:

  ECN> Line numbers appear to be added to AST nodes in arbitrary ways.

Indeed.  It hasn't been obvious when to add line numbers.  The
original transformer added line numbers to statements, as far as I
could tell.  This didn't seem sufficient, because, e.g., the except
handler lines aren't individual statements.
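
For illustration only, here is a minimal sketch of the except handler
case. It assumes the present-day standard library ast module, not the
compiler package's transformer discussed in this thread; the names
SOURCE and handler_lines are made up for the example. There the
handler is not a statement node, yet it carries its own line number.

```python
import ast

# Hypothetical sample input; risky() is never called, only parsed.
SOURCE = """\
try:
    risky()
except ValueError:
    pass
"""

def handler_lines(source):
    """Return (lineno, is_statement) for each except handler in source."""
    tree = ast.parse(source)
    return [
        (node.lineno, isinstance(node, ast.stmt))
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler)
    ]

print(handler_lines(SOURCE))
```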

  ECN> Some interesting projects which re-write existing code might
  ECN> like other token information, like comments.

Yes.  Refactoring tools would really like to have detailed position
information about each character.  I wouldn't find it acceptable if
such a tool reformatted my code.

  ECN> People have requested features for pychecker, like detecting
  ECN> unnecessary parens and semicolons, which is not possible, since
  ECN> these are not part of the AST.

That's by design.  An AST is a compiler intermediate representation
and parens and semicolons aren't part of the intermediate
representation.  If the analysis has to do with the syntax of the
language, I don't think the AST is the right place to check it.
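
A small sketch of the distinction, again assuming the modern ast and
tokenize modules instead of the compiler package (same_ast and
punctuation_tokens are hypothetical helper names): two sources that
differ only in parens and a semicolon produce the same tree, while the
token stream still shows the punctuation and where it sits.

```python
import ast
import io
import tokenize

def same_ast(a, b):
    """True if both sources parse to structurally identical trees."""
    # ast.dump omits line and column attributes by default.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))

def punctuation_tokens(source, wanted=("(", ")", ";")):
    """Return (string, (row, col)) for each wanted operator token."""
    readline = io.StringIO(source).readline
    return [
        (tok.string, tok.start)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP and tok.string in wanted
    ]

print(same_ast("x = (a + b);\n", "x = a + b\n"))
print(punctuation_tokens("x = (a + b);\n"))
```

A checker for "unnecessary" punctuation would have to work from the
second view, since the first has already discarded it.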

How do you tell when a pair of parens is unnecessary, BTW?  I've often
used parens around the test part of an if statement so that emacs
formats it nicely when it takes up more than one line.  I find this a
completely acceptable use of "unnecessary" parens.

But regardless of whether the AST should be used for simple syntax
checking (maybe parens aren't just a syntactic issue), it would be
really helpful to decorate the AST with information about the tokens
that make up each node.

I don't know enough about the Python parser to know if it's possible
to get the parser to pass it along to the AST transformer in the
compiler.  I have the impression that things like comments get tossed
pretty early on.

In general, the AST doesn't need all the detailed token information,
because the compiler doesn't care about it.  It would waste time and
space to record that information for the compiler.

So if a particular app needs the extra token info, it seems like we
could use the tokenize module to collect the token info and associate
it with the AST.  I'm not sure how this would work in any detail, or I
would have tried it already :-).
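
One rough way this could look, as a sketch under stated assumptions
and not a worked-out design: it uses the modern ast module in place of
the compiler package, matches tokens to nodes by starting line number
only, and the name attach_comments is hypothetical. Line matching is
crude; a real tool would probably need column information as well.

```python
import ast
import io
import tokenize

def attach_comments(source):
    """Pair each statement with a comment that shares its first line.

    Returns a list of (node type name, line number, comment text),
    sorted by line number.
    """
    # Pass 1: collect comment tokens, keyed by the line they appear on.
    comments = {}
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            comments[tok.start[0]] = tok.string

    # Pass 2: walk the tree and look up each statement's line.
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.stmt) and node.lineno in comments:
            pairs.append(
                (type(node).__name__, node.lineno, comments[node.lineno])
            )
    return sorted(pairs, key=lambda pair: pair[1])

SAMPLE = "x = 1  # set x\ny = 2\nif x:  # check\n    pass\n"
for entry in attach_comments(SAMPLE):
    print(entry)
```

The compiler never sees the comment table, so it pays nothing for it;
only the app that wants the extra info builds it.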

Jeremy