[Python-Dev] C AST to Python discussion

Brett Cannon brett at python.org
Wed Feb 15 09:34:35 CET 2006


As per Neal's prodding email, here is a thread to discuss where we
want to go with the C AST to Python stuff and what I think are the
core issues at the moment.

First issue is the ast-objects branch.  Work is being done on it, but
it still leaks some references (Neal or Martin can correct me if I am
wrong).  We really should choose either this branch or the current
solution before really diving into coding stuff for exposing the AST
so as to not waste too much time.  Basically the issue is that the
current solution will require using a serialization form to go from C
to Python and back again.  The PyObjects solution in the branch won't
need this.  Serialization protects us from ending up with an unusable
AST, since the original AST can be kept around and, if the version
passed back in from Python code is junk, it can be tossed and the
original version used.  The PyObjects branch most likely won't have
this protection since the actual AST will most likely be passed to
Python code.  But there are performance issues with all of this
serialization compared to a simple PyObject pointer into Pythonland.
Jeremy supports the serialization option.  I am personally indifferent
while leaning towards serialization.
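
To make the safety argument for serialization concrete, here is a
rough sketch of the "keep the original, toss the junk" idea.  It is
only an illustration: it uses pickle and the ast module from today's
standard library as stand-ins for whatever serialization form and AST
types would actually be chosen, and transform_with_fallback,
junk_transform and identity_transform are hypothetical names made up
for this example.

```python
import ast
import pickle


def transform_with_fallback(tree, transform):
    """Hand a serialized copy of the AST to Python code; keep the original.

    Hypothetical helper: pickle stands in for the serialization form.
    """
    payload = pickle.dumps(tree)  # the copy that crosses the boundary
    try:
        candidate = transform(pickle.loads(payload))
        if not isinstance(candidate, ast.Module):
            raise TypeError("transform did not return a module AST")
        ast.fix_missing_locations(candidate)
        compile(candidate, "<candidate>", "exec")  # reject unusable trees
    except Exception:
        return tree  # junk came back: toss it and use the original
    return candidate


def junk_transform(tree):
    # Wrecks its copy and returns nothing useful.
    tree.body.clear()
    return None


def identity_transform(tree):
    # Well-behaved: returns the (copied) tree unchanged.
    return tree


original = ast.parse("x = 1\n")
kept = transform_with_fallback(original, junk_transform)
replaced = transform_with_fallback(original, identity_transform)
```

Because the transformation only ever sees a deserialized copy, the
damage done by junk_transform never reaches the original tree.  With a
bare PyObject pointer the same function would have mutated the real
AST.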

Then there is the API.  First we need to decide if AST modification is
allowed or not.  It has been argued on my blog by someone (see
http://sayspy.blogspot.com/2006/02/possibilities-of-ast.html for the
entry on this whole topic which highly mirrors this email) that Guido
won't okay AST transformations since it can lead to control flow
changes behind the scenes.  I say that is fine as long as it is
sufficiently obvious that AST transformations are occurring.  I say
allow transformations.

Once that is settled, I see three places for possible access to the
AST.  One is a command-line option, like -m.  That is totally obvious
to the user as long as they are not just working off of the .pyc
files.  Next is something like sys.ast_transformations, a list of
functions that are each passed the AST (and return a new version if
modifications are allowed).  This could allow chaining of AST
transformations by feeding the output of one function into the next.
Next is per-object AST
access.  This could get expensive since if we don't keep a copy of the
AST with the code objects (which we probably shouldn't since that is
wasted memory if the AST is not used a lot) we will need to read the
code a second time to get the AST regenerated.
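
A minimal sketch of how the chaining could look.  Nothing here is an
existing API: ast_transformations is a plain module-level list standing
in for the proposed sys.ast_transformations, apply_transformations is a
hypothetical driver, and the ast module used to build and rewrite the
tree is the one in today's standard library, not whatever form the
exposed AST ends up taking.

```python
import ast

# Hypothetical stand-in for the proposed sys.ast_transformations list.
ast_transformations = []


def apply_transformations(tree, transformations):
    """Pass the tree through each function; each result feeds the next."""
    for transform in transformations:
        new_tree = transform(tree)
        if new_tree is not None:  # read-only passes may return None
            tree = new_tree
    return ast.fix_missing_locations(tree)


seen_functions = []


def record_functions(tree):
    # Read-only pass, the kind of access a checker would need.
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            seen_functions.append(node.name)
    return None


class _DoubleInts(ast.NodeTransformer):
    # Toy rewrite: replace every integer constant with twice its value.
    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            doubled = ast.Constant(value=node.value * 2)
            return ast.copy_location(doubled, node)
        return node


def double_ints(tree):
    # Read-write pass: returns a modified tree.
    return _DoubleInts().visit(tree)


ast_transformations.extend([record_functions, double_ints])

source = "def f():\n    return 21\n"
tree = apply_transformations(ast.parse(source), ast_transformations)
namespace = {}
exec(compile(tree, "<example>", "exec"), namespace)
result = namespace["f"]()
```

The read-only pass and the read-write pass share one calling
convention, so a script can install whichever functions it wants and
then compile or check files as usual.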

I personally think we should choose an initial global access API to
the AST as a starting API.  I like the sys.ast_transformations idea
since it is simple and gives enough access that, whether we allow
read-only or read-write access, something like PyChecker can get what
it needs.  It also allows for simple Python scripts that can install
the desired functions and then compile or check the passed-in files.
Obviously write access would be needed for optimization stuff (such
as if the peepholer was rewritten in Python and used by default), but
we can also expose this later if we want.

In terms of 2.5, I think we really need to settle on the fate of the
ast-objects branch.  If we can get the very basic API for exposing the
AST to Python code in 2.5 that would be great, but I don't view that
as being as critical as choosing the final AST implementation style,
since wasting work on a version that will disappear would just plain
suck.
It would be great to resolve this before the PyCon sprints since a
good chunk of the AST-caring folk will be there for at least part of
the time.

-Brett

