[Python-Dev] C AST to Python discussion

Thu Feb 16 07:14:17 CET 2006

On 2/15/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Greg Ewing wrote:
> > Brett Cannon wrote:
> >> One protects us from ending up with an unusable AST since
> >> the serialization can keep the original AST around and if the version
> >> passed back in from Python code is junk it can be tossed and the
> >> original version used.
> >
> > I don't understand why this is an issue. If Python code
> > produces junk and tries to use it as an AST, then it's
> > buggy and deserves what it gets. All the AST compiler
> > should be responsible for is to try not to crash the
> > interpreter under those conditions. But that's true
> > whatever method is used for passing ASTs from Python
> > to the compiler.
>
> I'd prefer the AST nodes be real Python objects. The arena approach seems to be
> working reasonably well, but I still don't see a good reason for using a
> specialised memory allocation scheme when it really isn't necessary and we
> have a perfectly good memory management system for PyObjects.
>

If the compiler were hacked on by more people I would agree with this.
But few people do, so I am not too worried about using a simple,
custom memory system as long as its use is clearly documented for
those few who do decide to work on it (and I am willing to be in
charge of that, regardless of which solution we go with).  Obviously
it could be argued that more people don't work on it because of its
"special" coding style, but then again the old compiler wasn't special
and very few people touched that beast.

> On the 'unusable AST' front, if AST transformation code creates illegal
> output, then the main thing is to raise an exception complaining about what's
> wrong with it. I believe that may need a change to the compiler whether the
> modified AST was serialised or not.
>

That's fine, but I wasn't sure where this exception would be raised.
I guess it would come up during the import of a module if the import
automatically passed the AST through a list of processing functions.
Some might view that as not as bad as a segfault of the interpreter, but
worse than just an ImportError.  As I said, I am fine with allowing
modification, but others have expressed reservations.
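
To make the failure mode concrete, here is a minimal sketch of one way a
junk AST could surface at import time. It uses today's `ast` module and
the built-in compile() purely as a stand-in for whatever interface we end
up with; compile_transformed() is a hypothetical helper, and turning the
error into an ImportError is only one of the options, not a decision.

```python
import ast

def compile_transformed(tree, filename="<example>"):
    """Compile a (possibly transformed) AST; report junk as ImportError.

    Hypothetical helper for illustration only.
    """
    try:
        return compile(tree, filename, "exec")
    except (TypeError, ValueError) as exc:
        # compile() rejects malformed trees with TypeError or ValueError
        # instead of crashing the interpreter.
        raise ImportError(f"invalid AST for {filename}: {exc}") from exc

good = ast.parse("x = 1")
junk = ast.parse("x = 1")
# Simulate a transformation bug: a str where an expression node is required.
junk.body[0].value = "not a node"

code = compile_transformed(good)   # a well-formed tree compiles normally

try:
    compile_transformed(junk)
except ImportError as exc:
    error = exc                    # the original TypeError is kept as __cause__
```

The point is only that the bad tree is caught by verification before
bytecode generation, so the caller sees an ordinary exception.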

> In terms of reverting back to the untransformed AST if the transformation
> fails, then that option is up to the code doing the transformation. Instead of
> serialising all the time (even for cases where the AST is just being inspected
> instead of transformed), we can either let the AST objects support the
> copy/deepcopy protocol, or else provide a method to clone a tree before trying
> to transform it.
>
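
The clone-before-transform idea above can be sketched as follows. This
assumes AST nodes that support copy.deepcopy(), which is true of today's
`ast` module; try_transform() and the two transform functions are
hypothetical names used only for illustration.

```python
import ast
import copy

def try_transform(tree, transform):
    """Apply transform to tree; return the untouched clone if it fails."""
    backup = copy.deepcopy(tree)        # clone before the tree is mutated
    try:
        candidate = transform(tree)
        ast.fix_missing_locations(candidate)
        compile(candidate, "<check>", "exec")   # verify the result is usable
        return candidate
    except Exception:
        return backup                   # revert to the untransformed AST

def broken_transform(tree):
    # Simulated bug: replaces an expression node with a plain string.
    tree.body[0].value = "junk"
    return tree

def identity_transform(tree):
    return tree

reverted = try_transform(ast.parse("x = 1"), broken_transform)
kept = try_transform(ast.parse("y = 2"), identity_transform)
```

Only code that actually transforms the tree pays for the copy; code that
merely inspects the AST never needs to call it.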

I view it as a one-time serialization and a one-time conversion back.
So the compiler converts the C AST to Python objects once.  That result
is then passed into the first function registered to access the AST.
The AST returned by that function is then immediately and directly
passed to the next function in the list.  This continues through the
last function, whose returned AST is then converted back to the C
representation, verified, and then sent on to the bytecode compiler.
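
The flow described above can be sketched as follows, using today's `ast`
module and compile() as stand-ins for the C-to-Python conversion and the
conversion back. The ast_hooks list, compile_with_hooks(), and the
DoubleInts transformer are hypothetical names for illustration; none of
them is a proposed API.

```python
import ast

# Hypothetical registry of functions that each take an AST and return one.
ast_hooks = []

def compile_with_hooks(source, filename="<example>"):
    tree = ast.parse(source, filename)   # one-time conversion to Python objects
    for hook in ast_hooks:
        tree = hook(tree)                # each result feeds the next function
    ast.fix_missing_locations(tree)      # fill in positions for any new nodes
    # One-time conversion back: compile() converts, verifies, and compiles.
    return compile(tree, filename, "exec")

class DoubleInts(ast.NodeTransformer):
    """Toy transformation: double every integer constant."""
    def visit_Constant(self, node):
        if isinstance(node.value, int) and not isinstance(node.value, bool):
            return ast.copy_location(ast.Constant(value=node.value * 2), node)
        return node

ast_hooks.append(lambda tree: DoubleInts().visit(tree))

namespace = {}
exec(compile_with_hooks("x = 21"), namespace)
```

Verification happens exactly once, after the last function has run, so
intermediate trees are never checked on their own.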

> A unified representation means we only have one API to learn, that is
> accessible from both Python and C. It also eliminates any need to either
> implement features twice (once in Python and once in C) or else let the Python
> and C APIs diverge to the point where what you can do with one differs from
> what you can do with the other.
>

I suspect that any marshalling from C to Python will have a matching
object design based on the AST node layout in the ASDL.  So that API
won't really be different from C to Python if we stick with the arena
solution.

And I also realized that marshalling might just go straight from C to
Python objects, without the intermediary step I had in my head.
Don't know why I thought it might need one or if anyone picked up on
that being a possibility.

-Brett