[Python-Dev] ast-objects branch created

Mon Dec 5 17:36:14 CET 2005

On 12/5/05, James Y Knight <foom at fuhm.net> wrote:
>
> On Dec 5, 2005, at 8:46 AM, Jeremy Hylton wrote:
> > I can see that problem occurring with an all-or-nothing solution, but
> > not if you have the freedom to allocate from an arena or from some
> > other mechanism.  If there are multiple ways to allocate memory, there
> > is some increased programming burden (you have to remember how each
> > pointer was allocated) but you gain flexibility.  The ast-arena branch
> > allocates most memory from an arena, but allocates identifiers on the
> > regular heap as PyObjects.  It does keep a list of these PyObjects so
> > that it can DECREF them later.
>
> ISTM that having to remember which pointers are arena-allocated and
> which are normally-refcounted-allocated removes the major gain that
> an arena method is supposed to bring: resistance to mistakes. I'd
> find having a single way to allocate and track memory easier than
> multiple. Then you just have to follow the single set of best
> practices for memory management, and you're all set. (and with
> PyObjects, the same practices the rest of python uses, another win.)

It's a question of degree, right?  If you can find a small number of
rules that are easy to understand then you are still likely to avoid
mistakes.  For example, the current ast-arena branch uses two rules: 
All AST nodes are allocated from the arena.  All PyObjects attached to
an AST node (identifiers and constants) are associated with the arena,
i.e. they are DECREFed when it is freed.

> I'd also like to parrot the concern others have had that if the AST
> nodes are not made of PyObjects, then a mirror hierarchy of PyObject-
> ified AST nodes will have to be created, which seems like quite a
> wasteful duplication. If it is required that there be a collection of
> AST python objects (which I think it is), is there really any good
> reason to make the "real" AST objects not be the _only_ AST objects?
> I've not heard one.

The PyObject-ified AST nodes are only needed if user code requests an
AST from the compiler.  That is, if we add a new feature that exposes
AST, we would need AST objects represented in Python code.  I think
this feature would be great to add, but it doesn't seem like a primary
concern for the internal compiler implementation.  There is no need
for PyObject-ified AST objects in the internal compiler.  (I think
this fact is obvious, since the compiler exists but PyObject-ified AST
objects don't.)

The question, then, is the simplest way to provide Python code with
access to the AST objects.  I still think that a set of pure Python
classes to represent the AST nodes is a good approach.  You define a
simple serialization format for ASTs and the serialized AST can be
passed from the interpreter to user code and back.  The user code gets
a mutable tree of AST nodes that it can reserialize for compilation to
bytecode.  This strategy is exactly like the existing parser module.

One advantage of this approach is the AST objects in each language are
simpler to use.  The C AST nodes provide an easy API for C programmers
and the Python AST nodes provide an easy API for Python programmers. 
Put another way, since the AST code is all generated from a high level
description, the implementation doesn't matter at all.  What matters
is the API exposed in each programming language.  If the best API
happens to admit a shared implementation, that's great.  If it
doesn't, no loss.

Jeremy