[Python-Dev] Memory management in the AST parser & compiler

Brett Cannon bcannon at gmail.com
Mon Nov 28 22:59:04 CET 2005

On 11/28/05, Guido van Rossum <guido at python.org> wrote:
> [Guido]
> > > Then I don't understand why there was discussion of alloca() earlier
> > > on -- surely the lifetime of a node should not be limited by the stack
> > > frame that allocated it?
> [Jeremy]
> > Actually this is a pretty good limit, because all these data
> > structures are temporaries used by the compiler.  Once compilation has
> > finished, there's no need for the AST or the compiler state.
> Are you really saying that there is one function which is called only
> once (per compilation) which allocates *all* the AST nodes?

Nope, there isn't for everything.  It's just that some are temporary
to internal functions and thus can stand to be freed later (unless my
memory is really shot).  Otherwise it is piece-meal.  There is the
main data structure such as the compiler struct and the top-level node
for the AST, but otherwise everything (currently) is allocated as

> That's the
> only situation where I'd see alloca() working -- unless your alloca()
> doesn't allocate memory on the stack. I was somehow assuming that the
> tree would be built piecemeal by parser callbacks or some such
> mechanism. There's still a stack frame whose lifetime limits the AST
> lifetime, but it is not usually the current stackframe when a new node
> is allocated, so alloca() can't be used.
> I guess I don't understand the AST compiler code enough to participate
> in this discussion. Or perhaps we are agreeing violently?

I don't think your knowledge of the codebase precludes your
participation.  Actually, I think it makes it even more important:
if some scheme is devised that is not easily explained, it is
really going to limit who can help out with maintenance and
enhancements on the compiler.

> > > I'm not in principle against having an arena for this purpose, but I
> > > worry that this will make it really hard to provide a Python API for
> > > the AST, which has already been requested and whose feasibility
> > > (unless I'm mistaken) also was touted as an argument for switching to
> > > the AST compiler in the first place. I hope we'll never have to deal
> > > with an API like the parser module provides...
> >
> > My preference would be to have the ast shared by value.  We generate
> > code to serialize it to and from a byte stream and share that between
> > Python and C.  It is less efficient, but it is also very simple.
> So there would still be a Python-objects version of the AST but the
> compiler itself doesn't use it.

Yep.  The idea was to return a PyString formatted a la the parser
module, where it is just a bunch of nested items in a Scheme-like
format.  There would then be Python or C code that would generate a
Python object representation from that.  Then, when you were finished
tweaking the structure, you would write it back out as a PyString and
then recreate the internal representation.  That makes it
pass-by-value since you pass the serialized PyString version across
the C-Python boundary.
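
To make the round trip concrete, here is a minimal sketch in Python.
It is only an illustration of the idea, not the actual format or API:
the names dumps() and loads(), the node names, and the assumption that
every leaf is a string without whitespace or parentheses are all made
up for the example.

```python
def dumps(node):
    """Serialize a nested list of string atoms into a Scheme-like string."""
    if isinstance(node, list):
        return "(" + " ".join(dumps(child) for child in node) + ")"
    return str(node)


def loads(text):
    """Rebuild the nested-list form from the string produced by dumps()."""
    # Pad parentheses with spaces so a plain split() yields the tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            items = []
            while tokens[pos] != ")":
                items.append(parse())
            pos += 1  # consume the closing ")"
            return items
        return token  # an atom (hypothetical: always a plain string)

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the top-level item")
    return tree


# Hypothetical tree for "x = 1"; the node names are illustrative only.
original = ["Module", ["Assign", ["Name", "x"], ["Num", "1"]]]
wire = dumps(original)    # what would cross the C-Python boundary
edited = loads(wire)      # an independent Python-side copy
edited[1][2][1] = "2"     # tweak the copy: x = 2
wire_back = dumps(edited)  # serialized again for the compiler side
```

Because only the string crosses the boundary, tweaking `edited` cannot
touch `original`; the compiler side sees the change only once
`wire_back` is parsed again.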

> At least by-value makes sense to me -- if you're making tree
> transformations you don't want accidental sharing to cause unexpected
> side effects.

Yeah, that could be bad.  =)
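
A small sketch of what accidental sharing looks like, using nested
Python lists as a stand-in for AST nodes.  Everything here
(build_shared(), rename_target(), copy_by_value(), the node names) is
hypothetical and only meant to show the failure mode.

```python
def build_shared():
    """Build 'x = x + 1' where one Name node is reused in two places."""
    name = ["Name", "x"]
    return ["Assign", name, ["BinOp", name, "+", ["Num", "1"]]]


def rename_target(assign, new_name):
    """In-place transformation: rename only the assignment target."""
    assign[1][1] = new_name
    return assign


def copy_by_value(node):
    """Rebuild every list so no two positions share a node object."""
    if isinstance(node, list):
        return [copy_by_value(child) for child in node]
    return node


# With sharing, renaming the target silently renames the operand too.
shared = rename_target(build_shared(), "y")

# After a by-value copy, only the target changes.
independent = rename_target(copy_by_value(build_shared()), "y")
```

In the shared tree the right-hand side becomes `y + 1` as a side
effect; in the by-value copy it stays `x + 1`.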

