[Python-Dev] Memory management in the AST parser & compiler

Neal Norwitz nnorwitz at gmail.com
Tue Nov 29 08:24:25 CET 2005

On 11/28/05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Neal Norwitz wrote:
> > Hope this helps explain a bit.  Please speak up with how this can be
> > improved.  Gotta run.
> I would rewrite it as

[code snipped]

For those watching, Greg's and Martin's versions were almost the same.
However, Greg's version left in the memory leak, while Martin fixed it
by letting the result fall through.  Martin added some helpful rules
about dealing with the memory.  Martin also gets bonus points for
talking about developing a checker. :-)

In both cases, their modified code is similar to the existing AST
code, but all deallocation is done with Py_[X]DECREFs rather than a
type-specific deallocator.  Definitely nicer than the current
situation.  It's also the same technique the rest of the Python code uses.
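
To make the fall-through idea concrete for those watching, here is a
minimal sketch. It is not Greg's or Martin's actual code (that was
snipped above). It uses a toy refcounted struct instead of PyObject so
it compiles on its own, and obj, obj_new, obj_xdecref and build_pair
are all made-up names for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for PyObject (hypothetical): a refcount plus two owned children. */
typedef struct obj {
    int refcnt;
    struct obj *left, *right;   /* owned references, may be NULL */
} obj;

static int live_objects = 0;    /* objects allocated and not yet freed */

/* Create an object that takes its own references to its children. */
static obj *obj_new(obj *left, obj *right)
{
    obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    o->refcnt = 1;
    o->left = left;
    o->right = right;
    if (left)
        left->refcnt++;
    if (right)
        right->refcnt++;
    live_objects++;
    return o;
}

/* Like Py_XDECREF: tolerates NULL and frees the object when the count hits zero. */
static void obj_xdecref(obj *o)
{
    if (o == NULL)
        return;
    assert(o->refcnt > 0);
    if (--o->refcnt > 0)
        return;
    obj_xdecref(o->left);
    obj_xdecref(o->right);
    free(o);
    live_objects--;
}

/* fail_second simulates a failure while building the second part. */
static obj *build_pair(int fail_second)
{
    obj *first = NULL, *second = NULL, *result = NULL;

    first = obj_new(NULL, NULL);
    if (first == NULL)
        goto done;
    second = fail_second ? NULL : obj_new(NULL, NULL);
    if (second == NULL)
        goto done;
    result = obj_new(first, second);
    /* Success falls through: result holds its own references, so the
       temporaries are released on every path and nothing leaks. */
done:
    obj_xdecref(first);
    obj_xdecref(second);
    return result;
}
```

The point is that there is a single exit path: every temporary starts
out NULL and is released with the NULL-tolerant decref whether or not
the function succeeded.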

With arenas the code would presumably look something like this:

static stmt_ty
ast_for_funcdef(struct compiling *c, const node *n)
{
    /* funcdef: 'def' [decorators] NAME parameters ':' suite */
    identifier name;
    arguments_ty args;
    asdl_seq *body;
    asdl_seq *decorator_seq = NULL;
    int name_i;

    REQ(n, funcdef);

    if (NCH(n) == 6) { /* decorators are present */
        decorator_seq = ast_for_decorators(c, CHILD(n, 0));
        if (!decorator_seq)
            return NULL;
        name_i = 2;
    }
    else {
        name_i = 1;
    }

    name = NEW_IDENTIFIER(CHILD(n, name_i));
    if (!name)
        return NULL;
    if (!strcmp(STR(CHILD(n, name_i)), "None")) {
        ast_error(CHILD(n, name_i), "assignment to None");
        return NULL;
    }
    args = ast_for_arguments(c, CHILD(n, name_i + 1));
    body = ast_for_suite(c, CHILD(n, name_i + 3));
    if (!args || !body)
        return NULL;

    return FunctionDef(name, args, body, decorator_seq, LINENO(n));
}

All the gotos become return NULLs.  After allocating a PyObject, it
would need to be registered (i.e., the mythical Py_AST_Register(name)).
This is easier than using all PyObjects in that when an error occurs,
there's nothing to think about, just return.  Only optional values
(like decorator_seq) need to be initialized.  It's harder in that one
must remember to register any PyObject so it can be Py_DECREFed at the
end.  Since the arena is allocated in big hunk(s), it would presumably
be faster than using PyObjects since there would be less memory
allocation (and fragmentation).  It should be possible to get rid of
some of the conditionals too (I joined body and args above).
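
For illustration only, a rough sketch of what such an arena with
registration might look like. Nothing here is an existing Python API:
arena_alloc, arena_register, arena_free and count_release are
hypothetical names, the block size and 16-byte alignment are
assumptions, and a generic release callback stands in for Py_DECREF so
the sketch compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed sizes: one big block per malloc, 16-byte aligned allocations. */
enum { BLOCK_SIZE = 4096, ALIGNMENT = 16 };

typedef struct block {
    struct block *next;
    size_t used;
    unsigned char *mem;          /* BLOCK_SIZE bytes from malloc */
} block;

/* An object owned elsewhere that must be released when the arena dies. */
typedef struct registered {
    struct registered *next;
    void *object;
    void (*release)(void *);
} registered;

typedef struct arena {
    block *blocks;
    registered *objects;
} arena;

/* Bump-allocate n bytes; nothing is freed individually. */
static void *arena_alloc(arena *a, size_t n)
{
    block *b = a->blocks;

    if (n == 0 || n > BLOCK_SIZE)
        return NULL;             /* sketch limitation: no oversized requests */
    n = (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    if (b == NULL || BLOCK_SIZE - b->used < n) {
        b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->mem = malloc(BLOCK_SIZE);
        if (b->mem == NULL) {
            free(b);
            return NULL;
        }
        b->used = 0;
        b->next = a->blocks;
        a->blocks = b;
    }
    b->used += n;
    return b->mem + b->used - n;
}

/* Remember an object so arena_free can release it (the "register" step). */
static int arena_register(arena *a, void *object, void (*release)(void *))
{
    registered *r;

    assert(release != NULL);
    r = arena_alloc(a, sizeof *r);
    if (r == NULL)
        return -1;
    r->object = object;
    r->release = release;
    r->next = a->objects;
    a->objects = r;
    return 0;
}

/* Release every registered object, then drop all blocks at once. */
static void arena_free(arena *a)
{
    for (registered *r = a->objects; r != NULL; r = r->next)
        r->release(r->object);
    while (a->blocks != NULL) {
        block *next = a->blocks->next;
        free(a->blocks->mem);
        free(a->blocks);
        a->blocks = next;
    }
    a->objects = NULL;
}

/* Example release callback: bumps an int counter (stand-in for Py_DECREF). */
static void count_release(void *counter)
{
    ++*(int *)counter;
}
```

With something like this, an ast_for_* function that hits an error
just returns NULL, and whoever owns the arena calls arena_free once at
the end, on success or failure.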

Using all PyObjects has another benefit that may have been mentioned
elsewhere, i.e., that the rest of Python uses the same techniques for
handling deallocation.

I'm not really advocating any particular approach.  I *think* arenas
would be easiest, but it's not a clear winner.  I think Martin's note
about GCC using GC is interesting.  AFAIK GCC is a lot more complex
than the Python code, so I'm not sure it's 100% relevant.  OTOH, we
need to weigh that experience.

