[Python-Dev] Memory management in the AST parser & compiler

Nick Coghlan ncoghlan at gmail.com
Tue Nov 29 13:59:52 CET 2005


Neal Norwitz wrote:
> On 11/28/05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Neal Norwitz wrote:
>>> Hope this helps explain a bit.  Please speak up with how this can be
>>> improved.  Gotta run.
>> I would rewrite it as
> 
> [code snipped]
> 
> For those watching, Greg's and Martin's version were almost the same. 
> However, Greg's version left in the memory leak, while Martin fixed it
> by letting the result fall through.  Martin added some helpful rules
> about dealing with the memory.  Martin also gets bonus points for
> talking about developing a checker. :-)
> 
> In both cases, their modified code is similar to the existing AST
> code, but all deallocation is done with Py_[X]DECREFs rather than a
> type specific deallocator.  Definitely nicer than the current
> situation.  It's also the same as the rest of the python code.

When working on the CST->AST parser, there were only a few things I found to 
be seriously painful about the memory management:

   1. Remembering which free_* variant to call for AST nodes
   2. Remembering which asdl_seq_*_free variant to call for ASDL sequences (it 
was worse when the variant I wanted didn't exist, since this was done with 
functions rather than preprocessor macros)
   3. Remembering to transpose free_* and *_free between freeing a single node 
and freeing a sequence.
   4. Remembering whether or not a given cleanup function could cope with 
NULL's or not
   5. The fact that there wasn't a consistent "goto error" exception-alike 
mechanism in use

(I had a Spanish Inquisition-esque experience writing that list ;)

Simply switching to PyObjects would solve the first four problems: everything 
becomes a Py_XDECREF.

Declaring that none of the AST node creation methods steal references would be 
consistent with most of the existing C API (e.g. PySequence_SetItem, 
PySequence_Tuple, PySequence_List), and has nice properties if we handle AST 
nodes as borrowed references from a PyList used as the arena, as Fredrik 
suggested.

If the top level function refrains from putting the top level node in the 
arena, then it will all "just work" - any objects will be deleted only if both 
the PyList arena AND the top-level node object are DECREF'ed. The top-level 
function only has to obey two simple rules:
   1. Always DECREF the arena list
   2. On success, INCREF the top-level node BEFORE DECREF'ing the arena list 
(otherwise Step 1 kills everything. . .)

To make the code a little more self-documenting, Fredrik's _PyList_APPEND 
could be called "new_ast_node" and accept the compiling struct directly:

PyObject*
new_ast_node(struct compiling *c, PyObject *ast_node)
{
     int n;
     if (!ast_node)
         return ast_node;
     idx = PyList_GET_SIZE(c->arena);
     if (idx == INT_MAX) {
         PyErr_SetString(PyExc_OverflowError,
         "cannot add more objects to arena");
         return NULL;
     }
     if (list_resize(c->arena, idx+1) == -1)
         return NULL;
     PyList_SET_ITEM(c->arena, idx, ast_node);
     return ast_node;
}

We'd also need to modify the helper macro for identifiers:

#define NEW_IDENTIFER(c, n) \
   new_ast_node(c, PyString_InternFromString(STR(n)))

Then the function is only borrowing the arena's reference, and doesn't need to 
decref anything:

static PyObject*
ast_for_funcdef(struct compiling *c, const node *n)
{
      /* funcdef: [decorators] 'def' NAME parameters ':' suite */
      PyObject *name = NULL;
      PyObject *args = NULL;
      PyObject *body = NULL;
      PyObject *decorator_seq = NULL;
      int name_i;

      REQ(n, funcdef);

      if (NCH(n) == 6) { /* decorators are present */
	decorator_seq = ast_for_decorators(c, CHILD(n, 0));
	if (!decorator_seq)
	    return NULL;
	name_i = 2;
      }
      else {
	name_i = 1;
      }

      name = NEW_IDENTIFIER(c, CHILD(n, name_i));
      if (!name)
	    return NULL;
      else if (!strcmp(STR(CHILD(n, name_i)), "None")) {
	ast_error(CHILD(n, name_i), "assignment to None");
         return NULL;
      }
      args = ast_for_arguments(c, CHILD(n, name_i + 1));
      if (!args)
         return NULL;
      body = ast_for_suite(c, CHILD(n, name_i + 3));
      if (!body)
         return NULL;

      return new_ast_node(\
        FunctionDef(name, args, body, decorator_seq, LINENO(n)));
}

No need for a checker, because there isn't anything special to do at the call 
sites: each AST node can take care of putting *itself* in the arena.

And as the identifier example shows, this even works for the non-AST leaf 
nodes that are some other kind of PyObject.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-Dev mailing list