[Python-Dev] Memory management in the AST parser & compiler
Nick Coghlan
ncoghlan at gmail.com
Tue Nov 29 13:59:52 CET 2005
Neal Norwitz wrote:
> On 11/28/05, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Neal Norwitz wrote:
>>> Hope this helps explain a bit. Please speak up with how this can be
>>> improved. Gotta run.
>> I would rewrite it as
>
> [code snipped]
>
> For those watching, Greg's and Martin's version were almost the same.
> However, Greg's version left in the memory leak, while Martin fixed it
> by letting the result fall through. Martin added some helpful rules
> about dealing with the memory. Martin also gets bonus points for
> talking about developing a checker. :-)
>
> In both cases, their modified code is similar to the existing AST
> code, but all deallocation is done with Py_[X]DECREFs rather than a
> type specific deallocator. Definitely nicer than the current
> situation. It's also the same as the rest of the python code.
When working on the CST->AST parser, there were only a few things I found to
be seriously painful about the memory management:
1. Remembering which free_* variant to call for AST nodes
2. Remembering which asdl_seq_*_free variant to call for ASDL sequences (it
was worse when the variant I wanted didn't exist, since this was done with
functions rather than preprocessor macros)
3. Remembering to transpose free_* and *_free between freeing a single node
and freeing a sequence.
4. Remembering whether or not a given cleanup function could cope with
NULL's or not
5. The fact that there wasn't a consistent "goto error" exception-alike
mechanism in use
(I had a Spanish Inquisition-esque experience writing that list ;)
Simply switching to PyObjects would solve the first four problems: everything
becomes a Py_XDECREF.
Declaring that none of the AST node creation methods steal references would be
consistent with most of the existing C API (e.g. PySequence_SetItem,
PySequence_Tuple, PySequence_List), and has nice properties if we handle AST
nodes as borrowed references from a PyList used as the arena, as Fredrik
suggested.
If the top level function refrains from putting the top level node in the
arena, then it will all "just work" - any objects will be deleted only if both
the PyList arena AND the top-level node object are DECREF'ed. The top-level
function only has to obey two simple rules:
1. Always DECREF the arena list
2. On success, INCREF the top-level node BEFORE DECREF'ing the arena list
(otherwise Step 1 kills everything. . .)
To make the code a little more self-documenting, Fredrik's _PyList_APPEND
could be called "new_ast_node" and accept the compiling struct directly:
PyObject*
new_ast_node(struct compiling *c, PyObject *ast_node)
{
int n;
if (!ast_node)
return ast_node;
idx = PyList_GET_SIZE(c->arena);
if (idx == INT_MAX) {
PyErr_SetString(PyExc_OverflowError,
"cannot add more objects to arena");
return NULL;
}
if (list_resize(c->arena, idx+1) == -1)
return NULL;
PyList_SET_ITEM(c->arena, idx, ast_node);
return ast_node;
}
We'd also need to modify the helper macro for identifiers:
#define NEW_IDENTIFER(c, n) \
new_ast_node(c, PyString_InternFromString(STR(n)))
Then the function is only borrowing the arena's reference, and doesn't need to
decref anything:
static PyObject*
ast_for_funcdef(struct compiling *c, const node *n)
{
/* funcdef: [decorators] 'def' NAME parameters ':' suite */
PyObject *name = NULL;
PyObject *args = NULL;
PyObject *body = NULL;
PyObject *decorator_seq = NULL;
int name_i;
REQ(n, funcdef);
if (NCH(n) == 6) { /* decorators are present */
decorator_seq = ast_for_decorators(c, CHILD(n, 0));
if (!decorator_seq)
return NULL;
name_i = 2;
}
else {
name_i = 1;
}
name = NEW_IDENTIFIER(c, CHILD(n, name_i));
if (!name)
return NULL;
else if (!strcmp(STR(CHILD(n, name_i)), "None")) {
ast_error(CHILD(n, name_i), "assignment to None");
return NULL;
}
args = ast_for_arguments(c, CHILD(n, name_i + 1));
if (!args)
return NULL;
body = ast_for_suite(c, CHILD(n, name_i + 3));
if (!body)
return NULL;
return new_ast_node(\
FunctionDef(name, args, body, decorator_seq, LINENO(n)));
}
No need for a checker, because there isn't anything special to do at the call
sites: each AST node can take care of putting *itself* in the arena.
And as the identifier example shows, this even works for the non-AST leaf
nodes that are some other kind of PyObject.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
More information about the Python-Dev
mailing list