[Python-Dev] Memory management in the AST parser & compiler

Wed Nov 16 14:11:02 CET 2005

Thomas Lee wrote:
> As the writer of the crappy code that sparked this conversation, I feel 
> I should say something :)

Don't feel bad about it. It turned out the 'helpful' review comments from Neal 
and I didn't originally work out very well either ;)

With the AST compiler being so new, this is the first serious attempt to 
introduce modifications based on it. It's already better than the old CST 
compiler, but that memory management in the parser is a cow :)

>> Hopefully this is all made some sense.  =)  Is this the basic strategy
>> that an arena setup would need?  if not can someone enlighten me?

I think we need to be explicit about the problems we're trying to solve before 
deciding on what kind of solution we want :)

1. Cleaning up after failures in symtable.c and compile.c
   It turns out this is already dealt with in the case of code blocks - the 
compiler state handles a linked list of blocks which it automatically frees 
when the compiler state is cleaned up.
   So the only rule that needs to be followed in these files is to *never* 
call any of the VISIT_* macros while there is a Python object which requires 
DECREF'ing, or a C pointer which needs to be freed.
   This rule was being broken in a couple of places in compile.c (with respect 
to strings). I was the offender in both cases I found - the errors date from 
when this was still on the ast-branch in CVS.
   I've fixed those errors in SVN, and added a note to the comment at the top 
of compile.c, to help others avoid making the same mistake I did.
   It's fragile in some ways, but it does work. It makes the actual 
compilation code look clean (because there isn't any cleanup code), but it 
also makes that code look *wrong* (because the lack of cleanup code makes the 
calls to "compiler_new_block" look unbalanced), which is a little disconcerting.

2. Parsing a token stream into the AST in ast.c
   This is the bit that has caused Thomas grief (the PEP 341 patch only needs 
to modify the front end parser). When building an AST node, each of the 
contained AST nodes or sequences has to be built first. That means that, if 
there's a problem with any of the later subnodes, the earlier subnodes need to 
be freed.
   The key problem with memory management in this module is that the free 
method to be invoked is dependent on the nature of the AST node to be freed. 
In the case of a node sequence, it is dependent on the nature of the contained 
elements.
   So not only do you have to remember to free the memory, you have to 
remember to free it the *right way*.

Would it be worth the extra memory needed to store a pointer to an AST node's 
"free" method in the AST type structure itself? And do the same for ASDL 
sequences?

Then a simple FREE_AST macro would be able to "do the right thing" when it 
came to freeing either AST nodes or sequences. In particular, ASDL sequences 
would be able to free their contents without knowing what those contents 
actually are.

That wouldn't eliminate the problem with memory leaks or double-deletion, but 
it would eliminate some of the mental overhead of dealing with figuring out 
which freeing function to invoke.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://boredomandlaziness.blogspot.com