[Python-Dev] Memory management in the AST parser & compiler
Nick Coghlan
ncoghlan at gmail.com
Wed Nov 16 14:11:02 CET 2005
Thomas Lee wrote:
> As the writer of the crappy code that sparked this conversation, I feel
> I should say something :)
Don't feel bad about it. It turned out the 'helpful' review comments from Neal
and I didn't originally work out very well either ;)
With the AST compiler being so new, this is the first serious attempt to
introduce modifications based on it. It's already better than the old CST
compiler, but that memory management in the parser is a cow :)
>> Hopefully this is all made some sense. =) Is this the basic strategy
>> that an arena setup would need? if not can someone enlighten me?
I think we need to be explicit about the problems we're trying to solve before
deciding on what kind of solution we want :)
1. Cleaning up after failures in symtable.c and compile.c
It turns out this is already dealt with in the case of code blocks - the
compiler state handles a linked list of blocks which it automatically frees
when the compiler state is cleaned up.
So the only rule that needs to be followed in these files is to *never*
call any of the VISIT_* macros while there is a Python object which requires
DECREF'ing, or a C pointer which needs to be freed.
This rule was being broken in a couple of places in compile.c (with respect
to strings). I was the offender in both cases I found - the errors date from
when this was still on the ast-branch in CVS.
I've fixed those errors in SVN, and added a note to the comment at the top
of compile.c, to help others avoid making the same mistake I did.
It's fragile in some ways, but it does work. It makes the actual
compilation code look clean (because there isn't any cleanup code), but it
also makes that code look *wrong* (because the lack of cleanup code makes the
calls to "compiler_new_block" look unbalanced), which is a little disconcerting.
2. Parsing a token stream into the AST in ast.c
This is the bit that has caused Thomas grief (the PEP 341 patch only needs
to modify the front end parser). When building an AST node, each of the
contained AST nodes or sequences has to be built first. That means that, if
there's a problem with any of the later subnodes, the earlier subnodes need to
be freed.
The key problem with memory management in this module is that the free
method to be invoked is dependent on the nature of the AST node to be freed.
In the case of a node sequence, it is dependent on the nature of the contained
elements.
So not only do you have to remember to free the memory, you have to
remember to free it the *right way*.
Would it be worth the extra memory needed to store a pointer to an AST node's
"free" method in the AST type structure itself? And do the same for ASDL
sequences?
Then a simple FREE_AST macro would be able to "do the right thing" when it
came to freeing either AST nodes or sequences. In particular, ASDL sequences
would be able to free their contents without knowing what those contents
actually are.
That wouldn't eliminate the problem with memory leaks or double-deletion, but
it would eliminate some of the mental overhead of dealing with figuring out
which freeing function to invoke.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.blogspot.com
More information about the Python-Dev
mailing list