[Python-Dev] Memory management in the AST parser & compiler

Thomas Lee krumms at gmail.com
Wed Nov 16 10:49:50 CET 2005

As the writer of the crappy code that sparked this conversation, I feel 
I should say something :)

Brett Cannon wrote:

>On 11/15/05, Neal Norwitz <nnorwitz at gmail.com> wrote:
>>On 11/15/05, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>Thanks for the message.  I was going to suggest the same thing.  I
>>>think it's primarily a question of how to add an arena layer.  The AST
>>>phase has a mixture of malloc/free and Python object allocation.  It
>>>should be straightforward to change the malloc/free code to use an
>>>arena API.  We'd probably need a separate mechanism to associate a set
>>>of PyObject* with the arena and have those DECREFed.
>>Well good.  It seems we all agree there is a problem and on the
>>general solution.  I haven't thought about Brett's idea to see if it
>>could work or not.  It would be great if we had someone start working
>>to improve the situation.  It could well be that we live with the
>>current code for 2.5, but it would be great to use arenas for 2.6 at
>I have been thinking about this some more to put off doing homework
>and I have some random ideas I just wanted to toss out there to make
>sure I am not thinking about arena memory management incorrectly
>(never actually encountered it directly before).
>I think an arena API is going to be the best solution.  Pulling
>trickery with redefining Py_INCREF and such like I suggested seems
>like a pain and possibly error-prone.  With the compiler being a
>specific corner of the core having a special API for handling the
>memory for PyObject* stuff seems reasonable.
I agree. And it raises the learning curve for poor saps like myself. :)

>We might need PyArena_Malloc() and PyArena_New() to handle malloc()
>and PyObject* creation.  We could then have a struct that just stored
>pointers to the allocated memory (linked list for each pointer which
>gives high memory overhead or linked list of arrays that should lower
>memory but make having possible holes in the array for stuff already
>freed a pain to handle).  We would then have PyArena_FreeAll() that
>would be strategically placed in the code for when bad things happen
>that would just traverse the lists and free everything.  I assume
>having a way to free individual items might be useful.  Could have the
>PyArena_New() and _Malloc() return structs with the needed info for a
>PyArena_Free(location_struct) to be able to free the specific item
>without triggering a complete freeing of all memory.  But this usage
>should be discouraged and only used when proper memory management is
An arena/pool (as I understood it from my quick skim) for the AST would 
probably best be implemented (IMHO) as an ADT based on a linked-list:

typedef struct _ast_pool_node {
  struct _ast_pool_node *next;
  PyObject *object; /* == NULL when data != NULL */
  void *data; /* == NULL when object != NULL */
} ast_pool_node;

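Deallocating a node could then be as simple as:

/* ast_pool_node *n */
if (n->object != NULL)
  Py_DECREF(n->object);
if (n->data != NULL)
  free(n->data); /* assuming data came from malloc() */
/* save n->next before freeing n */
free(n);
/* then go on to free the saved n->next */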

I haven't really thought all that deeply about this, so somebody shoot 
me down if I'm completely off-base (Neal? :D). Every allocation of a 
seq/stmt within ast.c would have its memory saved to the pool within the 
function it's allocated in. Then before we return, we can just 
deallocate the pool/arena/whatever you want to call it.
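
To make that a bit more concrete, here is a minimal sketch of such a pool. Everything in it is an assumption for illustration: the names (ast_pool, pool_malloc, pool_add_object, pool_free_all, count_release) are made up, and a plain release callback stands in for Py_DECREF so the sketch builds without Python.h. It is not the real ast.c code, nor a proposed PyArena API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for Py_DECREF so this builds without Python.h. */
typedef void (*pool_release_fn)(void *object);

typedef struct pool_node {
    struct pool_node *next;
    void *object;             /* released via release(); NULL when data is set */
    pool_release_fn release;  /* non-NULL only when object != NULL */
    void *data;               /* malloc()ed block; NULL when object is set */
} pool_node;

typedef struct {
    pool_node *head;
} ast_pool;

static void pool_init(ast_pool *pool) { pool->head = NULL; }

/* Push one bookkeeping node; returns 0 on success, -1 on allocation failure. */
static int pool_push(ast_pool *pool, void *object,
                     pool_release_fn release, void *data)
{
    pool_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return -1;
    n->object = object;
    n->release = release;
    n->data = data;
    n->next = pool->head;
    pool->head = n;
    return 0;
}

/* malloc() through the pool so the block is tracked from the start. */
static void *pool_malloc(ast_pool *pool, size_t size)
{
    void *data = malloc(size);
    if (data == NULL)
        return NULL;
    if (pool_push(pool, NULL, NULL, data) < 0) {
        free(data);
        return NULL;
    }
    return data;
}

/* Register an object that must be released (DECREFed) with the pool. */
static int pool_add_object(ast_pool *pool, void *object,
                           pool_release_fn release)
{
    assert(object != NULL && release != NULL);
    return pool_push(pool, object, release, NULL);
}

/* Free every payload and every node, leaving the pool empty. */
static void pool_free_all(ast_pool *pool)
{
    pool_node *n = pool->head;
    while (n != NULL) {
        pool_node *next = n->next;   /* save n->next before freeing n */
        if (n->object != NULL)
            n->release(n->object);
        if (n->data != NULL)
            free(n->data);
        free(n);
        n = next;
    }
    pool->head = NULL;
}

/* Demo callback: treats the "object" as an int counter and bumps it. */
static void count_release(void *object) { ++*(int *)object; }
```

A function would call pool_init() on entry, allocate through pool_malloc(), register any objects with pool_add_object(), and call pool_free_all() before an error return.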

The problem with this is that should we get to the end of the function 
and everything actually went okay (i.e. we return non-NULL), we then 
have to run through and deallocate all the nodes anyway (without 
deallocating n->object or n->data). Bah. Maybe we *would* be better off 
with a monolithic cleanup. I don't know.
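
A rough sketch of that success-path cost, again with made-up names (keep_pool, keep_malloc, keep_unwind, build_pair) and plain malloc()ed data only. The same list walk runs on both paths; the only difference is whether the payloads are freed along with the nodes.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct keep_node {
    struct keep_node *next;
    void *data;               /* malloc()ed block tracked by the pool */
} keep_node;

typedef struct {
    keep_node *head;
} keep_pool;

/* Allocate a block and record it in the pool. */
static void *keep_malloc(keep_pool *pool, size_t size)
{
    keep_node *n;
    assert(pool != NULL);
    n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;
    n->data = malloc(size);
    if (n->data == NULL) {
        free(n);
        return NULL;
    }
    n->next = pool->head;
    pool->head = n;
    return n->data;
}

/* Walk the list; free the payloads only when free_payloads is non-zero. */
static void keep_unwind(keep_pool *pool, int free_payloads)
{
    keep_node *n = pool->head;
    while (n != NULL) {
        keep_node *next = n->next;
        if (free_payloads)
            free(n->data);
        free(n);
        n = next;
    }
    pool->head = NULL;
}

typedef struct {
    int *left;
    int *right;
} pair;

/* Hypothetical builder standing in for a function in ast.c. */
static pair *build_pair(int fail_midway)
{
    keep_pool pool = { NULL };
    pair *p = keep_malloc(&pool, sizeof(*p));
    if (p == NULL)
        goto error;
    p->left = keep_malloc(&pool, sizeof(int));
    if (p->left == NULL || fail_midway)
        goto error;
    p->right = keep_malloc(&pool, sizeof(int));
    if (p->right == NULL)
        goto error;
    *p->left = 1;
    *p->right = 2;
    keep_unwind(&pool, 0);    /* success: drop the nodes, keep the payloads */
    return p;
error:
    keep_unwind(&pool, 1);    /* failure: drop nodes and payloads alike */
    return NULL;
}
```

Note that once the function succeeds, the caller owns every payload and has to free each one by hand, which is exactly the bookkeeping a single longer-lived arena would avoid.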

>Boy am I wanting RAII from C++ for automatic freeing when scope is
>left.  Maybe we need to come up with a similar thing, like all memory
>that should be freed once a scope is left must use some special struct
>that stores references to all created memory locally and then a free
>call must be made at all exit points in the function using the special
>struct.  Otherwise the pointer is stored in the arena and handled
>en masse later.
Which is basically what I just rambled on about up above, I think :)
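
One possible reading of the "local struct, otherwise hand it to the arena" idea, sketched under my own assumptions: each function keeps a local list with a tail pointer, frees it at error exits, and on success splices it onto a longer-lived arena in constant time instead of walking it. The names (mem_list, list_malloc, list_promote, build_step) are hypothetical, and only malloc()ed data is covered.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct mem_node {
    struct mem_node *next;
    void *data;
} mem_node;

typedef struct {
    mem_node *head;
    mem_node *tail;   /* first node pushed; lets us splice in O(1) */
    size_t count;
} mem_list;

/* Allocate a block and push it on the front of the list. */
static void *list_malloc(mem_list *list, size_t size)
{
    mem_node *n = malloc(sizeof(*n));
    if (n == NULL)
        return NULL;
    n->data = malloc(size);
    if (n->data == NULL) {
        free(n);
        return NULL;
    }
    n->next = list->head;
    if (list->head == NULL)
        list->tail = n;
    list->head = n;
    list->count++;
    return n->data;
}

/* Free every block and node in the list. */
static void list_free_all(mem_list *list)
{
    mem_node *n = list->head;
    while (n != NULL) {
        mem_node *next = n->next;
        free(n->data);
        free(n);
        n = next;
    }
    list->head = list->tail = NULL;
    list->count = 0;
}

/* Hand everything in scope over to arena without walking either list. */
static void list_promote(mem_list *scope, mem_list *arena)
{
    assert(scope != arena);
    if (scope->head == NULL)
        return;
    scope->tail->next = arena->head;
    if (arena->head == NULL)
        arena->tail = scope->tail;
    arena->head = scope->head;
    arena->count += scope->count;
    scope->head = scope->tail = NULL;
    scope->count = 0;
}

/* Hypothetical compile step: two allocations, with an optional failure. */
static int build_step(mem_list *arena, int fail)
{
    mem_list scope = { NULL, NULL, 0 };
    int ok = 0;
    if (list_malloc(&scope, 32) == NULL)
        goto done;
    if (fail)
        goto done;
    if (list_malloc(&scope, 64) == NULL)
        goto done;
    ok = 1;
done:
    if (ok)
        list_promote(&scope, arena);   /* survivors are freed en masse later */
    else
        list_free_all(&scope);         /* error exit: nothing leaks */
    return ok ? 0 : -1;
}
```

The single `done:` label is the poor man's substitute for RAII here: every exit path funnels through one place that either promotes or frees the local list.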

>Hopefully this all made some sense.  =)  Is this the basic strategy
>that an arena setup would need?  If not, can someone enlighten me?