[Patches] Re: Garbage collection patches for Python

nascheme@enme.ucalgary.ca
Wed, 9 Feb 2000 17:14:19 -0700


On Wed, Feb 09, 2000 at 03:59:48PM +0100, Vladimir Marangozov wrote:
> The goal is to remove Python's dependency on the standard POSIX interface
> (malloc/realloc/free) so that we can cleanly and easily plug in the future
> a "proprietary" mem manager, other than the one in the C library. For this
> purpose, the Python core should be patched and "cleaned" to use one or more
> of the following APIs:
[...]

Shouldn't all these be based on the same malloc?  We could
define everything in terms of PyMem_MALLOC, PyMem_REALLOC, and
PyMem_FREE if that makes things clearer.
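As a minimal sketch of what "everything in terms of one malloc"
could look like, the macros below funnel every allocation call
through a single core family.  All of the names (CORE_*, MEM_*,
demo_roundtrip) are hypothetical stand-ins for illustration and
are not the definitions in the Python headers or in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical core family: the ONLY place that names the real
 * allocator.  Swapping allocators means editing these three lines. */
#define CORE_MALLOC(n)      malloc(n)
#define CORE_REALLOC(p, n)  realloc((p), (n))
#define CORE_FREE(p)        free(p)

/* Hypothetical raw-memory API, defined purely in terms of the core. */
#define MEM_MALLOC(n)       CORE_MALLOC(n)
#define MEM_REALLOC(p, n)   CORE_REALLOC((p), (n))
#define MEM_FREE(p)         CORE_FREE(p)

/* Hypothetical typed API, again layered on the same family, so
 * memory from MEM_NEW may safely be released by MEM_FREE or MEM_DEL.
 * (No overflow check on n * sizeof(type) in this sketch.) */
#define MEM_NEW(type, n)    ((type *) MEM_MALLOC((n) * sizeof(type)))
#define MEM_DEL(p)          MEM_FREE(p)

/* Allocate with the typed API, release with the typed API.
 * Returns 1 on success, 0 if the allocation failed. */
int demo_roundtrip(void)
{
    int *v = MEM_NEW(int, 4);
    if (v == NULL)
        return 0;
    v[3] = 42;
    int ok = (v[3] == 42);
    MEM_DEL(v);
    return ok;
}
```

With this layering there is only one place to change when a
different memory manager is plugged in, and no pair of macros
can end up backed by different allocators.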

> Every chunk of memory must be manipulated via the same malloc family.

Yes, and this is where things get tricky.  Extension modules can
use malloc to allocate objects and pass them to the Python core.
Python then frees them with PyMem_FREE or similar and *boom*,
memory corruption (if the two are different mallocs).
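To illustrate why mixing is dangerous, here is a small sketch of
a second allocator that only knows about its own region.  The
pool_* functions and POOL_SIZE are hypothetical and exist only for
this example; a real allocator would not check ownership like
this and would simply corrupt its bookkeeping when handed a
pointer from somewhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE  1024   /* hypothetical arena size in bytes   */
#define POOL_ALIGN 16     /* hypothetical allocation granularity */

/* The union makes the byte array aligned for the listed types. */
static union {
    unsigned char bytes[POOL_SIZE];
    long double   ld;
    long long     ll;
    void         *ptr;
} pool;
static size_t pool_used = 0;

/* Bump allocator: hands out consecutive slices of the arena. */
void *pool_malloc(size_t n)
{
    if (n == 0)
        n = 1;
    if (n > POOL_SIZE)
        return NULL;
    size_t need = (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    if (need > POOL_SIZE - pool_used)
        return NULL;
    void *p = pool.bytes + pool_used;
    pool_used += need;
    return p;
}

/* Nonzero if p points inside the arena. */
int pool_owns(const void *p)
{
    uintptr_t lo = (uintptr_t) pool.bytes;
    uintptr_t x  = (uintptr_t) p;
    return x >= lo && x < lo + POOL_SIZE;
}

/* Returns 0 if the pointer is accepted, -1 if it came from a
 * different allocator.  A bump allocator reclaims nothing per
 * block, so an accepted pointer needs no further work here. */
int pool_free(void *p)
{
    if (p == NULL)
        return 0;
    if (!pool_owns(p))
        return -1;
    return 0;
}
```

A pointer obtained from the libc malloc is rejected by pool_free
here.  Without the ownership check, the free routine would treat
foreign memory as one of its own blocks, which is the *boom*.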

> That is, if one gets some piece of mem through PyMem_MALLOC (1),
> s/he must release that memory with PyMem_FREE (1). Accordingly, if one
> gets a chunk via PyMem_MALLOC (1), that chunk *should not* be released
> with PyMem_DEL (2) (which is what Neil's patch does, not to mention
> that (2) is not defined in terms of (1)).

I don't think we want more than one malloc within Python.  IMHO,
it would be impossible to keep the calls straight.  Why would we
want PyMem_MALLOC to use a different malloc than PyMem_NEW?

> 1) pypcre.c
> 
> This one is "buggy" for sure. The PCRE code allocates memory through
> the proprietary functions "pcre_malloc/pcre_free" (which default to
> malloc/free), so I really don't see why there's code inside mentioning
> "free" and not "pcre_free".

No, I think it is okay.  All the memory allocated with
pcre_malloc is deallocated with pcre_free.  The places that use
free have been changed to PyMem_FREE because that memory has
come from the Python interpreter.  I spent some time tracking
this one down.

The fact that you are sure this is buggy shows how tricky this
business is.  Of course, I could always be wrong too.  :)

> 3) readline.c
> 
> Neil, what's this? Could you elaborate on this one?

It's very ugly.  Readline returns memory allocated by malloc.
That memory is eventually freed from within the interpreter by
PyMem_FREE.  We can't change the PyMem_FREE call to free because
it also frees memory allocated by PyMem_MALLOC.
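One general way to deal with this kind of boundary (a sketch
only, and not necessarily what the patch does) is to copy the
foreign buffer into interpreter-owned memory as soon as it
arrives and release the original with plain free.  The names
interp_malloc, interp_free, interp_live and adopt_foreign_string
are hypothetical; the stand-in allocator simply wraps malloc and
counts live blocks so the ownership transfer can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the interpreter's allocator family. */
long interp_live = 0;   /* number of blocks currently owned */

void *interp_malloc(size_t n)
{
    void *p = malloc(n ? n : 1);
    if (p != NULL)
        interp_live++;
    return p;
}

void interp_free(void *p)
{
    if (p != NULL) {
        interp_live--;
        free(p);
    }
}

/* Take a NUL-terminated string that a library allocated with
 * malloc, copy it into interpreter-owned memory, and release the
 * original with free.  The result must be released with
 * interp_free.  Returns NULL if the input is NULL or the copy
 * could not be allocated; the original is freed either way. */
char *adopt_foreign_string(char *foreign)
{
    if (foreign == NULL)
        return NULL;
    size_t n = strlen(foreign) + 1;
    char *own = interp_malloc(n);
    if (own != NULL)
        memcpy(own, foreign, n);
    free(foreign);
    return own;
}
```

After the copy, everything the interpreter later frees came from
its own allocator, at the cost of one extra allocation and copy
per line.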

> Note that introducing PyMem_MALLOC would constitute an additional
> constraint for C Python coders, who're used to malloc/free.

It has to be assumed that C Python coders use PyMem_MALLOC when
passing memory to Python and PyMem_FREE when freeing memory that
came from the interpreter.  Not a very good assumption, I know.
That is why real garbage collection is so hard to add to Python.

I have already tested this patch with PyMem_MALLOC and malloc
being different allocators (the libc malloc and the Boehm GC
malloc).  I spent a long time looking at core dumps and
straightening out the memory calls.  It is possible, however,
that I missed some "API mixing" that my tests did not cover.

Note that I am working on the garbage collection scheme proposed
by Toby Kelsey and Tim Peters.  Right now things are looking
promising.  The cost seems to be quite low.  It can find almost
all reference cycles that would occur in real programs.  Finally,
it should be completely portable and does not require the use of
a different malloc.  I will be posting the code to the Python
patches list shortly.


    Neil