[Patches] Re: Garbage collection patches for Python

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Wed, 9 Feb 2000 15:59:48 +0100 (CET)


nascheme@enme.ucalgary.ca wrote:
> 
> On Sat, Feb 05, 2000 at 03:00:59PM +0100, Vladimir Marangozov wrote:
> > Neil, could you please resubmit your 1st GC patch with the disclaimer,
> > so that we can focus on this 1st step, test the patch and fold it into
> > CVS?
> 
> No problem.  I've included my Boehm-Demers-Weiser garbage
> collector patch too as I have improved it to use autoconfig.  The
> gc patch is small compared to the malloc cleanup.

Thanks. After a closer look at the patch, I see a couple of problems
with it. In particular, it mixes different API families.

Let me start from the start:

The goal is to remove Python's dependency on the standard POSIX interface
(malloc/realloc/free) so that we can cleanly and easily plug in the future
a "proprietary" mem manager, other than the one in the C library. For this
purpose, the Python core should be patched and "cleaned" to use one or more
of the following APIs:

1) PyMem_MALLOC        2) PyMem_NEW   
   PyMem_REALLOC  ==>     PyMem_RESIZE
   PyMem_FREE             PyMem_DEL
                          PyMem_XDEL

   The proposed           Guido's augmented version of 1) which is
   raw mem interface.     more "Pythonic" and contains some safety
                          additions, like _PyMem_EXTRA). This one
   (in mymalloc.h)        should be defined in terms of 1).


3) Py_Malloc           4) PyMem_Malloc       5) _PyObject_New
   Py_Realloc             PyMem_Realloc          PyObject_NEW
   Py_Free                PyMem_Free             PyObject_NEW_VAR, ...

These are implemented using 1) and/or 2)

It seems to me that presently nobody uses the wrappers 3) and 4), (except
one single occurence of Py_Malloc, lost in _localemodule.c), because these
wrappers cause an additional func call overhead...

All these APIs are certainly redundant and the added value of each of
them has to be weighted one again, but for now, let's assume that we
have them all (without more argumentation  - we could throw away some of
them later).

<EXTRABOLD>

The rule of thumb in this situation is:

Every chunk of memory must be manupulated via the same malloc family.

</EXTRABOLD>

That is, if one gets some piece of mem through PyMem_MALLOC (1),
s/he must release that memory with PyMem_FREE (1). Accordingly, if one
gets a chunk via PyMem_MALLOC (1), that chunk *should not* be released
with PyMem_DEL (2).  (which is what Neil's patch does, not to mention
that (2) is not defined in terms of (1).

So we must be careful here. In particular, when patching a file, we must
figure out which family (-ies) is (are) used in order to perform the
"malloc cleanup" the right way. That is, we have to inspect every file
one by one, once again.

-----------

With the above theory in mind, I see that there are a couple of
"problematic" files in the current snapshot. I remember that I have
identified them with my allocator (pymalloc) either.

(I'll use some excerpts of Neil's patch to illustrate the problems
 I'm talking about)

1) pypcre.c

This one is "buggy" for sure. The PCRE code allocates memory through
the proprietary functions "pcre_malloc/pcre_free" (which default to
malloc/free), so I really don't see why there's code inside mentioning
"free" and not "pcre_free".

a) Would this code be dropped with the inclusion of /F's re engine in 1.6?
b) If the answer to a) is negative, I hope AMK & the String-SIG will help
   us on this (it's a hairy code :-).
   
> diff -cr Python-cvs/Modules/pypcre.c Python-gc/Modules/pypcre.c
> *** Python-cvs/Modules/pypcre.c	Sat Jul 17 12:13:10 1999
> --- Python-gc/Modules/pypcre.c	Sat Feb  5 09:00:47 2000
> ***************
> *** 3057,3068 ****
>   static int free_stack(match_data *md)
>   {
>   /* Free any stack space that was allocated by the call to match(). */
> ! if (md->off_num)    free(md->off_num); 
> ! if (md->offset_top) free(md->offset_top); 
> ! if (md->r1)         free(md->r1); 
> ! if (md->r2)         free(md->r2); 
> ! if (md->eptr)       free((char *)md->eptr); 
> ! if (md->ecode)      free((char *)md->ecode);
>   return 0;
>   }
>   
> --- 3057,3068 ----
>   static int free_stack(match_data *md)
>   {
>   /* Free any stack space that was allocated by the call to match(). */
> ! if (md->off_num)    PyMem_DEL(md->off_num); 
> ! if (md->offset_top) PyMem_DEL(md->offset_top); 
> ! if (md->r1)         PyMem_DEL(md->r1); 
> ! if (md->r2)         PyMem_DEL(md->r2); 
> ! if (md->eptr)       PyMem_DEL((char *)md->eptr); 
> ! if (md->ecode)      PyMem_DEL((char *)md->ecode);
>   return 0;
>   }

2) _tkinter.c

I remember from my experiments that this one was really a mess from a
malloc point of view because it caused lots of mixed API calls (once
the core allocator was changed).  I have to look at it carefully
once again...

3) readline.c

Neil, what's this? Could you elaborate on this one?

>   
> diff -cr Python-cvs/Modules/readline.c Python-gc/Modules/readline.c
> *** Python-cvs/Modules/readline.c	Sat Feb  5 09:00:14 2000
> --- Python-gc/Modules/readline.c	Sat Feb  5 09:00:47 2000
> ***************
> *** 41,46 ****
> --- 41,63 ----
>   extern int (*PyOS_InputHook)();
>   extern char *(*PyOS_ReadlineFunctionPointer) Py_PROTO((char *));
>   
> + /* convert a string allocated by malloc to one allocated by PyMem_MALLOC */
> + static char *
> + malloc2PyMem(char *p)
> + {
> + 	char *py;
> + 	int n;
> + 	if (p == NULL) {
> + 		return p;
> + 	} else {
> + 		n = strlen(p);
> + 		py = PyMem_MALLOC(n);
> + 		strncpy(py, p, n+1);
> + 		free(p);
> + 		return py;
> + 	}
> + }
> + 


All in all, this patch isn't acceptable for me and I suggest that we
prepare another one which respects "the rule", after demystifying
some of the issues mentioned above.

Note that introducing PyMem_MALLOC would constitute an additional
constraint for C Python coders, who're used to malloc/free.

In order to pave the way for alternatives to libc's malloc, we must
propose a clean solution. Hence it would be good to collect as much
opinions/reactions on the subject as possible and settle on an interface
which would be convenient for everybody.

-- 
       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252