I am writing a Python procedural language implementation for PostgreSQL (embedding) (use CVS -tOPYMEM; http://gborg.postgresql.org/project/postgrespy/), and I need to free all memory allocated by Python when a Postgres ERROR occurs, since an ERROR can longjmp out of the interpreter and leak memory all over the floor. In fact, most places in the plpy library that hold memory allocated by Python have a chance of leaking. (Yes, there is a way to work around this, but it is ugly, IMO. See the short paragraph at the end of this letter on how PLs normally handle it.) Ideally, Python would allocate memory within a Postgres MemoryContext, so that when an ERROR occurs Python's memory can be freed in the same way as other Postgres allocations. This can be achieved by making Python's low-level object allocators, reallocators, and deallocators go through three mutable, global function pointers.
In my test implementation I added these FPs to Objects/obmalloc.c:

    void *(*PyMemInternal_Reallocate)(void *, size_t) = realloc;
    void *(*PyMemInternal_Allocate)(size_t) = malloc;
    void (*PyMemInternal_Free)(void *) = free;

And, of course, I then replaced the direct calls to malloc, realloc, and free in obmalloc.c and pymem.h with their global FP counterparts, and declared the pointers extern in pymem.h as well (not necessary, but it seemed appropriate). Overriding the base type's tp_alloc and tp_free does not seem to be a complete option: many builtin types set tp_alloc and tp_free to good ol' PyObject_Malloc/Del/etc. (at least stringobject.c, IIRC) or to the GC_* functions, and that would not cover direct calls to PyObject_*alloc|free anyway. I hear there are some linker hacks that may be able to emulate this, but portability is very desirable, as some PostgreSQL developers are working on native Windows support.
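As a rough illustration of how an embedder might use such hooks, here is a minimal, self-contained sketch. The three PyMemInternal_* pointers are redefined locally so the example compiles on its own. The ctx_* functions are hypothetical: they are a toy stand-in for a Postgres MemoryContext (every live block is linked into one list so that a reset can free them all), not the real Postgres API. It also assumes the hooks are swapped before any allocation is made, since the ctx_* functions only understand blocks they handed out themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The three hooks from the post, redefined here so the sketch stands alone. */
void *(*PyMemInternal_Reallocate)(void *, size_t) = realloc;
void *(*PyMemInternal_Allocate)(size_t) = malloc;
void (*PyMemInternal_Free)(void *) = free;

/* Hypothetical stand-in for a MemoryContext: each block carries a header
 * that links it into a circular list rooted at ctx_head. */
typedef union Block {
    struct { union Block *prev, *next; } link;
    max_align_t align;              /* keeps the payload suitably aligned */
} Block;

static Block ctx_head = { .link = { &ctx_head, &ctx_head } };
static size_t ctx_live = 0;         /* number of blocks currently tracked */

static void ctx_link(Block *b) {
    b->link.prev = &ctx_head;
    b->link.next = ctx_head.link.next;
    ctx_head.link.next->link.prev = b;
    ctx_head.link.next = b;
    ctx_live++;
}

static void ctx_unlink(Block *b) {
    b->link.prev->link.next = b->link.next;
    b->link.next->link.prev = b->link.prev;
    ctx_live--;
}

static void *ctx_alloc(size_t n) {
    Block *b = malloc(sizeof(Block) + n);
    if (b == NULL) return NULL;
    ctx_link(b);
    return b + 1;                   /* payload starts right after the header */
}

static void ctx_free(void *p) {
    if (p == NULL) return;
    Block *b = (Block *)p - 1;
    ctx_unlink(b);
    free(b);
}

static void *ctx_realloc(void *p, size_t n) {
    if (p == NULL) return ctx_alloc(n);
    Block *b = (Block *)p - 1;
    ctx_unlink(b);                  /* the block may move, so relink afterwards */
    Block *nb = realloc(b, sizeof(Block) + n);
    if (nb == NULL) { ctx_link(b); return NULL; }
    ctx_link(nb);
    return nb + 1;
}

/* Free every block still tracked, as a context reset would after an ERROR. */
static void ctx_reset(void) {
    while (ctx_head.link.next != &ctx_head)
        ctx_free(ctx_head.link.next + 1);
}

/* Point the hooks at the tracking allocator (do this before any allocation). */
void install_context_allocator(void) {
    PyMemInternal_Allocate = ctx_alloc;
    PyMemInternal_Reallocate = ctx_realloc;
    PyMemInternal_Free = ctx_free;
}

/* Allocate through the hooks, "forget" one block, then reset the context.
 * Returns the number of blocks still tracked afterwards. */
size_t demo_leak_then_reset(void) {
    install_context_allocator();
    void *a = PyMemInternal_Allocate(32);
    void *b = PyMemInternal_Allocate(64);
    b = PyMemInternal_Reallocate(b, 4096);
    PyMemInternal_Free(a);
    assert(b != NULL && ctx_live == 1);   /* b is live, as if we had longjmp'd away */
    ctx_reset();
    return ctx_live;
}
```

In a real embedding the ctx_* functions would presumably forward to the host's own context allocator rather than to malloc; the point of the sketch is only that the interpreter never needs to know which allocator is behind the pointers.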
The main problem that I can see with this request is that my use may be a special case, which few embedders would ever need to use.
Another possible solution is a function in obmalloc.c that iterates through the arenas and frees them. This seems more likely to be accepted, but the former solution is more desirable for my usage.
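The arena-walking alternative could look roughly like the sketch below. It does not touch obmalloc.c's real data structures: toy_arenas, toy_narenas, toy_new_arena, and toy_free_all_arenas are all hypothetical names, and the table is a simplified model of an allocator that obtains large arenas from malloc and records their base addresses. A real version would also have to reset the pool bookkeeping that points into those arenas.

```c
#include <stdlib.h>

#define TOY_ARENA_SIZE (256 * 1024)   /* assumed arena size for the sketch */
#define TOY_MAX_ARENAS 16

/* Hypothetical arena table: base address of every arena obtained so far. */
static void *toy_arenas[TOY_MAX_ARENAS];
static unsigned toy_narenas = 0;

/* Obtain one more arena from the C library; returns NULL on failure. */
void *toy_new_arena(void) {
    if (toy_narenas == TOY_MAX_ARENAS) return NULL;
    void *arena = malloc(TOY_ARENA_SIZE);
    if (arena != NULL) toy_arenas[toy_narenas++] = arena;
    return arena;
}

/* Walk the table, free each arena, and reset the bookkeeping.
 * Returns the number of arenas released. */
unsigned toy_free_all_arenas(void) {
    unsigned released = 0;
    for (unsigned i = 0; i < toy_narenas; i++) {
        free(toy_arenas[i]);
        toy_arenas[i] = NULL;
        released++;
    }
    toy_narenas = 0;
    return released;
}
```

Note that this only reclaims memory served from arenas; anything the allocator passed straight through to malloc would not be covered by such a walk.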
Kaboom

Of course, just freeing all the memory, whether by resetting the memory context that Python memory is allocated under or by freeing the arenas, leaves the Python library in an unusable state, even if done after Py_Finalize. (Yes, I've tested this: there are dangling globals and state, especially in obmalloc.c, IIUC.) To do this without restarting the process or re-dlinking (my chosen solution for my application), Py_Finalize would have to completely reset libpython; that is, perform a complete finalization that returns libpython to its pre-Py_Initialize state. This seems a rather large request, and probably beyond what anyone is willing to do for a rarely used result(?). (I'm not too eager to jump on it, even if it's acceptable, but if nobody has a problem with me tackling it, I may look into it.)
plpy

I plan to add support for reloading DLLs in Postgres to make this work properly for my app. (Closing and reopening the lib should reset the globals, no? I haven't tested this yet, but I'm fairly confident that it is at least a reasonable assumption.) I think reload on ERROR would be a useful feature for lib authors, so I plan to submit a proposal to pgsql-hackers soon, depending on what I am able to work out here.
Jumping

Normally, PLs use *sigjmp fun (sigsetjmp/siglongjmp) to clean up this kind of memory, but it is a serious pain for me in plpy. These potential jumps must be trapped in every function that holds Python memory allocations and makes a call that may ERROR out. (There are lots of them, especially in my Postgres interface module, the "if" CVS repository.)
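For readers unfamiliar with the trapping pattern, the sketch below shows its general shape using only POSIX sigsetjmp/siglongjmp. The names are hypothetical: error_restart stands in for whatever jump target the backend longjmps to on ERROR, raise_error stands in for a backend call that errors out, and guarded_call is the kind of wrapper that would have to be repeated in every function holding interpreter-owned memory. For simplicity the sketch returns -1 after cleanup, where a real PL would typically re-raise to the outer handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the backend's error jump target. */
static sigjmp_buf error_restart;
static int frees_on_error = 0;      /* counts cleanups done on the error path */

/* Stand-in for a backend call that ERRORs out. */
static void raise_error(void) {
    siglongjmp(error_restart, 1);
}

/* Returns 0 on success, -1 if an error was trapped and cleaned up. */
int guarded_call(int should_fail) {
    sigjmp_buf saved;
    /* volatile so the pointer value is reliable after the jump */
    char *volatile buf = malloc(128);   /* stands in for Python-owned memory */
    if (buf == NULL) return -1;

    /* Save the outer handler, then install ours. */
    memcpy(&saved, &error_restart, sizeof(saved));
    if (sigsetjmp(error_restart, 1) != 0) {
        /* Arrived here via siglongjmp: restore the outer handler and clean up. */
        memcpy(&error_restart, &saved, sizeof(saved));
        free(buf);
        frees_on_error++;
        return -1;   /* a real PL would re-raise: siglongjmp(error_restart, 1) */
    }

    if (should_fail) raise_error();     /* the call that may ERROR out */

    /* Normal path: restore the outer handler and release the memory. */
    memcpy(&error_restart, &saved, sizeof(saved));
    free(buf);
    return 0;
}
```

Every function with its own allocations needs its own save, trap, restore, and cleanup, which is the repetition the paragraph above complains about.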
You can get my patch against 2.3.3 that implements those global FPs at rhid.com/pymfp.patch; it is pretty trivial. It doesn't touch thread*.c or strdup.c, as I didn't think it really applied to them, but perhaps they should be updated as well.
Comments, Criticisms, Flames?
Regards, James William Pye