[Python-Dev] Debug entry points for PyMalloc

Tim Peters tim.one@comcast.net
Sat, 23 Mar 2002 18:59:59 -0500


[Michael Hudson]
>>> Yes.  Particularly if you can call it from gdb.

[Tim]
>> Is something extraordinary required to make that possible?  I
>> had in mind nothing fancier than
>>
>> extern void _PyMalloc_DebugCheckAddress(void* p);

That grew a teensy bit fancier:  the arg changed to const.  A void
_PyMalloc_DebugDumpAddress(const void *p) entry also sprouted, to display
info about the memory block p, to stderr.  It should really go somewhere
else on Windows, but too little bang for the buck for me to bother
complicating it more.

[Michael]
> Dunno.  I ought to learn how to use gdb properly.

Let me know if you hit a snag.  They're simple enough that I can get away
with calling them in an MSVC "watch window", and I sure hope gdb isn't
feebler than that <wink>.

[Aahz]
> I'm almost certainly betraying my ignorance here, but it sounds to
> me like malloc isn't doing any sanity checking to make sure that the
> memory it received isn't already being used.

Well, malloc doesn't receive memory, it allocates it, and
_PyMalloc_DebugMalloc just wraps somebody *else's* malloc.  It's not trying
to debug the platform malloc, it's trying to debug "the user's" (Python's)
use of the memory malloc returns.

> Should each PyDebugMalloc() walk through the list of used memory?

There isn't any list for it to walk -- it's not an allocator, it's a wrapper
around somebody else's allocator, and has no knowledge of how the
allocator(s) it calls work (beyond assuming that they meet the C defns of
how malloc() & friends must behave).
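
To make the "wrapper, not allocator" point concrete, here is a minimal
sketch of the general technique.  It is not the obmalloc.c code:  the names
debug_malloc, debug_check_address and debug_free, the 0xFB pad value, and
the pad sizes are all assumptions made for illustration.  The wrapper asks
the underlying malloc for a little extra room, writes known pad bytes on
both sides of the caller's block, and can later check whether the caller
scribbled on them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names and constants here are hypothetical, chosen for this sketch. */
#define PAD_BYTE   0xFB  /* guard value written around the user's block */
#define HEADER_LEN 16    /* size field plus leading pad; assumes sizeof(size_t) < 16
                            and keeps the usual malloc alignment on common platforms */
#define TAIL_LEN   8     /* trailing pad */

/* Block layout: [size_t nbytes][leading pad][user data: nbytes][trailing pad] */

void *debug_malloc(size_t nbytes)
{
    unsigned char *raw;

    if (nbytes > (size_t)-1 - HEADER_LEN - TAIL_LEN)
        return NULL;                      /* total size would overflow */
    raw = malloc(HEADER_LEN + nbytes + TAIL_LEN);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &nbytes, sizeof nbytes);  /* remember the requested size */
    memset(raw + sizeof nbytes, PAD_BYTE, HEADER_LEN - sizeof nbytes);
    memset(raw + HEADER_LEN + nbytes, PAD_BYTE, TAIL_LEN);
    return raw + HEADER_LEN;
}

/* Returns 0 if both pads are intact, -1 if the leading pad was overwritten,
   +1 if the trailing pad was overwritten. */
int debug_check_address(const void *p)
{
    const unsigned char *user = p;
    const unsigned char *raw = user - HEADER_LEN;
    size_t nbytes, i;

    for (i = sizeof nbytes; i < HEADER_LEN; i++)
        if (raw[i] != PAD_BYTE)
            return -1;
    memcpy(&nbytes, raw, sizeof nbytes);
    for (i = 0; i < TAIL_LEN; i++)
        if (user[nbytes + i] != PAD_BYTE)
            return 1;
    return 0;
}

void debug_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - HEADER_LEN);
}
```

Note that nothing here knows or cares how the underlying malloc works; the
check can only report damage to the bytes immediately around a block whose
address the caller supplies.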

One thing we *could* do, but I'll leave it to someone else:  in a debug
build, Python maintains a linked list (in the C sense, not the Python sense)
of "almost all" live objects.  Walking that list and calling
_PyMalloc_DebugCheckAddress() on each object should detect "almost all"
out-of-bounds stores that may have happened since the last time that was
done.  That first requires a way to make all calls to all allocators funnel
thru the debug malloc wrappers (the code right now only wraps calls to the
pymalloc allocator).

[Skip Montanaro]
> Any possibility that the __LINE__ or __FILE__:__LINE__ at which a
> chunk of memory was freed could be imprinted as ASCII in freed memory
> without changing the API?

Which API?  Regardless of the answer <wink>, I'm not yet sure it's even
possible to get a little integer identifying the "API family" through the
macro layers correctly.  That's much more important to me, since it
addresses a practical widespread problem.  If you want to redesign the
macros to make that possible, I expect it would also make passing anything
else thru the macros possible too <hint>.

> I'd find something like
>
>     <0340><0340><0340><0340><0340>
>
> or
>
>     <object.c:0340><object.c:0340>
>
> more useful than a string of
>
>     0xDB0xDB0xDB0xDB0xDB0xDB0xDB0xDB
>
> bytes.

I'm not sure that I would.  The advantage of 0xdbdbdbdbdb... is two-fold:

1. In a debugger, the 0xdbdbdb... stuff stands out like an inflamed boil.
   The second or third time you see it happen in real life, concluding
   "ah, this PyObject* was already freed!" becomes automatic.

2. 0xdbdbdbdb is very likely an invalid memory address, so that, e.g.,
   attempting to do op->ob_type on an already-freed PyObject* op is
   very likely to trigger a memory error.

We may be able to get the best of both worlds by storing ASCII in the tail
end of freed memory (for whatever reason, vital pointers tend to show up
near the start of a struct).
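
A rough sketch of that combination, under stated assumptions:  mark_dead
and MARK_DEAD are hypothetical names, the caller is assumed to know the
block size, and the tag format simply mirrors Skip's <object.c:0340>
example.  The whole block is filled with 0xDB first, and the file:line tag
is written over the last bytes only when at least one pointer's worth of
0xDB would still remain at the front.

```c
#include <stdio.h>
#include <string.h>

#define DEAD_BYTE 0xDB

/* Hypothetical helper: overwrite a block that is about to be freed.
   Every byte becomes DEAD_BYTE; then, if there is room, an ASCII
   "<file:line>" tag replaces the final bytes of the block. */
void mark_dead(void *p, size_t nbytes, const char *file, int line)
{
    char tag[64];
    int n;
    size_t len;

    memset(p, DEAD_BYTE, nbytes);
    n = snprintf(tag, sizeof tag, "<%s:%04d>", file, line);
    if (n < 0)
        return;                           /* formatting failed; keep the plain fill */
    /* snprintf reports the untruncated length, so clamp to what was stored */
    len = (size_t)n < sizeof tag ? (size_t)n : sizeof tag - 1;
    /* Keep the leading pointer-sized word as dead bytes so that a stale
       pointer field near the start of a struct still looks invalid. */
    if (len + sizeof(void *) <= nbytes)
        memcpy((char *)p + nbytes - len, tag, len);
}

/* Convenience macro that records the call site. */
#define MARK_DEAD(p, nbytes) mark_dead((p), (nbytes), __FILE__, __LINE__)
```

Small blocks get no tag at all under this rule, so they look exactly as
they do with a plain 0xDB fill.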

As-is, the "serial number" of the block is left behind, so you can determine
which call to malloc created (or call to realloc last changed) the block.
Then in a second run, you can set a counting breakpoint to trigger on that
call to the debug malloc/realloc, and after that triggers, set a conditional
breakpoint in the debug free (to break when the memory address is free'd).
Then you'll catch the free at the time it occurs.  There are ways this can
fail to work <wink>.
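
The serial-number idea can be sketched like this.  It is an illustration
only, not the actual debug allocator:  the names serial_malloc,
serial_realloc, serial_of, serial_free and bump_serialno are assumptions,
and so is keeping the number in a header in front of the block.  The
counter is bumped in exactly one function, which gives the debugger a
single place to hang a counting breakpoint.

```c
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical, chosen for this sketch. */
#define HEADER_LEN 16   /* room for the serial number; assumes sizeof(unsigned long)
                           <= 16 and keeps the usual malloc alignment */

static unsigned long serialno = 0;  /* number of successful malloc/realloc calls */

/* The only place serialno changes, so a breakpoint set here with an
   ignore count stops on exactly the Nth allocation call. */
void bump_serialno(void)
{
    ++serialno;
}

void *serial_malloc(size_t nbytes)
{
    unsigned char *raw;

    if (nbytes > (size_t)-1 - HEADER_LEN)
        return NULL;                      /* total size would overflow */
    raw = malloc(HEADER_LEN + nbytes);
    if (raw == NULL)
        return NULL;
    bump_serialno();
    memcpy(raw, &serialno, sizeof serialno);  /* stamp the block */
    return raw + HEADER_LEN;
}

void *serial_realloc(void *p, size_t nbytes)
{
    unsigned char *raw;

    if (p == NULL)
        return serial_malloc(nbytes);
    if (nbytes > (size_t)-1 - HEADER_LEN)
        return NULL;
    raw = realloc((unsigned char *)p - HEADER_LEN, HEADER_LEN + nbytes);
    if (raw == NULL)
        return NULL;
    bump_serialno();
    memcpy(raw, &serialno, sizeof serialno);  /* restamp: block last changed here */
    return raw + HEADER_LEN;
}

/* Serial number of the call that created or last resized the block at p. */
unsigned long serial_of(const void *p)
{
    unsigned long n;

    memcpy(&n, (const unsigned char *)p - HEADER_LEN, sizeof n);
    return n;
}

void serial_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - HEADER_LEN);
}
```

With something like this, if the damaged block carries serial number 42,
a second run in gdb might use "break bump_serialno" followed by
"ignore 1 41" (assuming that breakpoint is number 1) to stop on the 42nd
call, and then a conditional breakpoint on serial_free comparing p against
the address just handed out.  That only works if the second run allocates
in the same order as the first, which is one of the ways it can fail.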

> I did something similar in a small single-file library I've been
> working on, though I didn't pay much attention to preserving the
> malloc/free API because, like I said, it was something small.  I
> simply changed all free() calls to something like
>
>     MARK_TERRITORY(s, strlen(s), __LINE__);
>     free(s);
>
> (The second arg was appropriate to the size of the memory chunk being
> freed.)

We can make the new-in-2.3 _PyMalloc_XXX calls do anything you can dream up,
but my time on this has been of the "do a right thing and apologize later"
flavor.  Whoever wants substantially more is going to have to do most of the
work to get it.