[Python-Dev] pymalloc killer

Fri, 29 Mar 2002 13:50:54 -0500

[martin@v.loewis.de]
> I would not like to see such a binary search performed.

Duh <wink>.

> Instead, if anything needs to be done, I'd be in favour of using a
> constant-time algorithm, even if it means that a littler more memory
> overhead is necessary.
>
> I have the following idea: each chunk allocated by ought
                                                    ^
Allocated by what?  Some crucial words are missing here.  Note that I am
*not* proposing to change PyMem_{MALLOC, Malloc):  that continues to map
directly to system malloc(), so we have no control over, nor even knowledge
of, the memory allocated by PyMem_{MALLOC, Malloc}.  We can't redirect those
to pymalloc's malloc without introducing locks to prevent multithreaded
insanity, or "declaring by fiat" that existing working code is now horribly
broken in ways we can't help users detect.

> to have a pointer to its pool, immediately preceding the memory block.

In what fundamental way does this differ from pymalloc's current scheme of
storing the pool address at an address calculated *from* the pointer
address?  I suspect the real difference resides in the next point:

> This will make an overhead of 4 bytes per allocation. Chunks allocated
> by the system allocator will have a null pointer preceding them.

That's the killer.  We have no control over "the system allocator", by which
I mean libc's malloc() and realloc().  As above, trying to hijack them is
unattractive due to thread issues, even for just the PyMem_{Malloc, MALLOC,
Realloc, REALLOC, New, NEW, Resize, RESIZE} spellings.

> To deal with alignment, the size classes would increase each by 4
> bytes, so they would be spaced at 12, 20, 28, etc. bytes. With the 4
> byte overallocation, each chunk would be 8-aligned, if the previous
> chunk in the same pool is.

*If* we take over the system malloc, we need also to mimic the platform
malloc's alignment.  Pymalloc's object allocator can get away with 8-byte
alignment because the core never requires worse-than-double alignment, and
I'm willing to say that this is a language implementation restriction.  I
don't believe we can also restrict PyMem_umpteen_ways_to_spell_get_memory
that way:  they have always given platform-malloc alignment.  If we could, I
agree your scheme is fine, although I'm not sure why it stores a whole
pointer when it's making a 1-bit distinction ("does or does not reside in a
pymalloc pool").

> This approach would allow to remove the requirement that each pool
> must be 4k-aligned.

OK, that's why it really stores a whole pointer.  Agreed then, but the
memory lost to achieve page alignment once per-arena is likely small
compared to adding 4 bytes to every allocation unit.  The former loses at
most 4KB per 256KB.

> To support the NULL pointer in a system-malloc'ed chunk, pymalloc
> would overallocate 8 bytes if it defers to system malloc, to preserve
> alignment.
>
> What do you think?

Primarily that hijacking the system malloc is fraught with new difficulties.
If we *can* hijack it, so that pymalloc's free always knows that addresses
passed to it came from pymalloc's malloc/realloc originally, then I agree
it's easy to do something cheap to distinguish pool from non-pool memory.
The scheme I sketched is as hairy as it is exactly because it does the right
thing even when freeing an address obtained directly from the platform
malloc/realloc.