[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Tim Peters tim.one@comcast.net
Fri, 26 Apr 2002 14:21:29 -0400

> The standard reasoning behind using overallocation for memory
> management is that typical modern malloc()s don't really allocate
> the memory until it is used (you know this anyway...),

> ...
> Most strings are smaller than a page, so if you allocate lots of
> strings, every page allocated from the system will be used as well
> (at least to fill in the string header). With overallocation, you will
> indeed overallocate real pages, and consume real memory.

But Marc-Andre uses realloc at the end to return the excess.  The excess
bytes will get reused (and some returned yet again) by the next
overallocation, and so on.  The only memory *touched* by him is the only
memory he actually needs in the end.  Indeed, if strings are always smaller
than a page, the aggregate overallocation at any point in a single-threaded
program will be at worst a few pages total (overallocation is never more
than a factor of 4, and the excess is always given back untouched).

>> As Martin's benchmark showed, the counting strategy is
>> faster for small chunks and this is probably due to the
>> fact that pymalloc manages these.

> I doubt this claim.

If you both run the test with and without pymalloc enabled, the results
should speak for themselves.  I have not, but suspect the difference has
more to do with that caches like small memory areas better than large ones,
and especially when you crawl over a memory area twice.

MAL, you should keep in mind that pymalloc is also managing the small chunks
in your scheme:  when you're fiddling with a 40-character Unicode string, an
overallocation "by a factor of 4" only amounts to an 80-character UTF8
string.  pymalloc handles blocks that small under either scheme, and indeed
your current scheme is getting a speed benefit from that pymalloc currently
refuses to copy the data to a smaller block when there's a shrinking realloc
at the end.

>> Since pymalloc cannot know that an algorithm wants to
>> use overallocation as memory allocation strategy, it
>> would probably help to find a way to tell pymalloc
>> about this fact.  It could then either redirect the
>> request to the system malloc() or use a different
>> malloc strategy for these chunks.

Possibly, yes.  It's still in need of quantifying the speed versus memory
tradeoffs.  pymalloc's current small-block realloc strategy favors speed,
and going to the system malloc instead would be slower.  You haven't yet
told me that's what you actually want <wink>.