[Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140

Tim Peters tim.one@comcast.net
Sun, 21 Apr 2002 15:55:48 -0400


[martin@v.loewis.de]
> ...
> I found the overallocations strategy that this code had just so
> embarrassing: a single three-byte character will cause an
> overallocation of three times the memory, which means that the final
> realloc will certainly truncate lots of bytes.

Well, Python routinely over-allocates string space all over the place, and
it doesn't really cost more to throw away a million bytes at the end than to
throw away five bytes.  That is, the cost of a realloc shrink is at worst
proportional to the number of retained bytes; the number of bytes thrown
away doesn't matter.  For that reason, the thing that puzzles me more about
the current state of the code is why it doesn't overallocate by a factor of
4 from the start, and skip all the delicate, repeated "oops! I still didn't
get enough memory" logic.  It's almost certainly going to do a shrinking
realloc at the end anyway.

> As a result, we are at the mercy of the realloc implementation here:
> If realloc copies memory (such as Pymalloc might some day) when
> shrinking buffers, performance will get worse.

Since pymalloc only handles small blocks by itself, such copies would be of
short contiguous memory regions that are almost certainly in L1 cache (since
the encoder function just finished crawling over them).  For that reason I
doubt the speed hit would be monstrous.

> Since this appears to be religious, I'm backing the patch out.

It needn't be religious <wink>:  with sufficient effort we can quantify the
tradeoff between the time pymalloc's realloc would take to move shrinking
blocks, and the space wasted by realloc leaving them in place.  I mentioned
before that I want to give pymalloc a new entry point for realloc callers
who specifically do or don't want a shrinking realloc to copy memory.  For
string storage I *suspect* it's better to move the memory (typically exactly
one shrinking realloc, after which the string storage is immutable until
death), while for most other kinds of storage it's better to leave the block
alone (mutable memory that may shrink and grow repeatedly over time).