[Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

Mon Jan 3 23:40:37 CET 2005

On Jan 3, 2005, at 4:49 PM, Tim Peters wrote:

> [Tim Peters]
>>> Ya, I understood that.  My conclusion was that Darwin's realloc()
>>> implementation isn't production-quality.  So it goes.
>
> [Bob Ippolito]
>> Whatever that means.
>
> Well, it means what it said.  The C standard says nothing about
> performance metrics of any kind, and a production-quality
> implementation of C requires very much more than just meeting what the
> standard requires.  The phrase "quality of implementation" is used in
> the C Rationale (but not in the standard proper) to cover all such
> issues.  realloc() pragmatics are quality-of-implementation issues;
> the accuracy of fp arithmetic is another (e.g., if you get back -666.0
> from the C 1.0 + 2.0, there's nothing in the standard to justify a
> complaint).
>
>>>>  free() can be called either explicitly, or implicitly by calling
>>>> realloc() with a size larger than the size of the allocation.
>
> From later comments feigning outrage <wink>, I take it that "the size
> of the allocation" here does not mean the specific number the user
> passed to the previous malloc/realloc call, but means whatever amount
> of address space the implementation decided to use internally.  Sorry,
> but I assumed it meant the former at first.

Sorry for the confusion.

>>>> Was this a good decision?  Probably not!
>
>>> Sounds more like a bug (or two) to me than "a decision", but I don't
>>> know.
>
>> You said yourself that it is standards compliant ;)  I have filed it
>> as a bug, but it is probably unlikely to be backported to current
>> versions of Mac OS X unless a case can be made that it is indeed a
>> security flaw.
>
> That's plausible.  If you showed me a case where Python's list.sort()
> took cubic time, I'd certainly consider that to be "a bug", despite
> that nothing promises better behavior.  If I wrote a malloc subsystem
> and somebody pointed out "did you know that when I malloc 1024**2+1
> bytes, and then realloc(1), I lose the other megabyte forever?", I'd
> consider that to be "a bug" too (because, docs be damned, I wouldn't
> intentionally design a malloc subsystem with such behavior; and
> pymalloc does in fact copy bytes on a shrinking realloc in blocks it
> controls, whenever at least a quarter of the space is given back --
> and it didn't at the start, and I considered that to be "a bug" when
> it was pointed out).

I wouldn't equate "until free() is called" with "forever".  But yes, I
consider it a bug just as you do, and have reported it appropriately.
Practically, since it exists in Mac OS X 10.2 and Mac OS X 10.3, and
may never be fixed there, we should at least consider working around it.
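
For reference, the policy Tim describes for pymalloc (copy on a shrinking
realloc whenever at least a quarter of the space is given back) might look
roughly like the sketch below.  shrinking_realloc() is a hypothetical name,
not anything in Python or libc, and it assumes the caller tracks the old
size itself, since standard C offers no way to ask for it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of Python or any libc.  The caller
   supplies old_size because standard C cannot query a block's size. */
void *
shrinking_realloc(void *p, size_t old_size, size_t new_size)
{
    void *q;

    assert(p != NULL && new_size > 0);
    if (new_size > old_size)
        return realloc(p, new_size);  /* growing: nothing special */
    if (4 * new_size > 3 * old_size)
        return p;                     /* under a quarter given back: keep it */
    q = malloc(new_size);             /* big shrink: get a right-sized block */
    if (q == NULL)
        return p;                     /* no memory: the old block is still valid */
    memcpy(q, p, new_size);
    free(p);                          /* this is what releases the excess */
    return q;
}
```

With this, malloc(1024*1024 + 1) followed by a shrink to a handful of bytes
ends in an explicit free() of the big block, which is the one thing Darwin's
realloc() never does on its own.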

>> ...
>> Known case?  No.  Do I want to search Python application-space to find
>> one?  No.
>
> Serious problems on a platform are usually well-known to users on that
> platform.  For example, it was well-known that Python's list-growing
> strategy as of a few years ago fragmented address space horribly on
> Win9X.  This was a C quality-of-implementation issue specific to that
> platform.  It was eventually resolved by improving the list-growing
> strategy on all platforms -- although it's still the case that Win9X
> does worse on list-growing than other platforms, it's no longer a
> disaster for most list-growing apps on Win9X.

It does take a long time to figure such weird behavior out though.  I
would have to guess that most Python users on Darwin have been at it
for less than 3 years.

The number of people using Python on Darwin who have written or used
code that exercises this scenario, and who are determined enough to
track this sort of thing down, is probably very small.

> If there's a problem with "overallocate then realloc() to cut back" on
> Darwin that affects many apps, then I'd expect Darwin users to know
> about that already -- lots of people have used Python on Macs since
> Python's beginning, "mysterious slowdowns" and "mysterious bloat" get
> noticed, and Darwin has been around for a while.

Most people on Mac OS X have a lot of memory, and Mac OS X generally
does a good job of swapping in and out without causing much of a
problem, so I'm personally not very surprised that it could go
unnoticed this long.

Google says:
Results 1 - 10 of about 1,150 for (darwin OR Mac OR "OS X") AND 
MemoryError AND Python.
Results 1 - 10 of about 942 for malloc vm_allocate failed. (0.73 
seconds) 

Of course, in both cases, not all of these can be attributed to 
realloc()'s implementation, but I'm sure some of them can, especially 
the Python ones!

>> They're [#ifdef's] also the only good way to deal with
>> platform-specific inconsistencies.  In this specific case, it's not
>> even possible to determine if a particular allocator implementation
>> is stupid or not without at least using a platform-allocator-specific
>> function to query the size reserved by a given allocation.
>
> We've had bad experience on several platforms when passing large
> numbers to recv().  If that were addressed, it's unclear that Darwin
> realloc() behavior would remain a real issue.  OTOH, it is clear that
> *just* worming around Darwin realloc() behavior won't help other
> platforms with problems in the same *immediate* area of bug 1092502.
> Gross over-allocation followed by a shrinking realloc() just isn't
> common in Python.  sock_recv() is an exceptionally bad case.  More
> typical is, e.g., fileobject.c's get_line(), where if "a line" exceeds
> 100 characters the buffer keeps growing by 25% until there's enough
> room, then it's cut back once at the end.  That typical use for
> shrinking realloc() just isn't going to be implicated in a real
> problem -- the over-allocation is always minor.
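
The "typical" pattern described above (start at 100 characters, grow by
25% while the line doesn't fit, then cut back once at the end) can be
sketched as follows.  This is not the fileobject.c code: copy_line() is a
hypothetical stand-in that reads from a string rather than a FILE*, only
to show why the over-allocation stays minor.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a line reader: copies src up to and including
   the first '\n' (or to the end of the string) into a malloc'ed buffer. */
char *
copy_line(const char *src, size_t *len_out)
{
    size_t cap = 100, len = 0;
    char *buf, *tmp;

    assert(src != NULL && len_out != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while (src[len] != '\0') {
        if (len + 1 >= cap) {         /* keep room for the terminating NUL */
            cap += cap / 4;           /* grow by 25% */
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len] = src[len];
        if (src[len++] == '\n')
            break;
    }
    buf[len] = '\0';
    tmp = realloc(buf, len + 1);      /* the single shrinking realloc */
    if (tmp != NULL)
        buf = tmp;
    *len_out = len;
    return buf;
}
```

Because each growth step adds only a quarter of the current capacity, the
final realloc() gives back at most about a fifth of the buffer (plus the
initial 100 bytes for short lines), which is a very different situation from
sock_recv() asking for a huge buffer and keeping a few bytes.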

What about list objects that are big at some point, then
progressively shrink, but happen to stick around for a while?  An
"event queue" that got clogged for some reason and then became stable?
Dictionaries?  Of course these potential problems are a lot less likely
to happen.

>> ...
>> There's obviously a tradeoff between copying lots of bytes and having
>> lots of memory go to waste.  That should be taken into consideration
>> when considering how many pages could be returned to the allocator.
>> Note that we can ask the allocator how much memory an allocation has
>> actually reserved (which is usually somewhat larger than the amount
>> you asked it for) and how much memory an allocation will reserve for a
>> given size.  An allocation resize wouldn't even show up as smaller
>> unless at least one page would be freed (for sufficiently large
>> allocations anyway, the minimum granularity is 16 bytes because it
>> guarantees that alignment).  Obviously if you have a lot of pages
>> anyway, one page isn't a big deal, so we would probably only resort to
>> free()/memcpy() if some fair percentage of the total pages used by the
>> allocation could be rescued.
>>
>> If it does end up causing some real performance problems anyway,
>> there's always deeper hacks like using vm_copy(), a Darwin specific
>> function which will do copy-on-write instead (which only makes sense
>> if the allocation is big enough for this to actually be a performance
>> improvement).
>
> As above, I'm skeptical that there's a general problem worth
> addressing here, and am still under the possible illusion that the Mac
> developers will eventually change their realloc()'s behavior anyway.
> If you're convinced it's worth the bother, go for it.  If you do, I
> strongly hope that it keys off a new platform-neutral symbol (say,
> Py_SHRINKING_REALLOC_COPIES) and avoids Darwin-specific implementation
> code.  Then if it turns out that it is a broad problem (across apps or
> across platforms), everyone can benefit.  PyObject_Realloc() seems the
> best place to put it.  Unfortunately, for blocks obtained from the
> system malloc(), there is no portable way to find out how much excess
> was allocated in a release-build Python, so "avoids Darwin-specific
> implementation code" may be impossible to achieve.  The more it
> *can't* be used on any platform other than this flavor of Darwin, the
> more inclined I am to advise just fixing the immediate problem
> (sock_recv's potentially unbounded over-allocation).

I'm pretty sure this kind of malloc functionality is very specific to
Darwin and does not carry over to any other BSD.  An intelligent
implementation requires an equivalent of malloc_size() and
malloc_good_size().  Unfortunately, despite the man page,
malloc_good_size() is not declared in <malloc/malloc.h>.  There is,
however, another, declared way to get at that functionality (by poking
into the malloc_introspection_t struct of the malloc_default_zone()).
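
To make the "ask the allocator" idea concrete, here is a rough sketch of
the query side only.  reserved_size() and shrink_is_worthwhile() are
hypothetical names, and the 25% threshold is an arbitrary assumption rather
than a measured one.  It uses malloc_size() on Darwin, falls back to
glibc's malloc_usable_size() as the nearest equivalent I know of elsewhere,
and reports 0 when the platform gives no way to ask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <malloc/malloc.h>    /* malloc_size() */
#elif defined(__GLIBC__)
#include <malloc.h>           /* malloc_usable_size() */
#endif

/* Hypothetical wrapper: bytes the allocator actually reserved for p,
   or 0 when this platform offers no way to ask. */
size_t
reserved_size(void *p)
{
    assert(p != NULL);
#if defined(__APPLE__)
    return malloc_size(p);
#elif defined(__GLIBC__)
    return malloc_usable_size(p);
#else
    return 0;
#endif
}

/* Nonzero when moving to a new_size block would give back at least a
   quarter of what is reserved.  The threshold is an assumption. */
int
shrink_is_worthwhile(void *p, size_t new_size)
{
    size_t reserved = reserved_size(p);

    return reserved != 0 && new_size <= reserved - reserved / 4;
}
```

A wrapper around realloc() could call shrink_is_worthwhile() first and only
fall back to malloc()/memcpy()/free() when it returns nonzero; on platforms
where reserved_size() returns 0 it would simply defer to the system
realloc().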

-bob