[Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

Tim Peters tim.peters at gmail.com
Mon Jan 3 08:16:34 CET 2005


[Bob Ippolito]
> ...
> Your expectation is not correct for Darwin's memory allocation scheme.
> It seems that Darwin creates allocations of immutable size.  The only
> way ANY part of an allocation will ever be used by ANYTHING else is if
> free() is called with that allocation.

Ya, I understood that.  My conclusion was that Darwin's realloc()
implementation isn't production-quality.  So it goes.

> free() can be called either explicitly, or implicitly by calling realloc() with
> a size larger than the size of the allocation.  In that case, it will create a new
> allocation of at least the requested size, copy the contents of the
> original allocation into the new allocation (probably with
> copy-on-write pages if it's large enough, so it might be cheap), and
> free() the original allocation.

Really?  Another near-universal "quality of implementation"
expectation is that a growing realloc() will strive to extend
in-place, as in realloc(malloc(1000000), 1000001).  The theoretical
guarantee that n one-at-a-time list.append() calls take linear time
overall (amortized constant time per append) doesn't depend on that,
but pragmatically it's greatly helped by a reasonable growing
realloc() implementation.
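To make the amortized-growth point concrete, here is a minimal C sketch (not CPython's code) of an array that over-allocates geometrically, similar in shape to the over-allocation CPython's list_resize() performs. The names intvec, intvec_append() and realloc_calls are hypothetical. Because capacity grows by a fixed fraction, n appends need only O(log n) realloc() calls, so total work stays linear even if every realloc() copies; a realloc() that extends in place just makes each of those calls cheaper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable int array, for illustration only. */
typedef struct {
    int *items;
    size_t len;   /* slots in use */
    size_t cap;   /* slots allocated */
} intvec;

/* Counts how often the buffer had to be resized. */
static size_t realloc_calls = 0;

/* Append x.  When full, grow capacity by about 1/8 plus a small
   constant (the same shape as the formula in list_resize()). */
static int intvec_append(intvec *v, int x)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t newcap = v->cap + (v->cap >> 3) + (v->cap < 9 ? 3 : 6);
        int *p = realloc(v->items, newcap * sizeof *p);
        if (p == NULL)
            return -1;          /* old buffer is still valid on failure */
        v->items = p;
        v->cap = newcap;
        realloc_calls++;
    }
    v->items[v->len++] = x;
    return 0;
}
```

Appending 100,000 ints this way triggers well under 150 resizes; whether each one copies is up to the platform's realloc().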

> In the case where realloc() specifies a size that is not greater than the
> allocation's size, it will simply return the given allocation and cause no
> side effects whatsoever.
>
> Was this a good decision?  Probably not!

Sounds more like a bug (or two) to me than "a decision", but I don't know.

> However, it is our problem (in the "I know you use Windows, but I am not
> the only one that uses Mac OS X" sense) so long as Darwin is a supported
> platform, because it is highly unlikely that Apple will backport any "fix" to
> the allocator unless we can prove it has some security implications in
> software shipped with their OS. ...

Is there any known case where Python performs poorly on this OS, for
this reason, other than the "pass giant numbers to recv() and then
shrink the string because we didn't get anywhere near that many bytes"
case?  Claiming rampant performance problems should require evidence
too <wink>.
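The "giant recv() then shrink" case follows roughly the pattern below, sketched in C with a hypothetical fake_recv() standing in for the socket call so it runs without a network; recv_then_shrink() is likewise a hypothetical name. The buffer is sized for the requested byte count, then shrunk with realloc() to what actually arrived. Per Bob's description, Darwin's realloc() would hand back the same oversized block at the shrink step, so the full requested size stays allocated for as long as the result lives.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for recv(): copies at most cap bytes from src. */
static size_t fake_recv(char *buf, size_t cap, const char *src, size_t avail)
{
    size_t n = avail < cap ? avail : cap;
    memcpy(buf, src, n);
    return n;
}

/* Allocate for the *requested* size, then shrink to what arrived.
   Whether the shrink actually gives memory back is up to realloc(). */
static char *recv_then_shrink(size_t requested, const char *src,
                              size_t avail, size_t *got)
{
    assert(got != NULL);
    char *buf = malloc(requested ? requested : 1);
    if (buf == NULL)
        return NULL;
    *got = fake_recv(buf, requested, src, avail);
    char *shrunk = realloc(buf, *got ? *got : 1);
    return shrunk ? shrunk : buf;   /* a failed shrink leaves buf valid */
}
```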

...
> Presumably this can happen at other places (including third party
> extensions), so a better place to do this might be _PyString_Resize().
> list_resize() is another reasonable place to put this.  I'm sure there
> are other places that use realloc() too, and the majority of them do
> this through obmalloc.  So maybe instead of trying to track down all
> the places where this can manifest, we should just "gunk up" Python and
> patch PyObject_Realloc()?

There is no "choke point" for allocations in Python: some places
call the system realloc() directly.  Maybe those direct calls matter on
Darwin too, but maybe they don't.  The scope of this hack spreads if
they do.  I have no idea how often realloc() is called directly by
3rd-party extension modules.  It's called directly a lot in Zope's C
code, but AFAICT only to grow vectors, never to shrink them.
> Since we are both pretty confident that other allocators aren't like Darwin,
> this "gunk" can be #ifdef'ed to the __APPLE__ case.

#ifdef's are a last resort:  they almost never go away, so they
complicate the code forever after, and typically stick around for
years even after the platform problems they intended to address have
been fixed.  For obvious reasons, they're also an endless source of
platform-specific bugs.

Note that pymalloc already does a memcpy+free when, in
PyObject_Realloc(p, n), p was obtained from the system malloc or
realloc but n is small enough to meet the "small object" threshold
(pymalloc "takes over" small blocks that result from a
PyObject_Realloc()).  That's a reasonable strategy *because* n is
always small in such cases.  If you extend this strategy to n of
arbitrary size, you may also create new performance problems for
some apps on Darwin, since copying n bytes can get arbitrarily
expensive.
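Extending the memcpy+free strategy to blocks of any size might look roughly like the sketch below. shrink_by_copy_realloc() and bytes_copied are hypothetical names, not pymalloc's code. Standard C offers no portable way to ask a block's size, so the caller must pass oldsize, and the "shrinks by at least half" trigger is an arbitrary assumption chosen for illustration. The counter makes the cost visible: every qualifying shrink copies newsize bytes, however large that is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Total bytes copied by the copy-on-shrink path (for demonstration). */
static size_t bytes_copied = 0;

/* Hypothetical wrapper: on a big shrink, copy into a fresh right-sized
   block and free the old one so its memory is really released;
   otherwise defer to the platform realloc(). */
static void *shrink_by_copy_realloc(void *p, size_t oldsize, size_t newsize)
{
    if (p != NULL && newsize > 0 && newsize <= oldsize / 2) {
        void *q = malloc(newsize);
        if (q == NULL)
            return realloc(p, newsize);   /* fall back; may keep the big block */
        memcpy(q, p, newsize);            /* the cost Tim warns about */
        bytes_copied += newsize;
        free(p);
        return q;
    }
    return realloc(p, newsize);
}
```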

> ...
> I'm sure I'll find something, but what's important to me is that Python
> works well on Mac OS X, so something should happen.

I agree the socket-abuse case should be fiddled, and for more reasons
than just Darwin's realloc() quirks.  I don't know that there are
actual problems on Darwin broader than that case (and I'm not
challenging you to contrive one, I'm asking whether realloc() quirks
are suspected in any other case that's known).  Part of what you
demonstrated when you said that pystone didn't slow down when you
fiddled stuff is that pystone also didn't speed up.  I also don't know
that the memcpy+free wormaround is actually going to help more than it
hurts overall.  Yes, in the socket-abuse case, where the program
routinely malloc()s strings millions of bytes larger than the socket
can deliver, it would obviously help.  That's not typical program
behavior (however typical it may be of that specific app).  More
typical is shrinking a long list one element at a time, in which case
the remaining half of the list would get memcpy'd from time to time,
where no such copies are made today.
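A rough C simulation of that list-shrinking case: nothing is allocated, it only counts the element copies a copy-on-shrink realloc() would cause when a list is emptied one pop at a time. The shrink rule (keep the block while the new size is at least half the allocation, otherwise resize to about 9/8 of the new size) is loosely patterned on CPython's list_resize(); the function name is hypothetical. Each resize copies the surviving items, roughly half of what the block held before, and the total comes out to a constant multiple of n copies that an in-place shrink never makes.

```c
#include <assert.h>
#include <stddef.h>

/* Simulation only: returns how many element copies a copy-on-shrink
   realloc() would make while popping every item from an n-item list. */
static size_t copies_when_popping_all(size_t n)
{
    size_t allocated = n;
    size_t copied = 0;

    for (size_t size = n; size > 0; ) {
        size--;                              /* pop one element */
        assert(size <= allocated);
        if (size < allocated / 2) {
            copied += size;                  /* memcpy the survivors */
            /* Re-over-allocate slightly, list_resize()-style. */
            allocated = size + (size >> 3) + (size < 9 ? 3 : 6);
        }
    }
    return copied;
}
```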

IOW, there's no straightforward pure win here.

