[Pythonmac-SIG] Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

Bob Ippolito bob at redivi.com
Mon Jan 3 07:08:24 CET 2005


On Jan 3, 2005, at 12:13 AM, Tim Peters wrote:

> [Bob Ippolito]
>> Quite a few notable places in the Python sources expect realloc(...) to
>> relinquish some memory if the requested size is smaller than the
>> currently allocated size.
>
> I don't know what "relinquish some memory" means.  If it means
> something like "returns memory to the OS, so that the reported process
> size shrinks", then no, nothing in Python ever assumes that.  That's
> simply because "returns memory to the OS" and "process size" aren't
> concepts in the C standard, and so nothing can be said about them in
> general -- not in theory, and neither in practice, because platforms
> (OS+libc combos) vary so widely in behavior here.
>
> As a pragmatic matter, I *expect* that a production-quality realloc()
> implementation will at least be able to reuse released memory,
> provided that the amount released is at least half the amount
> originally malloc()'ed (and, e.g., reasonable buddy systems may not be
> able to do better than that).

This is what I meant by relinquish (c/o merriam-webster):
     a : to stop holding physically : RELEASE
         <slowly relinquished his grip on the bar>
     b : to give over possession or control of : YIELD
         <few leaders willingly relinquish power>

Your expectation does not hold for Darwin's memory allocation scheme.
It seems that Darwin creates allocations of immutable size.  The only
way ANY part of an allocation will ever be used by ANYTHING else is if
free() is called on that allocation.  free() can be called either
explicitly, or implicitly by calling realloc() with a size larger than
the size of the allocation.  In that case, realloc() will create a new
allocation of at least the requested size, copy the contents of the
original allocation into the new one (probably with copy-on-write
pages if it's large enough, so it might be cheap), and free() the
original allocation.  When realloc() is given a size that is not
greater than the allocation's size, it simply returns the given
allocation and has no side effects whatsoever.
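
A minimal sketch of how the shrink behavior described above could be
observed.  The helper names (probe_usable, probe_shrink, shrink_report)
are made up for illustration.  On Darwin it uses malloc_size() from
<malloc/malloc.h>; on other platforms it assumes a malloc_usable_size()
as provided by glibc, which is not something discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#ifdef __APPLE__
#include <malloc/malloc.h>      /* malloc_size() */
#else
#include <malloc.h>             /* malloc_usable_size(); assumed available */
#endif

/* Bytes the allocator actually reserved for the block at p. */
size_t probe_usable(void *p)
{
#ifdef __APPLE__
    return malloc_size(p);
#else
    return malloc_usable_size(p);
#endif
}

/* Outcome of one shrink experiment (hypothetical type). */
typedef struct {
    size_t before;  /* usable bytes right after malloc(big) */
    size_t after;   /* usable bytes right after realloc(p, small) */
    int    moved;   /* nonzero if realloc() returned a different address */
} shrink_report;

/* Allocate `big` bytes, shrink the block to `small` bytes, and record
 * what the allocator did.  Returns 0 on success, -1 if allocation failed. */
int probe_shrink(size_t big, size_t small, shrink_report *out)
{
    assert(out != NULL && small > 0 && small <= big);

    char *p = malloc(big);
    if (p == NULL)
        return -1;
    memset(p, 0xAB, big);               /* touch every page */
    out->before = probe_usable(p);

    uintptr_t old_addr = (uintptr_t)p;  /* save the address before realloc */
    char *q = realloc(p, small);
    if (q == NULL) {
        free(p);                        /* original block is still valid */
        return -1;
    }
    out->moved = ((uintptr_t)q != old_addr);
    out->after = probe_usable(q);
    free(q);
    return 0;
}
```

If the description above is accurate, a call such as
probe_shrink(10 * 1024 * 1024, 1024, &r) on an affected Darwin release
should leave r.after equal to r.before and r.moved equal to 0.  Other
allocators may report a much smaller r.after.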

Was this a good decision?  Probably not!  However, it is our (in the "I 
know you use Windows but I am not the only one that uses Mac OS X" 
sense) problem so long as Darwin is a supported platform, because it is 
highly unlikely that Apple will backport any "fix" to the allocator 
unless we can prove it has some security implications in software 
shipped with their OS.  I attempted to look for some easy ones by 
performing a quick audit of Apache, OpenSSH, and OpenSSL.  
Unfortunately, their developers did not share your expectation.  I 
found one sprintf-like routine in Apache that could be affected by this 
behavior, and one instance of immutable string creation in Apple's 
CoreFoundation CFString implementation, but I have yet to find an easy 
way to exploit this behavior from the outside.  I should probably be 
looking at PHP and Perl instead ;)

>> but I figure Darwin does this as an "optimization" and because Darwin
>> probably can't resize mmap'ed memory (at least it can't from Python,
>> but this probably means it doesn't have this capability at all).
>>
>> It is possible to "fix" this for Darwin,
>
> I don't understand what's "broken".  Small objects go thru Python's
> own allocator, which has its own realloc policies and its own
> peculiarities (chiefly that pymalloc never free()s any memory
> allocated for small objects).

What's broken is that there are several places in Python that seem to 
assume that you can allocate a large chunk of memory, and make it 
smaller in some meaningful way with realloc(...).  This is not true 
with Darwin.  You are right about small objects.  They don't matter 
because they're small, and because they're handled by Python's 
allocator.

>> because you can ask the default malloc zone how big a particular
>> allocation is, and how big an allocation of a given size will actually
>> be (see: <malloc/malloc.h>).
>> The obvious place to put this would be PyObject_Realloc, because this
>> is at least called by _PyString_Resize (which will fix
>> <http://python.org/sf/1092502>).
>
> The diagnosis in the bug report seems to leave it pointing at
> socket.py's _fileobject.read(), although I suspect the real cause is
> in socketmodule.c's sock_recv().  We've had other reports of various
> problems when people pass absurdly large values to socket recv().  A
> better fix here would probably amount to rewriting sock_recv() to
> refuse to pass enormous numbers to the platform recv() (it appears
> that many platform recv() implementations simply don't expect a recv()
> argument to be much bigger than the native network buffer size, and
> screw up when that's not so).

You are correct.  The real cause is in sock_recv(), and/or 
_PyString_Resize(), depending on how you look at it.
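
As a rough illustration of the "refuse to pass enormous numbers to the
platform recv()" idea, here is a sketch in plain C.  It is not the
socketmodule.c code; capped_recv and the 64 KB RECV_CAP value are
assumptions chosen for the example.  The point is that the buffer is
never larger than the cap, so there is no huge allocation to shrink
afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Assumed upper bound on a single recv() request; not from the thread. */
#define RECV_CAP ((size_t)64 * 1024)

/* Receive at most `want` bytes from fd, but never ask recv() for more
 * than RECV_CAP.  On success, stores a malloc()ed buffer in *out (the
 * caller frees it) and returns the number of bytes received, which is 0
 * at end of stream.  Returns -1 on failure, leaving *out untouched. */
ssize_t capped_recv(int fd, size_t want, char **out)
{
    assert(out != NULL);

    size_t ask = want < RECV_CAP ? want : RECV_CAP;
    char *buf = malloc(ask ? ask : 1);  /* avoid a zero-byte malloc */
    if (buf == NULL)
        return -1;

    ssize_t n = recv(fd, buf, ask, 0);
    if (n < 0) {
        free(buf);
        return -1;
    }
    *out = buf;
    return n;
}
```

A caller that really wants more than RECV_CAP bytes would call this in a
loop and join the pieces, which is roughly what a buffered file object
layered on the socket does anyway.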

>> Note that all versions of Darwin that I've looked at (6.x, 7.x, and
>> 8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have
>> this "issue", but it might go away by Mac OS X 10.4 or some later
>> release.
>
> It would be good to rewrite sock_recv() more defensively in any case.
> Best I can tell, this implementation of realloc() is
> standard-conforming but uniquely brain dead in its downsize behavior.

Presumably this can happen in other places (including third-party
extensions), so a better place for a workaround might be
_PyString_Resize().  list_resize() is another reasonable place.  I'm
sure there are other places that use realloc() too, and the majority of
them do so through obmalloc.  So maybe instead of trying to track down
all the places where this can manifest, we should just "gunk up" Python
and patch PyObject_Realloc()?  Since we are both pretty confident that
other allocators aren't like Darwin's, this "gunk" can be #ifdef'ed to
the __APPLE__ case.
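
A sketch of what such "gunk" might look like, written as a standalone
wrapper rather than as an actual patch to PyObject_Realloc().  The names
shrinking_realloc and block_capacity are hypothetical, and the "at least
half is released" threshold is an assumption borrowed from the
expectation quoted earlier.  A real patch would presumably guard the
copy path with #ifdef __APPLE__; it is left unguarded here so the logic
can be exercised anywhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#ifdef __APPLE__
#include <malloc/malloc.h>      /* malloc_size() */
#else
#include <malloc.h>             /* malloc_usable_size(); assumed available */
#endif

/* Bytes the allocator actually reserved for the block at p. */
size_t block_capacity(void *p)
{
#ifdef __APPLE__
    return malloc_size(p);
#else
    return malloc_usable_size(p);
#endif
}

/* realloc() replacement that forces a real shrink: when the request
 * would release at least half of the existing block, allocate a fresh
 * block, copy, and free the old one.  Otherwise defer to realloc(). */
void *shrinking_realloc(void *p, size_t nbytes)
{
    if (p == NULL)
        return malloc(nbytes);

    size_t have = block_capacity(p);
    if (nbytes > 0 && nbytes <= have / 2) {
        void *q = malloc(nbytes);
        if (q == NULL)
            return p;           /* old block still holds nbytes; keep it */
        memcpy(q, p, nbytes);   /* nbytes <= have, so the copy is in bounds */
        free(p);                /* the old, larger block becomes reusable */
        return q;
    }
    return realloc(p, nbytes);
}
```

The cost is an extra copy on large shrinks, which is why it only makes
sense on a platform whose realloc() never gives the memory back.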

> I don't expect the latter will last (as you say on your page,
> "probably plenty of other software" also makes the same pragmatic
> assumptions about realloc downsize behavior), so I'm not keen to gunk
> up Python to worm around it.

As I said above, I haven't yet found any other software that makes the 
same kind of realloc() assumptions that Python does.  I'm sure I'll 
find something, but what's important to me is that Python works well on 
Mac OS X, so something should happen.  If we can't prove that Apple's 
allocation strategy is a security flaw in some service that ships with 
the OS, any improvements to this strategy are very unlikely to be 
backported to current versions of Mac OS X.

-bob


