[Pythonmac-SIG] Re: [Python-Dev] Darwin's realloc(...) implementation never shrinks allocations
Bob Ippolito
bob at redivi.com
Mon Jan 3 07:08:24 CET 2005
On Jan 3, 2005, at 12:13 AM, Tim Peters wrote:
> [Bob Ippolito]
>> Quite a few notable places in the Python sources expect realloc(...)
>> to relinquish some memory if the requested size is smaller than the
>> currently allocated size.
>
> I don't know what "relinquish some memory" means. If it means
> something like "returns memory to the OS, so that the reported process
> size shrinks", then no, nothing in Python ever assumes that. That's
> simply because "returns memory to the OS" and "process size" aren't
> concepts in the C standard, and so nothing can be said about them in
> general -- not in theory, and neither in practice, because platforms
> (OS+libc combos) vary so widely in behavior here.
>
> As a pragmatic matter, I *expect* that a production-quality realloc()
> implementation will at least be able to reuse released memory,
> provided that the amount released is at least half the amount
> originally malloc()'ed (and, e.g., reasonable buddy systems may not be
> able to do better than that).
This is what I meant by relinquish (c/o merriam-webster):
a : to stop holding physically : RELEASE <slowly relinquished his
grip on the bar>
b : to give over possession or control of : YIELD <few leaders
willingly relinquish power>
Your expectation is not correct for Darwin's memory allocation scheme.
It seems that Darwin creates allocations of immutable size. The only
way ANY part of an allocation will ever be used by ANYTHING else is if
free() is called with that allocation. free() can be called either
explicitly, or implicitly by calling realloc() with a size larger than
the size of the allocation. In that case, it will create a new
allocation of at least the requested size, copy the contents of the
original allocation into the new allocation (probably with
copy-on-write pages if it's large enough, so it might be cheap), and
free() the original allocation. In the case where realloc() specifies a size
that is not greater than the allocation's size, it will simply return
the given allocation and cause no side-effects whatsoever.
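
A minimal C sketch of how one could check this from user code. It assumes malloc_size() from <malloc/malloc.h> on Darwin and falls back to malloc_usable_size() from <malloc.h> elsewhere (glibc and similar); the function name usable_after_shrink is hypothetical and only for illustration.

```c
#include <stdlib.h>
#include <string.h>

#ifdef __APPLE__
#include <malloc/malloc.h>            /* malloc_size() */
#define USABLE_SIZE(p) malloc_size(p)
#else
#include <malloc.h>                   /* malloc_usable_size(), assumed available */
#define USABLE_SIZE(p) malloc_usable_size(p)
#endif

/* Allocate `big` bytes, shrink the block to `small` bytes with realloc(),
 * and return how many bytes the allocator says the block still occupies.
 * Returns 0 if an allocation fails.  `small` should be nonzero. */
size_t usable_after_shrink(size_t big, size_t small)
{
    char *p = malloc(big);
    if (p == NULL)
        return 0;
    memset(p, 0, big);                /* touch the memory so it is really backed */

    char *q = realloc(p, small);
    if (q == NULL) {                  /* on failure the original block is still valid */
        free(p);
        return 0;
    }

    size_t usable = USABLE_SIZE(q);   /* what the allocator reports after the shrink */
    free(q);
    return usable;
}
```

If the allocator behaves as described above, a call such as usable_after_shrink(1 << 20, 64) should report something close to the original megabyte rather than something close to 64 bytes.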
Was this a good decision? Probably not! However, it is our (in the "I
know you use Windows but I am not the only one that uses Mac OS X"
sense) problem so long as Darwin is a supported platform, because it is
highly unlikely that Apple will backport any "fix" to the allocator
unless we can prove it has some security implications in software
shipped with their OS. I attempted to look for some easy ones by
performing a quick audit of Apache, OpenSSH, and OpenSSL.
Unfortunately, their developers did not share your expectation. I
found one sprintf-like routine in Apache that could be affected by this
behavior, and one instance of immutable string creation in Apple's
CoreFoundation CFString implementation, but I have yet to find an easy
way to exploit this behavior from the outside. I should probably be
looking at PHP and Perl instead ;)
>> but I figure Darwin does this as an "optimization" and because Darwin
>> probably can't resize mmap'ed memory (at least it can't from Python,
>> but this probably means it doesn't have this capability at all).
>>
>> It is possible to "fix" this for Darwin,
>
> I don't understand what's "broken". Small objects go thru Python's
> own allocator, which has its own realloc policies and its own
> peculiarities (chiefly that pymalloc never free()s any memory
> allocated for small objects).
What's broken is that there are several places in Python that seem to
assume that you can allocate a large chunk of memory, and make it
smaller in some meaningful way with realloc(...). This is not true
with Darwin. You are right about small objects. They don't matter
because they're small, and because they're handled by Python's
allocator.
>> because you can ask the default malloc zone how big a particular
>> allocation is, and how big an allocation of a given size will actually
>> be (see: <malloc/malloc.h>).
>> The obvious place to put this would be PyObject_Realloc, because this
>> is at least called by _PyString_Resize (which will fix
>> <http://python.org/sf/1092502>).
>
> The diagnosis in the bug report seems to leave it pointing at
> socket.py's _fileobject.read(), although I suspect the real cause is
> in socketmodule.c's sock_recv(). We've had other reports of various
> problems when people pass absurdly large values to socket recv(). A
> better fix here would probably amount to rewriting sock_recv() to
> refuse to pass enormous numbers to the platform recv() (it appears
> that many platform recv() implementations simply don't expect a recv()
> argument to be much bigger than the native network buffer size, and
> screw up when that's not so).
You are correct. The real cause is in sock_recv(), and/or
_PyString_Resize(), depending on how you look at it.
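
To illustrate the "refuse to pass enormous numbers" idea, here is a rough sketch in plain C. It is not code from socketmodule.c: read_bounded and MAX_CHUNK are hypothetical names, the 64 KiB cap is an arbitrary assumption, and read() stands in for recv() so the example does not need a socket.

```c
#include <stdlib.h>
#include <unistd.h>

#define MAX_CHUNK ((size_t)64 * 1024)   /* assumed cap, chosen arbitrarily */

/* Read at most `requested` bytes from `fd`, but never allocate or ask for
 * more than MAX_CHUNK at once.  Stores the byte count in *out_len and
 * returns a malloc()ed buffer, or NULL on error. */
char *read_bounded(int fd, size_t requested, size_t *out_len)
{
    size_t len = requested < MAX_CHUNK ? requested : MAX_CHUNK;
    char *buf = malloc(len ? len : 1);   /* avoid the malloc(0) corner case */
    if (buf == NULL)
        return NULL;

    ssize_t got = read(fd, buf, len);
    if (got < 0) {
        free(buf);
        return NULL;
    }
    *out_len = (size_t)got;

    /* Shrink to what was actually read.  Even if realloc() keeps the whole
     * block, the waste is bounded by MAX_CHUNK rather than by `requested`. */
    if (got > 0 && (size_t)got < len) {
        char *smaller = realloc(buf, (size_t)got);
        if (smaller != NULL)
            buf = smaller;
    }
    return buf;
}
```

The point of the cap is that a caller asking for a gigabyte and receiving a few bytes can no longer pin a gigabyte-sized allocation, whatever realloc() does on shrink.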
>> Note that all versions of Darwin that I've looked at (6.x, 7.x, and
>> 8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have
>> this "issue", but it might go away by Mac OS X 10.4 or some later
>> release.
>
> It would be good to rewrite sock_recv() more defensively in any case.
> Best I can tell, this implementation of realloc() is
> standard-conforming but uniquely brain dead in its downsize behavior.
Presumably this can happen at other places (including third party
extensions), so a better place to do this might be _PyString_Resize().
list_resize() is another reasonable place to put this. I'm sure there
are other places that use realloc() too, and the majority of them do
this through obmalloc. So maybe instead of trying to track down all
the places where this can manifest, we should just "gunk up" Python and
patch PyObject_Realloc()? Since we are both pretty confident that
other allocators aren't like Darwin, this "gunk" can be #ifdef'ed to
the __APPLE__ case.
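
Something along these lines is what I have in mind, as a sketch only. It is not a patch against obmalloc: shrinking_realloc is a hypothetical name, and the only Darwin calls it assumes are malloc_size() and malloc_good_size() from <malloc/malloc.h>. On other platforms it is just realloc().

```c
#include <stdlib.h>
#include <string.h>

#ifdef __APPLE__
#include <malloc/malloc.h>   /* malloc_size(), malloc_good_size() */
#endif

/* realloc() replacement that forces a real shrink on Darwin by
 * allocating a fresh, smaller block and freeing the old one. */
void *shrinking_realloc(void *ptr, size_t new_size)
{
#ifdef __APPLE__
    if (ptr != NULL && new_size > 0) {
        size_t have = malloc_size(ptr);            /* bytes the block occupies now */
        size_t want = malloc_good_size(new_size);  /* bytes a new block would occupy */
        if (want < have) {
            void *fresh = malloc(new_size);
            if (fresh != NULL) {
                /* new_size <= want < have, so the old block holds enough bytes */
                memcpy(fresh, ptr, new_size);
                free(ptr);
                return fresh;
            }
            /* malloc failed: fall through and keep the old block via realloc() */
        }
    }
#endif
    return realloc(ptr, new_size);
}
```

A real patch would probably want a threshold so that small shrinks do not pay for a copy every time; this sketch copies whenever a fresh allocation would be any smaller.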
> I don't expect the latter will last (as you say on your page,
> "probably plenty of other software" also makes the same pragmatic
> assumptions about realloc downsize behavior), so I'm not keen to gunk
> up Python to worm around it.
As I said above, I haven't yet found any other software that makes the
same kind of realloc() assumptions that Python does. I'm sure I'll
find something, but what's important to me is that Python works well on
Mac OS X, so something should happen. If we can't prove that Apple's
allocation strategy is a security flaw in some service that ships with
the OS, any improvements to this strategy are very unlikely to be
backported to current versions of Mac OS X.
-bob