[Python-Dev] pymalloc killer
Tim Peters
tim.one@comcast.net
Fri, 29 Mar 2002 05:32:32 -0500
[Tim]
> Recall that pymalloc delegates requests for "not small" blocks to
> the system malloc. This means that when pymalloc's free() is called,
> it has to figure out whether the address passed to it was one it
> allocated itself, or came from the system malloc.
> ...
> However, the question remained whether a hostile user could study
> the source code to mount a successful attack. Alas, it turns out
> that's easy.
There may be hope for this, and another strong reason for pursuing it.
Guido pointed out that *if* pymalloc's free() could know for sure whether an
address passed to it did or didn't come from the system malloc, then we
could also (probably -- needs some more thought) continue to use the
PyObject interface, *without* breaking most "illegal by fiat" old code, via
also mapping PyMem_{Free, FREE, Del, DEL} to the pymalloc free (when
pymalloc is enabled): then the various PyMem_NukeIt spellings would *know*
whether to pass a piece of memory on to the system free().
So I have a variant of pymalloc that doesn't use magic cookies -- it knows
"for sure", by maintaining a sorted (by address) contiguous vector of the
base addresses of the (256KB each) "arenas" allocated by pymalloc. A given
address A either does or doesn't satisfy
B <= A < B+256KB
for some base address B in this list. At worst, a binary search is needed
to see whether any such B exists. For a typical 2**31 byte VM address
space, we can get at most 2**31/2**18 == 2**13 arenas, so at worst this
vector can grow to a measly 8K entries (typically far fewer), and 13 is a
reasonable upper bound on the number of compares needed.
So it can't be a disaster, but neither is it dirt cheap. A promising
candidate for saving grace: in tests so far, frees tend to hit the same
arena many times in a row. By saving a search finger into the vector that
remembers the last hit index, and checking that location first, I'm seeing
hit rates over 85%. The two-sided range test at the finger index can be
inlined, and the two compares it requires are exactly as many compares as
the current magic-cookie tests do. So, most of the time, it may run as fast
as now.
For an address pymalloc *doesn't* handle on its own, though, a full binary
search is required to discover that it's not a pymalloc address. Leaving at
least one magic-cookie gimmick in could slash that (if the magic cookie
isn't there, we know for sure it's not a pymalloc address). So the test
becomes something like
def PyMalloc_Free(thisaddr):
# ... weed out NULL ...
# not NULL
if (addrs[finger] <= thisaddr < addrs[finger] + 256KB
or (thisaddr_has_magic_cookie and binary_search_for(thisaddr)):
it's a pymalloc addr
else
it goes to the system free()
That gives a likely fast-path for each class of address, and should never
err (provided we're passed legitimate addresses, and user code doesn't write
out of bounds, etc).
A nasty subtlety: if PyMem_NukeIt is called when the GIL isn't held, it's
going to take a lot of thought to see whether this can be made safe when
called simultaneously by multiple threads; e.g., the finger can change
"mid-stream" then, and, worse, some thread that *does* hold the GIL may
legitimately cause a new arena to be allocated midstream, mutating the
address vector during the search. This may be intractable enough to kill
the idea of mapping PyMem_XXX to this.
A nasty trick: due to the wonders of unsigned arithmetic, the two-sided
range test can be reduced to one compare:
(Py_uintptr)thisaddr - (Py_uintptr)addrs[finger] < 256KB
In other news, "it's obvious" that the small object threshhold is too small
for 2.3. I'm still inclined to max it out.