[Python-Dev] Invalid memory read in PyObject_Free

Tim Peters tim.one@comcast.net
Fri, 4 Jul 2003 16:25:10 -0400


[Guido]
> Hm...  Do you have a suggestion for making the code more ANSI
> conformant?

Note that it's not possible to write a useful and portable memory allocator
in 100% standard ANSI C.  This was true even under K&R C's fuzzy rules
interpreted in favor of cowboys (see K&R's discussion of malloc in "The C
Programming Language").  pymalloc's abuses are mild, as such things go.

> Surely checking whether an address is within a certain range shouldn't
> require accessing any out-of-bounds memory?

"range" doesn't apply, but "ranges" does, as each pymalloc arena is a
distinct chunk of memory.  The brilliance of the current scheme (whose core
trick is due to David Abrahams) is that it delivers a correct result
quickly, in constant time, with few branches, in time independent of how
many arenas pymalloc has allocated.

> Or am I mistaken about how the offending piece of code works?

pymalloc assumes that it can (a) losslessly cast a valid address to *some*
unsigned integral type; and, (b) also obtain a valid address (in the sense
that dereferencing won't blow up) by clearing some number of the low-order
bits.  Standard C doesn't guarantee that either can be done.  There's also a
vague assumption that the HW is byte-addressed, although that one has more
to do with memory efficiency than with correctness.

Vladimir's original code made the same assumptions, but didn't guarantee to
deliver a correct result in all cases (and I posted all-Python code at the
time that could fool it, triggering problems from overwriting byte code to
segfaults).

An intermediate version of the code delivered correct results and avoided
assumption #b, at the cost of doing a binary search over a sorted list of
arena base addresses.  It was much slower than the current code, and didn't
avoid assumption #a (that lossless casting to some unsigned integral type is
possible).  Other intermediate versions avoiding #b used binary search with
a search finger, and a splay tree, to try to speed searches.  They were also
much slower -- and much more complicated.

We get huge value out of pymalloc on the major HW architectures still
surviving, and that's worth enduring some pain.