[Python-Dev] Invalid memory read in PyObject_Free
Tim Peters
tim.one@comcast.net
Fri, 4 Jul 2003 16:25:10 -0400
[Guido]
> Hm... Do you have a suggestion for making the code more ANSI
> conformant?
Note that it's not possible to write a useful and portable memory allocator
in 100% standard ANSI C. This was true even under K&R C's fuzzy rules
interpreted in favor of cowboys (see K&R's discussion of malloc in "The C
Programming Language"). pymalloc's abuses are mild, as such things go.
> Surely checking whether an address is within a certain range shouldn't
> require accessing any out-of-bounds memory?
"range" doesn't apply, but "ranges" does, as each pymalloc arena is a
distinct chunk of memory. The brilliance of the current scheme (whose core
trick is due to David Abrahams) is that it delivers a correct result
quickly, in constant time, with few branches, in time independent of how
many arenas pymalloc has allocated.
> Or am I mistaken about how the offending piece of code works?
pymalloc assumes that it can (a) losslessly cast a valid address to *some*
unsigned integral type; and, (b) also obtain a valid address (in the sense
that dereferencing won't blow up) by clearing some number of the low-order
bits. Standard C doesn't guarantee that either can be done. There's also a
vague assumption that the HW is byte-addressed, although that one has more
to do with memory efficiency than with correctness.
Vladimir's original code made the same assumptions, but didn't guarantee to
deliver a correct result in all cases (and I posted all-Python code at the
time that could fool it, triggering problems from overwriting byte code to
segfaults).
An intermediate version of the code delivered correct results and avoided
assumption #b, at the cost of doing a binary search over a sorted list of
arena base addresses. It was much slower than the current code, and didn't
avoid assumption #a (that lossless casting to some unsigned integral type is
possible). Other intermediate versions avoiding #b used binary search with
a search finger, and a splay tree, to try to speed searches. They were also
much slower -- and much more complicated.
We get huge value out of pymalloc on the major HW architectures still
surviving, and that's worth enduring some pain.