[Python-Dev] Memory size overflows

Armin Rigo arigo@ulb.ac.be
Sat, 12 Oct 2002 19:45:02 +0200

Hello everybody,

All around the C code there are potential problems with objects of very
large sizes (http://www.python.org/sf/618623).  The problem is that to
allocate a variable-sized object of type 't' with 'n' elements we
compute 'n*t->tp_itemsize', which can overflow even if 'n' is a
perfectly legal value.  If the truncated result is small, the subsequent
malloc() succeeds, and we run into a segfault by trying to access more
memory than was reserved.  The same problem exists in other places -- more
or less everywhere we add something to, or multiply, a number that could
be user-supplied.  For example, Guido just fixed '%2147483647d'%-123.  A
rather artificial example, I agree, but a hole anyway.
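
To make the truncation concrete, here is a minimal sketch. It models a
32-bit size_t with uint32_t so that the wraparound is reproducible on any
platform; the function names are hypothetical and appear nowhere in the
Python sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the size a naive 'n * itemsize' would request
   if size_t were 32 bits wide.  The high-order bits are silently lost. */
static uint32_t naive_alloc_size(uint32_t n, uint32_t itemsize)
{
    /* Multiply in 64 bits, then truncate explicitly to 32 bits. */
    return (uint32_t)((uint64_t)n * itemsize);
}

/* Returns 1 if the naive product wrapped around, 0 otherwise.
   Dividing the truncated product back by itemsize recovers n only
   when no bits were lost. */
static int naive_size_wrapped(uint32_t n, uint32_t itemsize)
{
    assert(itemsize != 0);
    return naive_alloc_size(n, itemsize) / itemsize != n;
}
```

With n = 0x40000001 and an item size of 4, the true product is
0x100000004, but the truncated request is only 4 bytes, so malloc()
happily succeeds.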

To fix this I suggest introducing a few new macros in pymem.h that
compute things about sizes with overflow checking.  I can see a couple
of approaches based on special values that mean "overflow":

1) there is just one special "overflow" value, e.g.
((size_t)-1), that is returned and propagated by the macros
when an overflow is detected.  This might be error-prone
because if we forget once to use the macros to add a few
bytes to the size, this special value will wrap down to a
small legal value -- and segfault.

2) same as above, but with a whole range of overflow
values.  For example, just assume (or decide) that no malloc
of more than half the maximum number that fits into a size_t
can succeed.  We don't need any macro to add a (reasonable)
constant to a size.  We need a macro for multiplication that
-- upon overflow -- returns the first number of the "overflow"
range.  The Add macro is still needed to sum *two* potentially
large numbers.

3) we compute all sizes with signed integers (int or long),
as is currently (erroneously?) done in many places.  Any
negative integer is regarded as "overflow", but the
multiplication macro returns the largest negative integer in
case of overflow, so that as above no addition macro is
needed for the simple cases.
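
As an illustration, approach 2 might be sketched as follows.  All names
are hypothetical (nothing here exists in pymem.h), and I use static
functions rather than macros only so that each argument is evaluated
once.  The sketch assumes that the upper half of the size_t range is the
"overflow" range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; none of this is an existing pymem.h API. */
#define SIZE_MAX_LEGAL      (((size_t)-1) / 2)      /* largest legal size */
#define SIZE_OVERFLOW_FIRST (SIZE_MAX_LEGAL + 1)    /* start of overflow range */
#define SIZE_IS_OVERFLOW(s) ((size_t)(s) > SIZE_MAX_LEGAL)

/* Multiply two sizes; any overflow yields SIZE_OVERFLOW_FIRST. */
static size_t size_mul(size_t a, size_t b)
{
    if (SIZE_IS_OVERFLOW(a) || SIZE_IS_OVERFLOW(b))
        return SIZE_OVERFLOW_FIRST;     /* propagate an earlier overflow */
    if (b != 0 && a > SIZE_MAX_LEGAL / b)
        return SIZE_OVERFLOW_FIRST;     /* product would leave the legal range */
    return a * b;
}

/* Add two potentially large sizes; any overflow yields SIZE_OVERFLOW_FIRST. */
static size_t size_add(size_t a, size_t b)
{
    if (SIZE_IS_OVERFLOW(a) || SIZE_IS_OVERFLOW(b))
        return SIZE_OVERFLOW_FIRST;
    if (a > SIZE_MAX_LEGAL - b)
        return SIZE_OVERFLOW_FIRST;
    return a + b;
}

/* Self-check of the invariants this approach relies on. */
static void size_selfcheck(void)
{
    /* A small constant added to an overflow value stays in the range. */
    assert(SIZE_IS_OVERFLOW(size_mul((size_t)-1, 2) + 16));
    /* Ordinary sizes pass through unchanged. */
    assert(size_mul(10, 4) + 16 == 56);
}
```

The point of returning the *first* value of the range is visible in
size_selfcheck(): a plain '+ 16' for a header cannot wrap an overflow
value back down to a small legal size.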

This will require a "multiplication hunt party" :-)

Also, approaches 2 and 3 require fixes to ensure that
'malloc(any-overflow-size)' always fails, for any of the several
implementations of malloc found in the code.  Even with
approach 1, I would not trust the platform malloc to correctly
fail on malloc(-1) -- I guess it might "round up" the value to
0 before it proceeds...
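
One way to avoid depending on the platform malloc would be a thin wrapper
that rejects overflow sizes itself.  This is only a sketch under the
assumptions of approach 2: checked_malloc and SIZE_MAX_LEGAL are
hypothetical names, and mapping a zero-byte request to one byte is my own
choice here, made so that NULL always means failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit: sizes above half the size_t range mean "overflow". */
#define SIZE_MAX_LEGAL (((size_t)-1) / 2)

/* Hypothetical wrapper, not an existing Python or libc function. */
static void *checked_malloc(size_t nbytes)
{
    if (nbytes > SIZE_MAX_LEGAL)
        return NULL;                    /* never reaches the platform malloc */
    return malloc(nbytes ? nbytes : 1); /* avoid implementation-defined malloc(0) */
}

/* Self-check: overflow sizes fail, ordinary sizes still work. */
static void checked_malloc_selfcheck(void)
{
    void *p;
    assert(checked_malloc((size_t)-1) == NULL);
    assert(checked_malloc(SIZE_MAX_LEGAL + 1) == NULL);
    p = checked_malloc(16);
    assert(p != NULL);
    free(p);
}
```

With approach 3 the same wrapper would take a signed argument and test
'nbytes < 0' instead.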