[Python-Dev] Memory size overflows

Tim Peters tim.one@comcast.net
Mon, 14 Oct 2002 21:40:34 -0400


[Armin Rigo]
> Let me quote from stringobject.c the "heartbreaking code such as
> string_repeat's":
>
>         nbytes = size * sizeof(char);
>         if (nbytes / sizeof(char) != (size_t)size ||
>             nbytes + sizeof(PyStringObject) <= nbytes)
>                 <OverflowError>
>         op = (PyStringObject *)
>                 PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
>         if (op == NULL)
>                 <MemoryError>
>
> With the proper macro, the whole code above can be written as:
>
>         nbytes = Py_MUL_ADD(size, sizeof(char), sizeof(PyStringObject));
>         op = (PyStringObject *) PyObject_MALLOC(nbytes);
>         if (op == NULL)
>                 <MemoryError>
>
> which seems much cleaner to me.

It's not the appearance of the code that breaks my heart (although it
doesn't cheer it either <wink>), it's that when we "fix" one of these things
that pops up in real life, we add a pile of expensive tests and branches to
code that's been known to fall over only once in 10 years.  So Python gets
bigger and slower for everyone, and half the time there's no real cause
beyond that some wiener is playing "let's try to break it!".

> For this reason I do not believe it is a good idea here to define a
> macro a la Zope C:
>
>     DO_MULADD_OR(nbytes, size, sizeof(char), sizeof(PyStringObject),
> goto Overflow);
>
> for two reasons: there is no real need for a special "Overflow" case,

I don't know.  It depends on context, such as how many times a potential
overflow can occur within a single routine.  Probably not often on average.
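
For the record, the shape of such a Zope-C-style macro would be roughly the
following (a sketch only -- the name comes from Armin's message, the spelling
is mine, not code lifted from Zope C or Python):

    /* Checked multiply-add with an escape statement: computes a*b + c
       into "result", or executes "on_overflow" if either step would
       wrap a size_t.  The locals keep each argument evaluated once. */
    #define DO_MULADD_OR(result, a, b, c, on_overflow)          \
        do {                                                    \
            size_t _a = (a), _b = (b), _c = (c), _prod;         \
            if (_b != 0 && _a > ((size_t)-1) / _b)              \
                on_overflow;                                    \
            _prod = _a * _b;                                    \
            if (_prod > ((size_t)-1) - _c)                      \
                on_overflow;                                    \
            (result) = _prod + _c;                              \
        } while (0)

    /* e.g.  DO_MULADD_OR(nbytes, size, sizeof(char),
                          sizeof(PyStringObject), goto Overflow); */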

> as long as the returned 'nbytes' is guaranteed to be un-malloc-able;

There is no nbytes that's guaranteed to be un-malloc-able, unless you're
going to impose artificial restrictions on malloc.  It's not a happy thing
to burden our malloc wrappers with another layer of test-and-branch either.
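
To make that concrete, the sentinel-return flavor would have to look
something like the sketch below (names illustrative), and the last two lines
are exactly the extra test-and-branch the wrapper (or every caller) gets
stuck with:

    /* Sketch: multiply-add that saturates to (size_t)-1 on overflow.
       Arguments are evaluated more than once, so they should be
       side-effect free.  With constant b and c (Armin's case), the
       test reduces to a compare against a compile-time constant. */
    #define PY_SIZE_OVERFLOWED  ((size_t)-1)
    #define Py_MUL_ADD(a, b, c)                                         \
        (((b) != 0 && (size_t)(a) > (PY_SIZE_OVERFLOWED - (c)) / (b))   \
             ? PY_SIZE_OVERFLOWED                                       \
             : (size_t)(a) * (b) + (c))

    /* Nothing makes (size_t)-1 un-malloc-able by itself, so somebody
       still has to check for the sentinel: */
    if (nbytes == PY_SIZE_OVERFLOWED)
            return PyErr_NoMemory();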

> besides, no existing macro (e.g. _PyObject_VAR_SIZE) can support an
> extra "goto Overflow" case without breaking C code that depends on it.

Cool:  yet another reason to leave things alone <wink>.

> There is however a difference between the two above code fragments, in
> that the former will raise an OverflowError for "really too large
> values" and a MemoryError for "reasonable value but still too large for
> me to allocate right now".  Whether this is a correct behavior is
> debatable.

Some things aren't worth debating, though -- it doesn't matter to me which
exception we raise.

> I would argue that raising MemoryError in both cases is fine:

Fine by me too.

> after all, "123"*9999999999999999 could be perfectly fine in another
> Python implementation with lazy strings.  It is thus only a
> memory limit that makes this fail in the current implementation.
>
> Note that if we define MAX_SIZE to ((size_t)-1), the overflow check in
> the present stringobject.c could be more efficiently written down as:

(size_t)~0UL is safer, although there's really no wholly safe way to spell
the intent in C89.

>         if (size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
>                 <OverflowError>
>
> where everything at the right of '>' reduces to a compile-time
> constant.  The Py_MUL_ADD macro can be implemented with this idea in
> mind, which works best if the 2nd and 3rd arguments are constants.
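
Spelled out in string_repeat's terms, that reads roughly like this (a sketch,
with MAX_SIZE spelled per the caveat above, and size already known to be
non-negative):

    #define MAX_SIZE ((size_t)~0UL)     /* see caveat above */

    /* Everything to the right of '>' folds to a compile-time constant. */
    if ((size_t)size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
            return PyErr_NoMemory();    /* or OverflowError, as debated */
    nbytes = size * sizeof(char);
    op = (PyStringObject *)
            PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
    if (op == NULL)
            return PyErr_NoMemory();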

>> note that anything/non_constant can be incredibly expensive!

> Yes.  Couldn't a better overflow-checking algorithm be designed for
> these cases?  For example, to multiply the non-constants 'a' and 'b', we
> can be sure that there is no overflow if both 'a' and 'b' are small
> enough (which is the common case anyway).  I believe I can even think
> about a single Py_MUL_ADD macro that is as efficient as above with
> constant arguments, and otherwise uses this trick.

And more tests, and more branches.  Life will be much easier if we can limit
it to size_t thingies (which have the same size, and are unsigned).
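
For what it's worth, those extra tests would presumably take a shape like
this (a sketch; the cutoff is the "both operands are small" trick, and it
leaves ample headroom for a struct-sized c):

    #include <limits.h>   /* CHAR_BIT */

    /* Sketch: cheap common-case test first; the division check only
       runs when an operand is "large".  All values are size_t.  If a
       and b both fit in half the bits of size_t, a*b cannot wrap, and
       there is room left over for a small constant c. */
    #define PY_SMALL_LIMIT  (((size_t)1) << (sizeof(size_t) * CHAR_BIT / 2))

    if (a < PY_SMALL_LIMIT && b < PY_SMALL_LIMIT)
            nbytes = a * b + c;                   /* common, branch-cheap */
    else if (b != 0 && a > ((size_t)-1 - c) / b)
            return PyErr_NoMemory();              /* would overflow */
    else
            nbytes = a * b + c;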

> About using 64-bit computations everywhere (as suggested by Christian):
> if the length of objects (e.g. lists) is still constrained to 31 bits,
> then it's fine; otherwise, the problem is the same.  Do all supported
> 64-bit platforms have a 32-bit 'int', and thus a 31-bit limit on object
> lengths?

No.  For example, some now-rare models of Cray boxes have no 32-bit types.
If 64-bit machines really are the wave of the future (I started using them
in the late 1970's, and am still waiting for the revolution <0.9 wink>),
it's no cure anyway, as Python in a truly 64-bit world will have to lose its
2GB-bounded view of the world (as it's already painfully doing for files, and
maybe for mmaps).  Then the "detect overflow" problem returns with a
vengeance.

> By the way, this makes me think of a good efficient overflow-detection
> method for platforms with a "long long" type longer than size_t:
>
>   unsigned long long bytes = ((unsigned long long)a)*((unsigned long
> long)b)+c;
>   if (bytes > ((size_t)-1))
>     <overflow>;

This also implicitly assumes that:

1. size_t is no wider than an unsigned long; that's the kind of thing the
   "(size_t)~0UL would be safer" comment above was getting at.

2. long long is at least twice as wide as size_t.
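
With those assumptions written down, the sketch becomes something like this
(the sizeof test is a compile-time constant, so a decent compiler drops the
dead branch):

    /* Sketch: widening check, valid only where long long really is
       at least twice as wide as size_t. */
    if (sizeof(unsigned long long) >= 2 * sizeof(size_t)) {
        unsigned long long bytes = (unsigned long long)a *
                                   (unsigned long long)b + c;
        if (bytes > (unsigned long long)(size_t)-1)
                return PyErr_NoMemory();
        nbytes = (size_t)bytes;
    }
    else {
        /* fall back to a division-style check, as sketched earlier */
    }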

> This compiles to very good assembly code on x86.
>
> Mmmh, in summary I would recommend that we define a common
> "multiply-and-add" macro, returning a size_t or OVERFLOW_VALUE in case
> of overflow, which is implemented in several different ways on different
> platforms, as selected by #if's.  The same macro is used for just
> "multiply" or just "add" by relying on the compiler's strengh reduction.

What a PITA.  Is it worth it?  If I don't have to do it, maybe <wink>.

BTW, when we allocate an object that participates in cyclic GC, there's Yet
Another little integer that gets added to the total size (to reflect the
"extra header" bytes unique to GC'ed objects), and so there's Yet Another
place potential overflow may go uncaught now.
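
For reference, the pattern there is roughly this (a paraphrase of the GC
allocation path, not the literal source):

    /* The GC header is one more unchecked addition on top of the
       object's own size computation. */
    nbytes = sizeof(PyGC_Head) + basicsize;          /* can wrap silently */
    g = (PyGC_Head *)PyObject_MALLOC(nbytes);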