gcc optimization breaks NumPy?

Kaz Kylheku kaz at ashi.FootPrints.net
Mon Aug 23 23:09:32 CEST 1999


On Mon, 23 Aug 1999 20:27:34 GMT, John Fisher <jfisher at are.berkeley.edu> wrote:
>Hey folks,
>
>I was getting consistent but inexplicable segmentation faults with the
>following code, using NumPy's C API to move data between Python and C
>(for sparse matrix multiplication; the data structure is from Meschach,
>in case that's relevant).  I found the problem to be caused by gcc's
>optimization options.
>
>static PyObject * mul(PyObject *self, PyObject *args) {
>
>  ...Snip declarations...
>
>  ...Snip working code.  Function ends with:
>
>  return MakeFromMes(product);
>}
>
>which is defined as:
>
>PyObject * MakeFromMes(SPMAT *in) {
>  int i, j, m, n, nzs, count, pos, therow, thecol;
>  int dummy[0];

Ouch. This is an ANSI C constraint violation that requires a diagnostic. ANSI C
does not support zero-size arrays: the size in an array declarator must be
greater than zero. This code is brain-damaged.

>  double elem;
>  PyArrayObject *pr, *ir, *jc;
>
>  m = in->m; n = in->n; nzs = 0;
>
>  /* Count the number of nonzero elements of in
>     ISZERO is a macro to test for "good enough" floating point 0 */
>  for(i = 0; i < m; i++) {
>    for(j = 0; j < n; j++) {
>      if(!ISZERO(sp_get_val(in, i, j))) nzs++;
>    }
>  }
>
>  dummy[0] = nzs;

Ouch. What does this mean? Assuming that the compiler accepts the zero-length
array extension (which gcc does, by default), it's an access beyond the
``end'' of the zero-element array.
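
If the array was only meant to hand a single dimension to a function that
takes a pointer to int (an assumption on my part; the post snips that code),
then a one-element array does the job with defined behavior. A minimal sketch,
in which total_elements is a hypothetical stand-in for whatever consumes the
array:

```c
#include <assert.h>

/* Hypothetical stand-in for an API that takes a dimension vector;
   it is not part of the original post. */
static int total_elements(int ndims, const int *dims)
{
    int total = 1;
    int i;

    for (i = 0; i < ndims; i++)
        total *= dims[i];
    return total;
}

/* Well-defined replacement for "int dummy[0]; dummy[0] = nzs;". */
static int describe_vector(int nzs)
{
    int dims[1];      /* one real element, so dims[0] exists */

    dims[0] = nzs;    /* in bounds: index 0 of a 1-element array */
    assert(sizeof dims / sizeof dims[0] == 1);
    return total_elements(1, dims);
}
```

Calling describe_vector(7) stores 7 in storage that belongs to dims, so no
other auto variable is touched no matter how the frame is laid out.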

The code seems to depend on the order of allocation of auto objects in the
stack frame and on some dubious aliasing between a pseudo array object and
other auto vars.

I'm afraid that someone will have to rewrite this code in the C language, or
think of a more clever hack that achieves whatever aliasing trick this is
supposed to do in a somewhat more portable manner, one that is more resilient
against optimization. Using a union to overlap a non-zero length array with
the other values might do it.
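
A rough sketch of the union idea follows. The field names and the count of
three are assumptions, since the post does not say which variables the array
was supposed to overlap. It also assumes the struct has no padding between its
int members, which holds on common ABIs but is not guaranteed, so the code
checks it. C99 and later describe reading a union member other than the one
last stored as a reinterpretation of the same bytes.

```c
#include <assert.h>

#define NFIELDS 3

/* Hypothetical field names, chosen only for illustration. */
struct counters {
    int nzs;
    int count;
    int pos;
};

union overlay {
    struct counters named;   /* access by name */
    int raw[NFIELDS];        /* access by index, same storage */
};

/* Store through the named view, read back through the array view. */
static int first_raw_after_set(int nzs)
{
    union overlay u;

    /* No padding between the ints: assumed, so verify it. */
    assert(sizeof(struct counters) == sizeof(int) * NFIELDS);

    u.named.nzs = nzs;
    u.named.count = 0;
    u.named.pos = 0;
    return u.raw[0];         /* aliases u.named.nzs */
}

/* Store through the array view, read back through the named view. */
static int pos_after_raw_set(int value)
{
    union overlay u;

    assert(sizeof(struct counters) == sizeof(int) * NFIELDS);

    u.raw[0] = 0;
    u.raw[1] = 0;
    u.raw[2] = value;
    return u.named.pos;      /* aliases u.raw[2] when there is no padding */
}
```

Because the overlap is declared in the type, the compiler knows the two views
share storage and has no license to keep them in separate registers.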

>The problem occurs only with an -O option in compilation.  Other
>specific -f optimization options do not cause the seg fault.  So my

It could be that optimization causes some of the auto vars to be placed in
registers rather than given actual storage in the stack frame. Declaring these
variables as volatile will probably prevent GCC from applying these
optimizations. In any case, it's safest to use volatile when doing obscure
aliasing like this, in case the compiler doesn't realize what is going on.
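
For what it's worth, here is roughly what a volatile counter looks like. The
helper name, its parameters, and the tolerance test are made up for
illustration and merely stand in for the ISZERO loop. Note that volatile only
forces each read and write of nzs to actually happen; it does not make a store
past the end of an array defined.

```c
#include <assert.h>

/* Hypothetical helper: counts entries whose magnitude exceeds tol. */
static int count_nonzero(const double *vals, int n, double tol)
{
    volatile int nzs = 0;   /* every access to nzs must really be performed */
    int i;

    for (i = 0; i < n; i++) {
        if (vals[i] > tol || vals[i] < -tol)
            nzs++;          /* a genuine load and store each time */
    }
    assert(nzs >= 0 && nzs <= n);
    return nzs;
}
```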

>question is -- because I've let gcc optimize other extensions to Python
>I've written without incident -- what was it here that caused this
>behavior?  I'd appreciate any ideas and speculations, because I'm still
>baffled by this, though the problem seems to be solved.

The behavior of your code is undefined. This means that a conforming C
implementation can do anything it wants. For example, bring up a friendly game
of Tetris when the offending function is called. Or make demons fly out
of your nose. :)



