segfault calling SSE enabled library from ctypes
Olivier Grisel
olivier.grisel at ensta.org
Tue Nov 25 18:04:55 EST 2008
Replying to myself:
haypo found the origin of the problem. Apparently it stems from a
GCC bug [1] (that should be fixed on x86 as of version 4.4). GCC
does not always ensure that the stack is 16-byte aligned, hence the
"__m128 myvector" local variable in the previous code might not be
aligned, and SSE instructions that assume alignment then segfault.
A workaround is to align the stack before calling the inner function,
as done here:
http://www.bitbucket.org/ogrisel/ctypes_sse/changeset/dc27626824b8/
New version of the previous C code:
<quote>
#include <stdio.h>
#include <emmintrin.h>

void wrapped_dummy_sse(void)
{
    // allocate an aligned vector of 128 bits
    __m128 myvector;
    printf("[dummy_sse] before calling setzero\n");
    fflush(stdout);

    // initialize it to four 32-bit floats set to zero
    myvector = _mm_setzero_ps();
    printf("[dummy_sse] after calling setzero\n");
    fflush(stdout);

    // display the content of the vector
    float* part = (float*) &myvector;
    printf("[dummy_sse] myvector = {%f, %f, %f, %f}\n",
           part[0], part[1], part[2], part[3]);
}

void dummy_sse(void)
{
    (void)__builtin_return_address(1); // to force call frame
    // round the stack pointer down to a multiple of 16 (32-bit x86 only)
    asm volatile ("andl $-16, %%esp" ::: "%esp");
    wrapped_dummy_sse();
}

int main(void)
{
    dummy_sse();
    return 0;
}
</quote>
[1] For a nice summary of the issue, see e.g.:
http://www.mail-archive.com/gcc%40gcc.gnu.org/msg33101.html
Another workaround would be to allocate myvector on the heap with an
allocator that guarantees 16-byte alignment, posix_memalign for
instance.
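
A minimal sketch of that heap-based workaround, assuming a POSIX system
that provides posix_memalign. The names is_aligned16, alloc_vector and
dummy_sse_heap are made up for illustration and are not part of the
ctypes_sse code. Note that this only keeps the vector itself off the
stack; the compiler may still spill SSE registers to a misaligned stack
in more complex functions, so take it as an illustration rather than a
guaranteed fix:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <xmmintrin.h>

// hypothetical helper: 1 if p is 16-byte aligned, 0 otherwise
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

// hypothetical helper: allocate one __m128 on the heap, 16-byte aligned;
// returns NULL on failure, release the result with free()
static __m128 *alloc_vector(void)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, sizeof(__m128)) != 0)
        return NULL;
    assert(is_aligned16(p));
    return (__m128 *)p;
}

// heap-based variant of dummy_sse: zeroes a vector and copies its four
// floats into out; returns 0 on success, -1 if the allocation failed
int dummy_sse_heap(float out[4])
{
    __m128 *v = alloc_vector();
    if (v == NULL)
        return -1;
    *v = _mm_setzero_ps();    // aligned store into the heap block
    _mm_storeu_ps(out, *v);   // unaligned store, out may have any alignment
    free(v);
    return 0;
}
```

The unaligned store (_mm_storeu_ps) is used for the output buffer
because the caller, e.g. a ctypes array, gives no alignment guarantee.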
Best,
--
Olivier