Re: [SciPy-dev] FFTW performances in scipy and numpy
Steven G. Johnson wrote:
> On Thu, 2 Aug 2007, David Cournapeau wrote:
>> This is the way Matlab works, right? If I understand correctly, wisdom is a way to compute plans "offline". So, for example, if you compute plans with FFTW_MEASURE | FFTW_UNALIGNED for in-place transforms and a set of sizes, you can record them in a wisdom file and reload it so that later calls with FFTW_MEASURE | FFTW_UNALIGNED will be fast?
>
> Yes, although a wisdom file can contain as many saved plans as you want.
>
>> Anyway, all this sounds like it should be solved by adding a better infrastructure to the current wrappers (a la Matlab).
>
> I know a little about how the Matlab usage of FFTW works, and they are definitely not getting the full performance you would get by calling FFTW yourself from C etc. So they are not necessarily the gold standard.

I certainly do not consider Matlab the gold standard. That is all the more reason why the current situation, where fft in scipy with fftw3 performs worse than Matlab, is not good. But I will work on a better cache mechanism with a user interface in Python (by the way, the complex implementation of fft with fftw3 no longer makes a copy, as of a few days ago).

> If you malloc and then immediately free, most of the time the malloc implementation is just going to re-use the same memory and so you will get the same pointer over and over. So it's not a good test of malloc alignment.

Ah, this was stupid indeed. I should have checked the addresses returned by malloc. But then, playing a bit with the test program, I found that with sizes above ~17000 doubles the ratio of aligned data starts to decrease. This decrease does not happen if I force malloc to use sbrk and not mmap for big sizes: this is consistent with the fact that the mmap threshold on 32-bit systems is 128 kB in GNU libc (128 kB corresponds to 16384 doubles).

Basically, areas allocated through mmap seem never to be 16-byte aligned! This is starting to go way beyond my knowledge... I thought mmap'ed areas were page aligned, which implies 16-byte alignment. Maybe malloc does not return the pointer it got from mmap directly, but a shifted version for some reason? Maybe my test is flawed again? I pasted it just below. For example, with N = 65536, the ratio is closer to 10 % than 50 %; if you disable mmap (M_MMAP_MAX = 0), it goes back to ~50 %.

regards,

David

---------------------------
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>
#include <malloc.h>

#define NARRAY 10000
#define N (1 << 16)

int main(void)
{
    void *a[NARRAY];
    uintptr_t p;
    int i, nalign = 0;
    int st;

    /* default value, at least on my 32 bits ubuntu with glibc: 128 kb */
    st = mallopt(M_MMAP_THRESHOLD, 128 * 1024);
    if (st == 0) {
        fprintf(stderr, "changing malloc option failed\n");
    }
    /* maximum number of mmap'ed chunks; set to 0 to disable mmap */
    st = mallopt(M_MMAP_MAX, NARRAY);
    if (st == 0) {
        fprintf(stderr, "changing malloc option failed\n");
    }

    srand(time(NULL));
    /* keep every block allocated so that malloc cannot hand back the
     * same address twice, then count the 16-byte aligned pointers */
    for (i = 0; i < NARRAY; ++i) {
        a[i] = malloc((rand() % N + 2) * sizeof(double));
        p = (uintptr_t) a[i];
        if (p % 16 == 0)
            ++nalign;
    }
    printf("%d/%d = %g%% are 16-byte aligned\n",
           nalign, NARRAY, nalign * 100.0 / NARRAY);

    for (i = 0; i < NARRAY; ++i)
        free(a[i]);

    return 0;
}