[SciPy-User] [ANN] pyfftw-0.2 released

Mon Feb 15 06:17:45 EST 2010

Sebastian Haase wrote:
> On Mon, Feb 15, 2010 at 11:46 AM, David Cournapeau
> <david at silveregg.co.jp> wrote:
>> Sebastian Haase wrote:
>>
>>> Has this changed from FFTW2 to FFTW3 ?
>>> It would really limit the use of plans, and make overall FFTs much
>>> slower. In my specific case I very often have 512x512 single-precision
>>> real arrays (images), that I would do ffts over and over again. But
>>> the pointers would change ....
>> You can, but you need to use the advanced plan API, or use the recently
>> added new-array execute function:
>>
>> http://www.fftw.org/fftw3_doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
>>
> so it sounds like the alignment is the "killer" argument for the whole idea:
> quote

Well, yes, you need aligned pointers; there is no way around it if you 
want to (significantly) benefit from SSE - that's why I proposed, some 
time ago, an aligned allocator to be used inside NumPy, so that many 
numpy arrays would be aligned by default.

Note that you can align them yourself if you want to (there are 
several recipes on how to do that, one from Travis on the Enthought blog 
IIRC, and one from Anne on the NumPy ML). Or you can explicitly create 
plans for unaligned arrays (this is significantly slower, but should be 
at least as fast as fftw2).
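
For what it's worth, the usual "align it yourself" trick looks roughly 
like the sketch below: over-allocate a byte buffer, then slice it so the 
data pointer falls on a 16-byte boundary. This is not necessarily the 
exact recipe from Travis or Anne; the names aligned_empty and is_aligned 
are made up for illustration, and it assumes a 16-byte alignment 
requirement.

```python
import numpy as np


def aligned_empty(shape, dtype=np.float32, align=16):
    """Return an uninitialized array whose data pointer is a multiple of `align`.

    Hypothetical helper, not part of NumPy: over-allocate by `align` bytes,
    then skip just enough bytes to reach the next aligned address.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + align, dtype=np.uint8)
    # Number of bytes to skip so that the address becomes a multiple of `align`.
    offset = (-buf.ctypes.data) % align
    # The result is a view, so it keeps `buf` alive through its .base chain.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)


def is_aligned(a, align=16):
    """Check whether the first element of `a` sits on an `align`-byte boundary."""
    return a.ctypes.data % align == 0


img = aligned_empty((512, 512), np.float32)
```

Here is_aligned(img) should hold whatever the underlying allocator 
returned, whereas a plain np.empty((512, 512), np.float32) may or may 
not be aligned depending on platform and allocator.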

Also, most arrays allocated by malloc are *not* 16-byte aligned on 
Linux, because for allocations above a certain size, the glibc 
malloc uses mmap, and always misaligns the allocated buffer. The 
threshold is easily reached when working with big data.

> I guess this is really all new with version 3 of FFTW. I hope that
> "creating a new plan is quick once one exists for a given size" means
> "negligible" for 512x512 arrays !?

You would have to test, but IIRC, the cost is not negligible. Creating 
an API around those plans should not be very difficult - at worst, you 
can take a look at how scipy used to do it when scipy supported the 
FFTW backend. The problem is designing a fast API - especially for small 
arrays (~ 2**10), fft is so fast that you cannot afford much overhead 
while looking up cached plans :)
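
To make the plan-caching idea concrete, here is a minimal sketch of a 
cache keyed on shape, dtype and alignment. It is not how scipy did it, 
and it does not call FFTW at all: make_plan is a hypothetical stand-in 
for whatever expensive planner call a real wrapper would use, and 
PlanCache is an invented name. The only point is that the hot path is a 
single dict lookup.

```python
class PlanCache:
    """Cache plans keyed by (shape, dtype, aligned). Illustrative only."""

    def __init__(self, make_plan):
        # make_plan(shape, dtype, aligned) is assumed to be the slow planner.
        self._make_plan = make_plan
        self._plans = {}
        self.misses = 0

    def get(self, shape, dtype, aligned):
        key = (tuple(shape), str(dtype), bool(aligned))
        try:
            # Hot path: one hash and one dict lookup.
            return self._plans[key]
        except KeyError:
            # Cold path: build the plan once and remember it.
            self.misses += 1
            plan = self._plans[key] = self._make_plan(*key)
            return plan


def fake_make_plan(shape, dtype, aligned):
    # Placeholder for an expensive planning step; returns a dummy object.
    return {"shape": shape, "dtype": dtype, "aligned": aligned}


cache = PlanCache(fake_make_plan)
p1 = cache.get((512, 512), "float32", True)
p2 = cache.get((512, 512), "float32", True)   # served from the cache
p3 = cache.get((512, 512), "float32", False)  # separate plan for unaligned data
```

Alignment is part of the key because a plan created for aligned data 
should not be reused on unaligned arrays. Even building the key tuple 
has a cost, so for small transforms you would want to time the lookup 
against the fft itself.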

cheers,

David