[SciPy-User] [ANN] pyfftw-0.2 released
David Cournapeau
david at silveregg.co.jp
Mon Feb 15 06:17:45 EST 2010
Sebastian Haase wrote:
> On Mon, Feb 15, 2010 at 11:46 AM, David Cournapeau
> <david at silveregg.co.jp> wrote:
>> Sebastian Haase wrote:
>>
>>> Has this changed from FFTW2 to FFTW3 ?
>>> It would really limit the use of plans, and make overall FFTs much
>>> slower. In my specific case I very often have 512x512 single-precision
>>> real arrays (images), that I would do ffts over and over again. But
>>> the pointers would change ....
>> You can, but you need to use the advanced plan API, or use the recently
>> added new-array execute function:
>>
>> http://www.fftw.org/fftw3_doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
>>
> so it sounds like the alignment is the "killer" argument for the whole idea:
> quote
Well, yes, you need aligned pointers; there is no way around it if you
want to benefit (significantly) from SSE. That's why I proposed, some
time ago, an aligned allocator to be used inside NumPy, so that many
numpy arrays would be aligned by default.
Note that you can align them yourself if you want to (there are
several recipes on how to do that: one from Travis on the Enthought blog,
IIRC, and one from Anne on the NumPy ML). Or you can explicitly create
plans for unaligned arrays (this is significantly slower, but should be
at least as fast as fftw2).
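A minimal sketch of the over-allocate-and-offset approach to aligning an array by hand, assuming NumPy is available. `aligned_empty` is a hypothetical helper name chosen for illustration; it is not claimed to match either of the recipes mentioned above.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float32, alignment=16):
    """Return an uninitialized array whose data pointer is a multiple of `alignment`."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `alignment` bytes so an aligned start always fits.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-raw.ctypes.data) % alignment
    # Slice from the aligned offset, then reinterpret the bytes as `dtype`.
    return raw[offset:offset + nbytes].view(dtype).reshape(shape)

img = aligned_empty((512, 512), np.float32)
print(img.ctypes.data % 16)  # 0
```

The returned array is a view, so it keeps the over-allocated buffer alive for as long as the array itself is referenced.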
Also, most arrays allocated by malloc are *not* 16-byte aligned on
Linux, because for allocations above a certain size, the glibc
malloc uses mmap, and always "disaligns" the allocated buffer. The
threshold is easily reached when working with big data.
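One way to see what alignment a given allocation actually gets is to inspect its data pointer. This is a rough sketch, assuming NumPy; `alignment_of` is a hypothetical helper. The result depends on the platform, the allocator and the NumPy version, so no expected output is shown.

```python
import numpy as np

def alignment_of(arr, max_align=64):
    """Largest power of two (capped at max_align) that divides the data address."""
    addr = arr.ctypes.data
    align = 1
    while align < max_align and addr % (align * 2) == 0:
        align *= 2
    return align

# Compare small and large allocations; large ones may be served differently
# by the underlying allocator.
for n in (16, 1024, 512 * 512, 2048 * 2048):
    a = np.empty(n, dtype=np.float32)
    print(n * a.itemsize, "bytes ->", alignment_of(a), "byte alignment")
```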
> I guess this is really all new with version 3 of FFTW. I hope that
> "creating a new plan is quick once one exists for a given size" means
> "negligible" for 512x512 arrays !?
You would have to test, but IIRC, the cost is not negligible. Creating
an API around those plans should not be very difficult - at worst, you
can take a look at how scipy used to do it when scipy supported an
FFTW backend. The problem is designing a fast API - especially for small
arrays (~ 2**10), fft is so fast that you cannot afford to spend much
time looking up cached plans :)
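A minimal sketch of such a plan cache, using only the standard library. `PlanCache`, `make_plan` and `fake_make_plan` are hypothetical names; the stand-in planner returns a tuple, where a real wrapper would call into FFTW. The key is a small hashable tuple so that a cache hit costs one dictionary lookup, and alignment is assumed to be part of the key because aligned and unaligned arrays need different plans.

```python
class PlanCache:
    """Cache of FFT plans keyed by (shape, dtype name, aligned flag)."""

    def __init__(self, make_plan):
        # make_plan is a hypothetical callable: (shape, dtype, aligned) -> plan
        self._make_plan = make_plan
        self._plans = {}

    def get(self, shape, dtype, aligned):
        key = (shape, dtype, aligned)
        try:
            # Fast path: a single dict lookup on a cache hit.
            return self._plans[key]
        except KeyError:
            # Slow path: build the plan once and remember it.
            plan = self._plans[key] = self._make_plan(shape, dtype, aligned)
            return plan

calls = []

def fake_make_plan(shape, dtype, aligned):
    # Stand-in for the expensive planning step; records each invocation.
    calls.append((shape, dtype, aligned))
    return ("plan", shape, dtype, aligned)

cache = PlanCache(fake_make_plan)
for _ in range(1000):
    plan = cache.get((512, 512), "float32", True)
print(len(calls))  # 1
```

Whether even this lookup is cheap enough for arrays of ~2**10 elements would have to be measured against the transform time itself.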
cheers,
David