[SciPy-User] [ANN] pyfftw-0.2 released

Jochen Schroeder cycomanic at gmail.com
Mon Feb 15 17:23:52 EST 2010

On 02/15/10 20:17, David Cournapeau wrote:
> Sebastian Haase wrote:
> > On Mon, Feb 15, 2010 at 11:46 AM, David Cournapeau
> > <david at silveregg.co.jp> wrote:
> >> Sebastian Haase wrote:
> >>
> >>> Has this changed from FFTW2 to FFTW3 ?
> >>> It would really limit the use of plans, and make overall FFTs much
> >>> slower. In my specific case I very often have 512x512 single-precision
> >>> real arrays (images), that I would do ffts over and over again. But
> >>> the pointers would change ....
> >> You can, but you need to use the advanced plan API, or use the recently
> >> added new-array execute function:
> >>
> >> http://www.fftw.org/fftw3_doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
> >>
> > so it sounds like the alignment is the "killer" argument for the whole idea:
> > quote
> Well, yes, you need aligned pointers, there is no way around it if you 
> want to (significantly) benefit from SSE - that's why I proposed some 
> time ago now an aligned allocator to be used inside NumPy, so that many 
> numpy arrays would be aligned by default.
I still think this is a very good idea. What were the main objections around
this at the time?

> Note that you can align them by yourself if you want to (there are 
> several recipes on how to do that, one from Travis on Enthought blog 
> IIRC, and one from Anne in the NumPy ML). Or explicitly create plans for 
> unaligned arrays (this is significantly slower, though, but should be at 
> least as fast as fftw2).

There is a function in pyfftw to create aligned arrays, and using aligned
arrays does give a significant performance benefit.
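
For anyone who wants to do the alignment by hand, the usual recipe is to
over-allocate a byte buffer and slice it at the first aligned address. Below is
a minimal sketch assuming only NumPy. Note that aligned_empty is a made-up
helper name for illustration, not the pyfftw function, and the 16-byte default
is just the SSE alignment discussed above.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float32, align=16):
    """Return an uninitialised array whose data pointer is a multiple of `align`.

    Hypothetical helper for illustration only; not part of pyfftw or NumPy.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `align` bytes so an aligned start always fits.
    buf = np.empty(nbytes + align, dtype=np.uint8)
    # Bytes to skip to reach the next multiple of `align` (0 if already aligned).
    offset = (-buf.ctypes.data) % align
    # Slice the byte buffer, then reinterpret it with the requested dtype/shape.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

# Example: a 512x512 single-precision image buffer, 16-byte aligned.
img = aligned_empty((512, 512), np.float32)
```

The returned array is a view, so the over-allocated buffer stays alive as long
as the array does. Operations that allocate a new result (img * 2, img.copy())
give no alignment guarantee, so results have to be written back into the
aligned array, e.g. img[...] = data.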

> Also, most arrays allocated by malloc are *not* 16 bytes aligned on 
> Linux, because for allocated areas above a certain size, the glibc 
> malloc uses mmap, and always "disaligns" the allocated buffer. The 
> threshold is easily reached when working with big data.

Just to clarify, this is on 32bit Linux; on 64bit, malloc automatically aligns to

> > I guess this is really all new with version 3 of FFTW. I hope that
> > "creating a new plan is quick once one exists for a given size" means
> > "negligible" for 512x512 arrays !?
> You would have to test, but IIRC, the cost is not negligible. Creating 
> an API around those plans should not be very difficult - at worst, you 
> can take a look at how scipy used to do it when scipy was supporting 
> FFTW backend. The problem is designing a fast API - especially for small 
> size arrays (~ 2**10), fft is so fast that you cannot afford a lot while 
> looking for cached plans :)

Well, I looked at creating a more "traditional" API around fftw (something like
y=fft(x)), but for relatively small arrays (in my experience ~2**12) the
performance benefit was mostly eaten up by the creation of the output array.
Most of the stuff I do uses arrays of around that size and does a lot of ffts
back and forth between two arrays (pulse propagation simulations, if anyone is
interested), so I went with the current approach. It's probably possible to
create some sort of memory pool to avoid the time spent allocating arrays; is
there something like this in numpy already?
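
To make the memory pool idea concrete, here is a rough sketch of what I have in
mind. It is not anything that exists in numpy or pyfftw as far as I know;
ArrayPool and its acquire/release methods are made-up names, and the pool
simply keeps released arrays in a dict keyed by (shape, dtype):

```python
import numpy as np

class ArrayPool:
    """Hand out reusable arrays keyed by (shape, dtype).

    Hypothetical sketch for illustration; not an existing NumPy or pyfftw API.
    """

    def __init__(self):
        self._free = {}  # (shape, dtype) -> list of released arrays

    def acquire(self, shape, dtype=np.complex128):
        key = (tuple(shape), np.dtype(dtype))
        stack = self._free.get(key)
        if stack:
            # Reuse a previously released array; its contents are stale.
            return stack.pop()
        # Nothing available for this shape/dtype: allocate a fresh array.
        return np.empty(shape, dtype=dtype)

    def release(self, arr):
        # The caller must not touch `arr` after handing it back.
        self._free.setdefault((arr.shape, arr.dtype), []).append(arr)

pool = ArrayPool()
a = pool.acquire((4096,))
pool.release(a)
b = pool.acquire((4096,))  # same object as `a`, no new allocation
```

Whether this pays off for ~2**12 arrays would have to be measured, since the
dict lookup is itself not free. Note also that arrays coming out of the pool
are only as aligned as whatever np.empty returned in the first place.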


> cheers,
> David
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
