[SciPy-User] [ANN] pyfftw-0.2 released

Mon Feb 15 17:23:52 EST 2010

On 02/15/10 20:17, David Cournapeau wrote:
> Sebastian Haase wrote:
> > On Mon, Feb 15, 2010 at 11:46 AM, David Cournapeau
> > <david at silveregg.co.jp> wrote:
> >> Sebastian Haase wrote:
> >>
> >>> Has this changed from FFTW2 to FFTW3 ?
> >>> It would really limit the use of plans, and make overall FFTs much
> >>> slower. In my specific case I very often have 512x512 single-precision
> >>> real arrays (images), that I would do ffts over and over again. But
> >>> the pointers would change ....
> >> You can, but you need to use the advanced plan API, or use the recently
> >> added new-array execute function:
> >>
> >> http://www.fftw.org/fftw3_doc/New_002darray-Execute-Functions.html#New_002darray-Execute-Functions
> >>
> > so it sounds like the alignment is the "killer" argument for the whole idea:
> > quote
> 
> Well, yes, you need aligned pointers, there is no way around it if you 
> want to (significantly) benefit from SSE - that's why I proposed some 
> time ago now an aligned allocator to be used inside NumPy, so that many 
> numpy arrays would be aligned by default.
> 
I still think this is a very good idea. What were the main objections around
this at the time?

> Note that you can align them by yourself if you want to (there are 
> several recipes on how to do that, one from Travis on Enthought blog 
> IIRC, and one from Anne in the NumPy ML). Or explicitly create plans for 
> unaligned arrays (this is significantly slower, though, but should be at 
> least as fast as fftw2).

There is a function in pyfftw to create aligned arrays, and using aligned
arrays does give a significant performance benefit.
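
For anyone who wants to align arrays by hand, a minimal sketch of the usual
over-allocate-and-offset recipe follows. This is not pyfftw's own helper; the
name aligned_empty and its parameters are made up for illustration, and the
16-byte default is the SSE alignment discussed above.

```python
import numpy as np

def aligned_empty(shape, dtype=np.float32, alignment=16):
    """Return an uninitialised array whose data pointer is a multiple of `alignment`.

    Hypothetical helper for illustration; not part of pyfftw or NumPy.
    """
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `alignment` bytes so an aligned start always fits.
    raw = np.empty(nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-raw.ctypes.data) % alignment
    # Slice at the aligned offset, then reinterpret the bytes as `dtype`.
    return raw[offset:offset + nbytes].view(dtype).reshape(shape)

image = aligned_empty((512, 512), np.float32)
is_aligned = image.ctypes.data % 16 == 0
```

The returned array is a view, so it keeps the over-allocated buffer alive for
as long as it is referenced.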

> 
> Also, most arrays allocated by malloc are *not* 16-byte aligned on 
> Linux, because for allocated areas above a certain size, the glibc 
> malloc uses mmap, and always "disaligns" the allocated buffer. The 
> threshold is easily reached when working with big data.

Just to clarify, this applies to 32-bit Linux; on 64-bit, malloc automatically
aligns to 16 bytes.

> 
> > I guess this is really all new with version 3 of FFTW. I hope that
> > "creating a new plan is quick once one exists for a given size" means
> > "negligible" for 512x512 arrays !?
> 
> You would have to test, but IIRC, the cost is not negligible. Creating 
> an API around those plans should not be very difficult - at worst, you 
> can take a look at how scipy used to do it when scipy supported the 
> FFTW backend. The problem is designing a fast API - especially for small 
> arrays (~ 2**10), fft is so fast that you cannot afford much overhead 
> while looking for cached plans :)
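
To make the cached-plan idea concrete, here is a minimal sketch of a lookup
keyed on shape, dtype and alignment. It is an assumption-laden illustration,
not how scipy or pyfftw does it: PlanCache and make_plan are hypothetical
names, and make_plan stands in for whatever expensive FFTW planning call is
actually used. The point is only that the hot path is one dictionary lookup.

```python
class PlanCache:
    """Cache of plans keyed on (shape, dtype, aligned). Hypothetical sketch."""

    def __init__(self, make_plan):
        # `make_plan` is a stand-in for the real (expensive) planner.
        self._make_plan = make_plan
        self._plans = {}

    def get(self, shape, dtype, aligned):
        key = (tuple(shape), str(dtype), bool(aligned))
        try:
            # Hot path: a single dict lookup when the plan already exists.
            return self._plans[key]
        except KeyError:
            plan = self._plans[key] = self._make_plan(*key)
            return plan

# Stand-in planner that just records how often it is called.
created = []

def fake_make_plan(shape, dtype, aligned):
    created.append((shape, dtype, aligned))
    return ("plan", shape, dtype, aligned)

cache = PlanCache(fake_make_plan)
first = cache.get((512, 512), "float32", True)
second = cache.get((512, 512), "float32", True)
```

Alignment is part of the key because, as noted above, plans for aligned and
unaligned arrays are not interchangeable.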

Well, I looked at creating a more "traditional" API around fftw (something like
y = fft(x)), but for relatively small arrays (in my experience ~2**12) the
performance benefit was mostly eaten up by the creation of the output array.
Most of what I do uses arrays of around that size and does a lot of FFTs back
and forth between two arrays (pulse propagation simulations, if anyone is
interested), so I went with the current approach. It's probably possible to
create some sort of memory pool to avoid the cost of allocating arrays; is
there something like this in numpy already?
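
By memory pool I mean something roughly like the sketch below. It is only an
illustration of the idea, not an existing NumPy facility; BufferPool, acquire
and release are names I made up. Released arrays are kept on a free list per
(shape, dtype) and handed back out instead of allocating again.

```python
import numpy as np

class BufferPool:
    """Reuse arrays of a given shape and dtype. Hypothetical sketch."""

    def __init__(self):
        self._free = {}  # (shape, dtype string) -> list of spare arrays

    def acquire(self, shape, dtype):
        key = (tuple(shape), np.dtype(dtype).str)
        spare = self._free.get(key)
        if spare:
            # Reuse a previously released array; its contents are stale.
            return spare.pop()
        return np.empty(shape, dtype=dtype)

    def release(self, arr):
        key = (arr.shape, arr.dtype.str)
        self._free.setdefault(key, []).append(arr)

pool = BufferPool()
a = pool.acquire((4096,), np.complex64)
pool.release(a)
b = pool.acquire((4096,), np.complex64)  # same object as `a`, no new allocation
```

The caller has to release arrays explicitly and must not keep using them
afterwards, which is the main cost of this design.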

Cheers
Jochen

> 
> cheers,
> 
> David