curious FFFT timing
I have an application that needs to perform close 1/2 million FFT's (actually inverse FFTs) I was stymied by the time it was taking so I created a simple benchmark program to measure raw fft speed  hoping to extrapolate and see if the FFT algorithm was the time consuming portion of the algorithm. The result I obtained surprised me Inverse FFT ( 4096) Number of seconds 5.608000 Time per FFT 0.005608 Forward FFT ( 4096) Number of seconds 3.164000 Time per FFT 0.003164 Inverse FFT ( 4095) Number of seconds 8.222000 Time per FFT 0.008222 Forward FFT ( 4095) Number of seconds 5.468000 Time per FFT 0.005468 Inverse FFT ( 8192) Number of seconds 12.578000 Time per FFT 0.012578 Forward FFT ( 8192) Number of seconds 7.551000 Time per FFT 0.007551 Inverse FFTs were taking almost twice as long as forward FFTs ! Bottom line I found that in the inverse code the multiplication of 1/N at the end was the culprit. When I removed the /n from the line return _raw_fft(a, n, axis, fftpack.cffti, fftpack.cfftb, _fft_cache)/n the code timing was more along the line of what was expected. Any one want to speculate on the timing difference for the 4095 vice 4096 long transforms ? Since 4095 is not a power of two I would have expected a greater time difference (DFT vice FFT) N*N / N*log2(N) = N/log2(N) = 4096/12 = 341.33 which is >> than the 1.72 seen above Cheers Eric  Code Snippet  import time from Numeric import * from FFT import * def f_fft(number,length): a = arange(float(length)) start_time = time.time() for i in xrange(number): b = fft(a) durat = time.time()  start_time durat_per = durat / number print "Forward FFT (%10d) Number of seconds %12.6f Time per FFT %12.6f" % (length,durat,durat_per) def i_fft(number,length): a = arange(float(length)) start_time = time.time() for i in xrange(number): b = inverse_fft(a) durat = time.time()  start_time durat_per = durat / number print "Inverse FFT (%10d) Number of seconds %12.6f Time per FFT %12.6f" % (length,durat,durat_per) i_fft(1000,4096) f_fft(1000,4096) i_fft(1000,4095) f_fft(1000,4095) i_fft(1000,8192) f_fft(1000,8192)
Hi Eric,
Any one want to speculate on the timing difference for the 4095 vice 4096 long transforms ? Since 4095 is not a power of two I would have expected a greater time difference (DFT vice FFT)
You are correct in that 4095 is not a power of two, but is is the product of only small prime factors: 3 * 3 * 5 * 7 * 13 = 4095 Since the FFT code implements a N*log_2(N) algorithm for numbers that contain only small prime factors, the difference in times is rather small. FFTs that have lengths that are powersoftwo tend to be more efficient in general (since the decimation routines are cleaner for this case). If you test with 4097 (17 * 241) it will be quite a bit slower I'd guess... Scott  Scott M. Ransom Address: HarvardSmithsonian CfA Phone: (617) 4954142 60 Garden St. MS 10 email: ransom@cfa.harvard.edu Cambridge, MA 02138 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Scott, Right you are. My brain fried  I forgot about the other "small prime" stuff Inverse FFT ( 4096) Number of seconds 5.508000 Time per FFT 0.005508 Forward FFT ( 4096) Number of seconds 3.225000 Time per FFT 0.003225 Inverse FFT ( 4097) Number of seconds 29.723000 Time per FFT 0.029723 Forward FFT ( 4097) Number of seconds 26.788000 Time per FFT 0.026788 Cheers Eric  Original Message  From: "Scott Ransom" <ransom@cfa.harvard.edu> To: "Eric Hagemann" <ehagemann@home.com> Cc: <Numpydiscussion@lists.sourceforge.net> Sent: Tuesday, April 03, 2001 9:05 PM Subject: Re: [Numpydiscussion] curious FFFT timing
Hi Eric,
Any one want to speculate on the timing difference for the 4095 vice 4096 long transforms ? Since 4095 is not a power of two I would have expected a greater time difference (DFT vice FFT)
You are correct in that 4095 is not a power of two, but is is the product of only small prime factors: 3 * 3 * 5 * 7 * 13 = 4095
Since the FFT code implements a N*log_2(N) algorithm for numbers that contain only small prime factors, the difference in times is rather small. FFTs that have lengths that are powersoftwo tend to be more efficient in general (since the decimation routines are cleaner for this case).
If you test with 4097 (17 * 241) it will be quite a bit slower I'd guess...
Scott
 Scott M. Ransom Address: HarvardSmithsonian CfA Phone: (617) 4954142 60 Garden St. MS 10 email: ransom@cfa.harvard.edu Cambridge, MA 02138 GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
_______________________________________________ Numpydiscussion mailing list Numpydiscussion@lists.sourceforge.net http://lists.sourceforge.net/lists/listinfo/numpydiscussion
participants (2)

Eric Hagemann

Scott Ransom