Hello list, I have a problem with fft speed. could someone try this : from scipy import * import time a = rand((186888));t1=time.time();fft(a);t2=time.time();t2-t1 # for me about 0.67791390419006348 s. on centrino 1.6GHz a = rand((186890));t1=time.time();fft(a);t2=time.time();t2-t1 # for me about 1.622053861618042 s. on centrino 1.6GHz a = rand((186889));t1=time.time();fft(a);t2=time.time();t2-t1 # the computation is infinit !!!! (no time to wait) Question : why is there a problem specialy for that size (186889) of fft ? It is not the only size with that problem. Thank Sam -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Samuel GARCIA wrote:
Hello list,
I have a problem with fft speed. could someone try this :
from scipy import * import time
a = rand((186888));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 0.67791390419006348 s. on centrino 1.6GHz
a = rand((186890));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 1.622053861618042 s. on centrino 1.6GHz
a = rand((186889));t1=time.time();fft(a);t2=time.time();t2-t1
# the computation is infinit !!!! (no time to wait)
Question : why is there a problem specialy for that size (186889) of fft ? It is not the only size with that problem.
Thank
Sam
What you can find with help (fft) is ... "This is most efficient for n a power of two" Nils
Yes I know this. But 186888 186889 and 186890 are not power of 2 and the computation time is very very different just for a difference of size of only one point. What is the reason ? And how to deal with that ? (I realy need to compute fft with a random nomber of point) Nils Wagner wrote:
Samuel GARCIA wrote:
Hello list,
I have a problem with fft speed. could someone try this :
from scipy import * import time
a = rand((186888));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 0.67791390419006348 s. on centrino 1.6GHz
a = rand((186890));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 1.622053861618042 s. on centrino 1.6GHz
a = rand((186889));t1=time.time();fft(a);t2=time.time();t2-t1
# the computation is infinit !!!! (no time to wait)
Question : why is there a problem specialy for that size (186889) of fft ? It is not the only size with that problem.
Thank
Sam
What you can find with help (fft) is ...
"This is most efficient for n a power of two"
Nils
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On 07/02/07, Samuel GARCIA <sgarcia@olfac.univ-lyon1.fr> wrote:
Yes I know this. But 186888 186889 and 186890 are not power of 2 and the computation time is very very different just for a difference of size of only one point. What is the reason ? And how to deal with that ? (I realy need to compute fft with a random nomber of point)
Many problems can be solved by padding. But once in a while one comes up which needs a particular number of points, and it's not always a power of two. Can FFTW (or any of the FFT packages numpy/scipy can use) compute an FFT of size 186889 in a reasonable time? I know there are algorithms for large prime factors, and for small prime factors, and that you can combine the two (though perhaps primes of moderate size are a problem). Anne M. Archibald
On Wed, Feb 07, 2007 at 12:13:12PM -0500, Anne Archibald wrote:
On 07/02/07, Samuel GARCIA <sgarcia@olfac.univ-lyon1.fr> wrote:
Yes I know this. But 186888 186889 and 186890 are not power of 2 and the computation time is very very different just for a difference of size of only one point. What is the reason ? And how to deal with that ? (I realy need to compute fft with a random nomber of point)
Many problems can be solved by padding. But once in a while one comes up which needs a particular number of points, and it's not always a power of two.
Can FFTW (or any of the FFT packages numpy/scipy can use) compute an FFT of size 186889 in a reasonable time? I know there are algorithms for large prime factors, and for small prime factors, and that you can combine the two (though perhaps primes of moderate size are a problem).
I know that FFTW uses O(NlogN) algorithms for any N, even large prime factors. It is quite likely that for large prime N, FFTPACK (which is what numpy uses) goes to a standard DFT algorithm (O(N^2)). One important thing to remember is that the constant in front of the NlogN is highly dependent on the algorithm. That is why even though FFTW v3 uses NlogN algorithms for all N, some N (like powers of 2 and those composed of only small prime factors) are _much_ faster than those for other N. But the bottom line is that no matter what the constant, for large N, O(NlogN) is _much_ faster than O(N^2). Scott -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
On 07/02/07, Scott Ransom <sransom@nrao.edu> wrote:
On Wed, Feb 07, 2007 at 12:13:12PM -0500, Anne Archibald wrote:
Can FFTW (or any of the FFT packages numpy/scipy can use) compute an FFT of size 186889 in a reasonable time? I know there are algorithms for large prime factors, and for small prime factors, and that you can combine the two (though perhaps primes of moderate size are a problem).
I know that FFTW uses O(NlogN) algorithms for any N, even large prime factors. It is quite likely that for large prime N, FFTPACK (which is what numpy uses) goes to a standard DFT algorithm (O(N^2)).
Indeed, for numbers in the vicinity of 10000, some take 500 times as long as others (on my machine), so I suspect that it is falling back to a DFT.
One important thing to remember is that the constant in front of the NlogN is highly dependent on the algorithm. That is why even though FFTW v3 uses NlogN algorithms for all N, some N (like powers of 2 and those composed of only small prime factors) are _much_ faster than those for other N. But the bottom line is that no matter what the constant, for large N, O(NlogN) is _much_ faster than O(N^2).
Of course. But when you say the constant varies, do you mean by a factor of ten? a hundred? a thousand? On my machine, scipy.show_config reports that it can find FFTW2 but not FFTW3; does that mean it is actually *using* FFTW2? How does one tell? Does scipy provide any access to the special features of FFTW's interface? (wisdom, for efficiently computing many FFTs on the same array, for example) Anne M. Archibald
Scott Ransom wrote:
On Wed, Feb 07, 2007 at 12:13:12PM -0500, Anne Archibald wrote:
On 07/02/07, Samuel GARCIA <sgarcia@olfac.univ-lyon1.fr> wrote:
Yes I know this. But 186888 186889 and 186890 are not power of 2 and the computation time is very very different just for a difference of size of only one point. What is the reason ? And how to deal with that ? (I realy need to compute fft with a random nomber of point)
It's probably worth pointing out that 186889 is, in fact, prime, which is certainly the worst case for any algorithm. Andrew
Andrew Jaffe wrote:
Scott Ransom wrote:
On Wed, Feb 07, 2007 at 12:13:12PM -0500, Anne Archibald wrote:
On 07/02/07, Samuel GARCIA <sgarcia@olfac.univ-lyon1.fr> wrote:
Yes I know this. But 186888 186889 and 186890 are not power of 2 and the computation time is very very different just for a difference of size of only one point. What is the reason ? And how to deal with that ? (I realy need to compute fft with a random nomber of point)
It's probably worth pointing out that 186889 is, in fact, prime, which is certainly the worst case for any algorithm. Indeed, this is an important point. There are several things to check, and there are some problems with FFT as implemented now in numpy/scipy when used with fftw (it is suboptimal by several factors).
- First, is the numpy/scipy installed fft really using fftw ? - Also, if fftw is used, it is important to remember that the first time you are using a certain size, fftw has to compute a plan, which may take time. I tested on my machine (Pentium 4, 3.2 Ghz) fftw (in C) for the given size, each result being the best shot on 100 iterations (source attached): testing cached for size 186888 computing plan...done ! cycle is 1.357218e+10, 1.357218e+08 per execution on average, min is 1.323569e+08 testing cached for size 186889 computing plan...done ! cycle is 6.188211e+10, 6.188211e+08 per execution on average, min is 6.016043e+08 testing cached for size 186890 computing plan...done ! cycle is 1.623674e+10, 1.623674e+08 per execution on average, min is 1.595604e+08 testing cached for size 131072 computing plan...done ! cycle is 1.730136e+10, 1.730136e+08 per execution on average, min is 1.682170e+08 testing cached for size 262144 computing plan...done ! cycle is 3.018974e+10, 3.018974e+08 per execution on average, min is 2.978997e+08 cycle being the number of CPU cycles taken by the 100 iterations. The fact that 2 ** 17 is slower than 186888 is weird, though. May be due to some weird cache effects, I don't know. For reference, on my laptop (pentium M, thus much better memory performances and FPU performances overall): testing cached for size 186888 computing plan...done ! cycle is 1.152835e+10, 1.152835e+08 per execution on average, min is 1.070749e+08 testing cached for size 186889 computing plan...done ! cycle is 3.845418e+10, 3.845418e+08 per execution on average, min is 3.654266e+08 testing cached for size 186890 computing plan...done ! cycle is 1.389736e+10, 1.389736e+08 per execution on average, min is 1.314722e+08 testing cached for size 131072 computing plan...done ! cycle is 5.142997e+09, 5.142997e+07 per execution on average, min is 4.860954e+07 testing cached for size 262144 computing plan...done ! cycle is 1.347407e+10, 1.347407e+08 per execution on average, min is 1.287470e+08 Which is more logical... This once again shows how bad a Pentium 4 is for FPU performances on a per cycle point of view :) For speed issues, 0 padding is the easiest solution I can think of, David
David Cournapeau wrote:
Andrew Jaffe wrote:
It's probably worth pointing out that 186889 is, in fact, prime, which is certainly the worst case for any algorithm.
I went a bit further to see why it stuck on 186889. First, it only happens with numpy.fft.fft, not with scipy.fftpack. So fftw has nothing to do with it.
It seems like the fft used in numpy is really slow for prime number (eg N^2, which becomes quite big when your number is 186889...). One thing which could be done at least is to enable SIGINT to be sent to the function to abort it (It takes around 15 minutes to complete on my workstsation). I guess the question is: is there any other implementation of fft usable for prime number in numpy ? David
Samuel GARCIA wrote:
Hello list,
I have a problem with fft speed. could someone try this :
from scipy import * import time
a = rand((186888));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 0.67791390419006348 s. on centrino 1.6GHz
a = rand((186890));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 1.622053861618042 s. on centrino 1.6GHz
a = rand((186889));t1=time.time();fft(a);t2=time.time();t2-t1
# the computation is infinit !!!! (no time to wait)
Question : why is there a problem specialy for that size (186889) of fft ? It is not the only size with that problem.
I guess this is due to the design of fft which is made for input lengths of 2**x. If you try e.g. 262144=2**18 which is even larger than 186889 you'll see that it's pretty fast. Christian
Yes it is pretty fast. The difference of time is not a problem for the end user (between 0.2 and 1 s). But specialy the value of 186889 is really a problem, the computation is infinit. Did you try tis value especialy ? Christian Kristukat wrote:
Samuel GARCIA wrote:
Hello list,
I have a problem with fft speed. could someone try this :
from scipy import * import time
a = rand((186888));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 0.67791390419006348 s. on centrino 1.6GHz
a = rand((186890));t1=time.time();fft(a);t2=time.time();t2-t1
# for me about 1.622053861618042 s. on centrino 1.6GHz
a = rand((186889));t1=time.time();fft(a);t2=time.time();t2-t1
# the computation is infinit !!!! (no time to wait)
Question : why is there a problem specialy for that size (186889) of fft ? It is not the only size with that problem.
I guess this is due to the design of fft which is made for input lengths of 2**x. If you try e.g. 262144=2**18 which is even larger than 186889 you'll see that it's pretty fast.
Christian
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Samuel GARCIA wrote:
Yes it is pretty fast. The difference of time is not a problem for the end user (between 0.2 and 1 s). But specialy the value of 186889 is really a problem, the computation is infinit. Did you try tis value especialy ?
You may try a = rand((186889)) t1=time.time() fft(a,pow(2,18)) t2=time.time() print t2-t1 Nils
yes I was thinking of doing something like that but fft(a,pow(2,18)).shape is of course 262144 (2**18) and when I use ifft after that The length of my signal have changed I have an interpolated signal ... new problem for me ... Nils Wagner wrote:
Samuel GARCIA wrote:
Yes it is pretty fast. The difference of time is not a problem for the end user (between 0.2 and 1 s). But specialy the value of 186889 is really a problem, the computation is infinit. Did you try tis value especialy ?
You may try
a = rand((186889)) t1=time.time() fft(a,pow(2,18)) t2=time.time() print t2-t1
Nils
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On Wed, Feb 07, 2007 at 03:18:19PM +0100, Samuel GARCIA wrote:
yes I was thinking of doing something like that but fft(a,pow(2,18)).shape is of course 262144 (2**18) and when I use ifft after that The length of my signal have changed I have an interpolated signal ... new problem for me ...
An interpolated signal? In [19]: N.real(N.fft.ifft(N.fft.fft(N.arange(11),16))) Out[19]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., -0., 0., 0., 0., 0.]) Padding in the Fourier-domain, on the other hand: In [20]: N.real(N.fft.ifft(N.fft.fft(N.arange(11)),16)) Out[20]: array([ 0. , 2.10035464, 3.04455839, 1.35780382, 3.23688237, 3.12291155, 2.35790278, 4.12133152, 3.4375 , 3.29213721, 5.00323313, 3.69035768, 4.32561763, 5.98165513, 3.3443057 , 6.58344845]) Cheers Stéfan
Ok sorry. I answered without trying ! Thank a lot. I will solve that way. Sam Stefan van der Walt wrote:
On Wed, Feb 07, 2007 at 03:18:19PM +0100, Samuel GARCIA wrote:
yes I was thinking of doing something like that but fft(a,pow(2,18)).shape is of course 262144 (2**18) and when I use ifft after that The length of my signal have changed I have an interpolated signal ... new problem for me ...
An interpolated signal?
In [19]: N.real(N.fft.ifft(N.fft.fft(N.arange(11),16))) Out[19]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., -0., 0., 0., 0., 0.])
Padding in the Fourier-domain, on the other hand:
In [20]: N.real(N.fft.ifft(N.fft.fft(N.arange(11)),16)) Out[20]: array([ 0. , 2.10035464, 3.04455839, 1.35780382, 3.23688237, 3.12291155, 2.35790278, 4.12133152, 3.4375 , 3.29213721, 5.00323313, 3.69035768, 4.32561763, 5.98165513, 3.3443057 , 6.58344845])
Cheers Stéfan _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On Wed, Feb 07, 2007 at 04:39:21PM +0200, Stefan van der Walt wrote:
On Wed, Feb 07, 2007 at 03:18:19PM +0100, Samuel GARCIA wrote:
yes I was thinking of doing something like that but fft(a,pow(2,18)).shape is of course 262144 (2**18) and when I use ifft after that The length of my signal have changed I have an interpolated signal ... new problem for me ...
An interpolated signal?
Padding a time series gives you an interpolated (actually, the term often used is "oversampled" Fourier spectrum). Scott
In [19]: N.real(N.fft.ifft(N.fft.fft(N.arange(11),16))) Out[19]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., -0., 0., 0., 0., 0.])
Padding in the Fourier-domain, on the other hand:
In [20]: N.real(N.fft.ifft(N.fft.fft(N.arange(11)),16)) Out[20]: array([ 0. , 2.10035464, 3.04455839, 1.35780382, 3.23688237, 3.12291155, 2.35790278, 4.12133152, 3.4375 , 3.29213721, 5.00323313, 3.69035768, 4.32561763, 5.98165513, 3.3443057 , 6.58344845])
Cheers Stéfan _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom@nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
Thanks all, for fast and efficient answers. Zeros padding was really my solution. Sam Scott Ransom wrote:
On Wed, Feb 07, 2007 at 04:39:21PM +0200, Stefan van der Walt wrote:
On Wed, Feb 07, 2007 at 03:18:19PM +0100, Samuel GARCIA wrote:
yes I was thinking of doing something like that but fft(a,pow(2,18)).shape is of course 262144 (2**18) and when I use ifft after that The length of my signal have changed I have an interpolated signal ... new problem for me ...
An interpolated signal?
Padding a time series gives you an interpolated (actually, the term often used is "oversampled" Fourier spectrum).
Scott
In [19]: N.real(N.fft.ifft(N.fft.fft(N.arange(11),16))) Out[19]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., -0., 0., 0., 0., 0.])
Padding in the Fourier-domain, on the other hand:
In [20]: N.real(N.fft.ifft(N.fft.fft(N.arange(11)),16)) Out[20]: array([ 0. , 2.10035464, 3.04455839, 1.35780382, 3.23688237, 3.12291155, 2.35790278, 4.12133152, 3.4375 , 3.29213721, 5.00323313, 3.69035768, 4.32561763, 5.98165513, 3.3443057 , 6.58344845])
Cheers Stéfan _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Samuel Garcia Universite Claude Bernard LYON 1 CNRS - UMR5020, Laboratoire des Neurosciences et Systemes Sensoriels 50, avenue Tony Garnier 69366 LYON Cedex 07 FRANCE Tél : 04 37 28 74 64 Fax : 04 37 28 76 01 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
participants (8)
-
Andrew Jaffe -
Anne Archibald -
Christian Kristukat -
David Cournapeau -
Nils Wagner -
Samuel GARCIA -
Scott Ransom -
Stefan van der Walt