
Hi Stéfan,

Update: indeed, rfft2 has the same memory usage as our fft2d in terms of reals. Thanks, Stéfan. So far, I believe the results are as follows:
- our fft2d outperforms scipy in time on rectangular signals whose sides are powers of two
- memory usage is equal to that of rfft2
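For reference, here is a small sketch of how the "in terms of reals" count can be checked on the rfft2 side. It uses numpy.fft only; the array sizes are arbitrary example values and `reals_stored` is a helper name made up for this illustration. Our fft2d is not included, so this shows just the baseline it is being compared against.

```python
import numpy as np

def reals_stored(a):
    """Count the real numbers needed to store array `a` (hypothetical helper)."""
    return a.size * (2 if np.iscomplexobj(a) else 1)

rows, cols = 64, 256  # rectangular, power-of-two sides (arbitrary example sizes)
x = np.random.default_rng(0).standard_normal((rows, cols))

full = np.fft.fft2(x)   # complex output, shape (rows, cols)
half = np.fft.rfft2(x)  # complex output, shape (rows, cols // 2 + 1)

print("input reals:", reals_stored(x))     # rows * cols
print("fft2 reals: ", reals_stored(full))  # 2 * rows * cols
print("rfft2 reals:", reals_stored(half))  # 2 * rows * (cols // 2 + 1)
```

So for real input, rfft2 stores roughly as many reals as the input itself, about half of what the full fft2 output needs.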
In my view, it's worth trying to combine our algorithm with scipy's multithreading. Given the previous results, I believe it will show major performance improvements. If it does, I still think it's worth putting the Cooley-Tukey butterfly to work for the signals mentioned above. I suggest we test our code as part of numpy/scipy (to be honest, I've lost track of whether this thread is about embedding into numpy or into scipy).

I believe that if we placed the butterfly algorithm in scipy and added a check on the size of the matrix, we would gain performance rather than lose it. I think any performance advantage of the algorithm matters, as long as the balance between memory usage and time is preserved. When it comes to algorithm performance, one small step here is a giant leap for the other projects that depend on it.

Please let me know if you share this opinion and whether we should try testing.

Regards,
Alexander
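P.S. A rough sketch of the size check I have in mind, just to make the idea concrete. All names here (`is_power_of_two`, `fft2d_butterfly`, `rfft2_dispatch`) are hypothetical, and `fft2d_butterfly` is only a placeholder that calls numpy so the snippet runs; the real routine would go in its place. Inside scipy the fallback would presumably be `scipy.fft.rfft2`, which accepts a `workers` argument for multithreading.

```python
import numpy as np

def is_power_of_two(n):
    """True if n is a positive integer power of two."""
    return n > 0 and (n & (n - 1)) == 0

def fft2d_butterfly(x):
    # Placeholder (assumption): stands in for our Cooley-Tukey fft2d.
    # It simply defers to numpy here so the sketch is runnable.
    return np.fft.rfft2(x)

def rfft2_dispatch(x):
    """Use the butterfly path only for 2-D inputs with power-of-two sides."""
    x = np.asarray(x)
    if x.ndim == 2 and all(is_power_of_two(s) for s in x.shape):
        return fft2d_butterfly(x)
    return np.fft.rfft2(x)  # existing general-purpose path

a = np.ones((64, 256))  # takes the butterfly path
b = np.ones((48, 100))  # falls back to the general path
print(rfft2_dispatch(a).shape, rfft2_dispatch(b).shape)
```

The check itself costs a couple of integer operations per call, so it should not hurt the inputs that fall through to the existing path.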