
Hi Stefan,

Indeed, you're right: the underlying formula was originally derived by V. Tutatchikov for power-of-two matrices. The butterfly approach requires a recursive breakdown to 2x2 blocks in order to precalculate the roots of unity, which is exactly what provides the performance advantage mentioned above.

I did my own research on implementations where the size is not a power of two but the butterfly is still used. They usually either zero-pad the input up to the closest power of two (which becomes a mess at larger sizes) or drop columns and rebuild them afterwards, which is not a brilliant solution either. At some point I found patent US 8,484,274 B2, which discusses possible padding options for such matrices. The methodology there is chosen depending on the actual size of the signal/data, so implementing the butterfly this way would require writing several cases and dispatching on the size. It's theoretically possible, though I can't say whether this particular approach would give a real performance advantage or end up the same or worse.

Yes, the intent is indeed to generate a random matrix, so that our tests are as objective as they can be. I appreciate your review and will also run the memory comparison against rfft (I hope tomorrow). For now, I'm confident this method shows better performance on power-of-two square matrices as well as rectangular matrices of size 2^n x 2^m (the latter was tested during development). As for other sizes, I mentioned above that I have some thoughts, but the answer is very case-sensitive with respect to the specific type of signal we are working with.

I would like to emphasize that, regardless of my genuine desire to create the best possible version for the user, my work is constrained by capabilities rather than by that desire. As you may have noticed earlier, the multithreading implemented by the SciPy authors surpasses the version we created.
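For concreteness, the naive zero-padding route mentioned above can be sketched as follows (a minimal illustration, not the butterfly code itself):

```python
import numpy as np

def next_pow2(n):
    """Smallest power of two >= n."""
    return 1 << (n - 1).bit_length()

def pad_to_pow2(a):
    """Zero-pad a 2-D array so both dimensions become powers of two.

    This is the naive "0 imputation" approach discussed above,
    shown only to illustrate why it gets wasteful at larger sizes.
    """
    rows, cols = a.shape
    padded = np.zeros((next_pow2(rows), next_pow2(cols)), dtype=a.dtype)
    padded[:rows, :cols] = a  # original data in the top-left corner, zeros elsewhere
    return padded

a = np.ones((3, 5))
print(pad_to_pow2(a).shape)  # (4, 8)
```

Note that padding changes the transform length, so the result is a larger-size DFT rather than the DFT of the original dimensions, which is one more reason this route is messy.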
Considering the butterfly method's sensitivity to matrix dimensions, I am ready to share my thoughts on specific data sizes and use cases. However, I cannot readily provide an optimal solution for all cases; otherwise, I would have implemented it first and then presented it to you. At the moment, I believe this is the only significant vulnerability in the mathematics we have presented.

I can't give much feedback on Bluestein / Chirp-Z at this very moment, but I'll research the matter, and if it solves our issue in some case, I will definitely implement it. At this point, I believe that if our method, after the corrections you suggested, still has a performance advantage (even on matrices of simpler sizes like powers of two), it is worth embedding, even behind size-based 'if-cases', in my opinion. Yet I'm only a contributor, which is why I'm discussing this matter with you.

I want to say that I appreciate your feedback and your time, and I'll be waiting for your response.

Regards,
Alexander
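P.S. To make the size-based 'if-cases' idea concrete, here is a minimal dispatch sketch. `butterfly_fft2` is a hypothetical name standing in for the butterfly routine from this PR, and `np.fft.fft2` plays the role of the general fallback (`scipy.fft.fft2` in the actual library):

```python
import numpy as np

def is_pow2(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0

def fft2_dispatch(a, butterfly_fft2=None):
    """Route to the fast path only for 2^n x 2^m matrices.

    `butterfly_fft2` is a hypothetical stand-in for the butterfly
    implementation; all other sizes fall back to the stock FFT.
    """
    rows, cols = a.shape
    if butterfly_fft2 is not None and is_pow2(rows) and is_pow2(cols):
        return butterfly_fft2(a)   # fast path: both dims are powers of two
    return np.fft.fft2(a)          # general-size fallback
```

Both branches must of course return identical results up to floating-point error, which is also easy to check in the test suite.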