
Good day, Ralf.

I am sharing the results of the latest updates to our code. We have taken the comments below into account and now measure timing with %timeit -o inside Jupyter, which gives us the best of 7 runs and the standard deviation. I am writing to summarise the intermediate results.

The testing notebooks:
Memory usage - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/memory_...
Timing comparisons (updated) - https://github.com/2D-FFT-Project/2d-fft/blob/testnotebook/notebooks/compari...

Our version always loses to SciPy when multithreading is enabled. We also wondered about the type conversions, and whether to keep them inside the timed tests or not. They are needed to convert the matrix values from int to complex128 (we will switch to complex64 if necessary) and back again on output. For a more convenient user experience we preferred to keep the conversions in the tests; we would be interested in your opinion.

Regarding the results after all the updates: memory usage is stable, and our implementation wins by a factor of 2.

Regarding execution time and efficiency, my opinion is as follows. With multithreading enabled we consistently lose, while with multithreading disabled we consistently win. I draw one conclusion from this: our algorithm is mathematically smarter, which lets it win steadily on both memory usage and performance when multithreading is switched off. At the same time, the multithreading used by the SciPy authors is better and more efficient than ours, which is why our implementation loses as soon as it is switched on. So I conclude that our algorithm is still the more performant one, but its existing multithreading clearly needs to be reworked. In this situation we need your advice.
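To make the conversion question concrete, here is a minimal sketch of the kind of measurement described above, run outside Jupyter. It is only an illustration under stated assumptions: `numpy.fft.fft2` stands in for the transform under test (it is not our implementation), the helper names `time_call` and `fft2_with_conversion`, the 256x256 matrix size, and the loop count are all hypothetical, and `timeit.repeat` with 7 repeats is used to approximate what %timeit reports.

```python
import statistics
import timeit

import numpy as np


def time_call(func, *args, repeat=7, number=5):
    """Return (best, mean, stdev) of the per-call time in seconds.

    Hypothetical helper: `repeat` runs of `number` calls each.
    """
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return min(per_call), statistics.mean(per_call), statistics.stdev(per_call)


def fft2_with_conversion(int_matrix):
    # The int -> complex128 conversion is inside the timed region.
    return np.fft.fft2(int_matrix.astype(np.complex128))


rng = np.random.default_rng(0)
int_matrix = rng.integers(0, 256, size=(256, 256))
# Converted once here, so the conversion stays outside the timed region.
complex_matrix = int_matrix.astype(np.complex128)

with_conversion = time_call(fft2_with_conversion, int_matrix)
without_conversion = time_call(np.fft.fft2, complex_matrix)

for label, (best, mean, stdev) in [
    ("with conversion", with_conversion),
    ("without conversion", without_conversion),
]:
    print(f"{label}: best {best * 1e3:.3f} ms, "
          f"mean {mean * 1e3:.3f} ms, stdev {stdev * 1e3:.3f} ms")
```

The same helper could take any callable. For a single-threaded SciPy baseline, one option would be `functools.partial(scipy.fft.fft2, workers=1)`, since `scipy.fft.fft2` accepts a `workers` argument.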
In theory, we could work out and write a more efficient multithreading scheme than our current one. In practice, I am sure the best way forward would be to collaborate with someone responsible for FFT in SciPy or NumPy, so that we can test our algorithm with their multithreading. I expect this would give the best overall performance available at the moment.

I propose this option instead of writing our own multithreading separately, because the goal of our work is to get the algorithm into NumPy so that as many people as possible can use a more efficient algorithm in their work. If we wrote our own multithreading first, we would still have to switch to the NumPy version afterwards to integrate the package.

So I am asking for your feedback on our memory usage and efficiency results, so that we can decide together on the next steps of what I hope will be collaborative work, if you are interested.

Thank you for your time.

Regards,
Alexander