On 8/9/07, Stefan van der Walt <stefan@sun.ac.za> wrote:
It doesn't really matter where the memory allocation occurs, does it? As far as I understand, the underlying fftw function has some flag to indicate when the data is aligned. If so, we could expose that flag in Python, and do something like
    x = align16(data)
    _fft(x, is_aligned=True)
I am not intimately familiar with the fft wrappers, so maybe I'm missing something more fundamental.
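For concreteness, here is a rough sketch of what the align16 helper above might look like in pure Python. Both align16 and check_alignment are hypothetical names used only for illustration; neither exists in numpy or in the fft wrappers. The sketch assumes that copying the data once is acceptable: it over-allocates a byte buffer and takes a view starting at the first 16-byte boundary.

```python
import numpy as np


def align16(data, alignment=16):
    """Return a copy of `data` whose buffer starts on an `alignment`-byte boundary.

    Hypothetical helper, not part of numpy.
    """
    data = np.ascontiguousarray(data)
    # Over-allocate raw bytes so that an aligned start address must exist
    # somewhere within the first `alignment` bytes of the buffer.
    raw = np.empty(data.nbytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next aligned address.
    offset = (-raw.ctypes.data) % alignment
    aligned = raw[offset:offset + data.nbytes].view(data.dtype)
    aligned = aligned.reshape(data.shape)
    aligned[...] = data  # this copy is the cost of aligning after the fact
    return aligned


def check_alignment(arr, alignment=16):
    """Return True if the first byte of `arr` sits on an `alignment`-byte boundary."""
    return arr.ctypes.data % alignment == 0


x = align16(np.arange(8, dtype=np.float64))
print(check_alignment(x))
```

Note that the copy happens on every call, which is the overhead at issue when the function being called is cheap and invoked many times.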
You can do that, but this is only a special case of what I have in mind. For example, what if you want to call functions which are relatively cheap but called many times, and which want an aligned array? Going back and forth would be a huge waste. Also, having aligned buffers internally (in C), even for non-array data, can be useful (e.g. filters, and maybe even core numpy functionality like ufuncs, etc.).

Another point I forgot to mention before is that we could define a default alignment which would already be SIMD-friendly (as the default malloc does on Mac OS X or FreeBSD) for *all* numpy arrays at zero cost: for fft, this means that most arrays would already be aligned as wanted, meaning a huge performance boost for free.

Basically, the functionality would be more usable in C, without too many constraints, because frankly, the implementation is not difficult: I have something almost ready, and the patch is 7 kB, including code to detect platform-dependent aligned allocators. The C code can be tested really easily (since it is independent of Python).

David