On Wed, 2012-12-19 at 15:10 +0000, Nathaniel Smith wrote: <snip>
Is this something that can be rolled into NumPy (the feature, not my particular implementation or interface - though I'd be happy for it to be so)?
Regarding (b), I've written a test case that works for Linux on x86-64 with GCC (my platform!). I can test it on 32-bit Windows, but that's it. Is ARM supported by NumPy? NEON would be great to include as well. What other platforms might need this?
Your code looks simple and portable to me (at least the alignment part). I can see a good argument for adding this sort of functionality directly to numpy with a nice interface, though, since these kinds of requirements seem quite common these days. Maybe an interface like

    a = np.asarray([1, 2, 3], base_alignment=32)  # should this be in bits or in bytes?
    b = np.empty((10, 10), order="C", base_alignment=32)
    # etc.
    assert a.base_alignment == 32

which underneath tries to use posix_memalign/_aligned_malloc when possible, or falls back on the overallocation trick otherwise?
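For reference, the overallocation fallback can be sketched in pure Python today, without touching multiarray. This is only an illustration: empty_aligned is a made-up helper name, the alignment is taken in bytes, and base_alignment itself remains a proposed keyword that numpy does not accept.

```python
import numpy as np

def empty_aligned(shape, dtype=np.float64, alignment=32):
    """Hypothetical helper: uninitialized array whose data pointer is a
    multiple of `alignment` bytes, using the overallocation trick."""
    dtype = np.dtype(dtype)
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    nbytes = int(np.prod(shape, dtype=np.int64)) * dtype.itemsize
    # Allocate `alignment` spare bytes so an aligned start always fits.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    # Bytes to skip to reach the next multiple of `alignment`.
    offset = -buf.ctypes.data % alignment
    # The result keeps `buf` alive through its .base chain.
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

b = empty_aligned((10, 10), alignment=32)
assert b.ctypes.data % 32 == 0
```

The cost is up to `alignment` wasted bytes per array, which is why a native posix_memalign/_aligned_malloc path would be preferable where available.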
There is a thread about this from several years back. IIRC, David Cournapeau was interested in the same problem. At first glance, the alignment keyword looks interesting. One possible concern is keeping alignment for rows, views, etc., which is probably not possible in any sensible way. But people who need this most likely know what they are doing and just need memory allocated on the proper boundary.
Right, my intuition is that it's like order="C" -- if you make a new array by, say, indexing, then it may or may not have order="C", no guarantees. So when you care, you call asarray(a, order="C") and that either makes a copy or not as needed. Similarly for base alignment.
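A rough sketch of those copy-only-if-needed semantics, again in pure Python. The name as_aligned is hypothetical, alignment is in bytes, and it assumes a plain numeric dtype; it is meant to mirror what asarray(a, order="C") does for memory order, not to suggest an actual implementation.

```python
import numpy as np

def as_aligned(a, alignment=32):
    """Hypothetical helper: return `a` unchanged if its data pointer is
    already a multiple of `alignment` bytes, otherwise an aligned copy."""
    a = np.asarray(a)
    if a.ctypes.data % alignment == 0:
        return a  # already aligned, no copy
    # Overallocate, then skip ahead to the next aligned address.
    buf = np.empty(a.nbytes + alignment, dtype=np.uint8)
    offset = -buf.ctypes.data % alignment
    out = buf[offset:offset + a.nbytes].view(a.dtype).reshape(a.shape)
    out[...] = a
    return out

base = as_aligned(np.arange(16.0), 32)
view = base[1:]              # starts 8 bytes past a 32-byte boundary
fixed = as_aligned(view, 32) # so this has to copy
```

Here view loses the alignment of base simply by being sliced, and the caller who cares asks for it again.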
I guess to push this analogy even further we could define a set of array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 alignment matters, I think, so the number of flags would remain manageable?) That would make the C API easier to deal with too, no need to add PyArray_FromAnyAligned.
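To make the flag idea concrete: since only power-of-2 alignments matter, every such flag follows from the lowest set bit of the data address. The snippet below is just an illustration in Python; max_alignment and alignment_flags are invented names, and the ALIGNED_n keys are the flags proposed above, not existing numpy flags.

```python
import numpy as np

def max_alignment(a):
    """Largest power of two (in bytes) dividing the data address of `a`."""
    addr = a.ctypes.data
    return addr & -addr  # isolates the lowest set bit of the address

def alignment_flags(a, candidates=(8, 16, 32, 64)):
    """Hypothetical ALIGNED_n flags, computed from the data pointer."""
    addr = a.ctypes.data
    return {f"ALIGNED_{n}": addr % n == 0 for n in candidates}

a = np.zeros(16)
print(max_alignment(a), alignment_flags(a))
```

An array that satisfies ALIGNED_32 automatically satisfies ALIGNED_16 and ALIGNED_8, so a single "maximum alignment" value would carry the same information as the whole set of flags.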
So, if I were to implement this, I presume the proper way would be through modifications to multiarray? Would this basic description be a reasonable initial target?

Henry