On Wed, Dec 19, 2012 at 3:27 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Wed, Dec 19, 2012 at 8:10 AM, Nathaniel Smith <njs@pobox.com> wrote:
Right, my intuition is that it's like order="C" -- if you make a new array by, say, indexing, then it may or may not have order="C", no guarantees. So when you care, you call asarray(a, order="C") and that either makes a copy or not as needed. Similarly for base alignment.
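For concreteness, a small sketch of the order="C" behaviour described above, using only existing NumPy calls. The array names are made up for illustration.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Indexing can hand back a view that is no longer C-contiguous.
view = a[:, ::2]
print(view.flags.c_contiguous)

# asarray(..., order="C") copies only when the input does not already comply.
fixed = np.asarray(view, order="C")   # needs a copy
same = np.asarray(a, order="C")       # already C-ordered, no copy needed

print(fixed.flags.c_contiguous, np.shares_memory(fixed, view))
print(np.shares_memory(same, a))
```

The proposal is that base alignment would work the same way: no guarantee by default, and an explicit request that copies only if it has to.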
I guess to push this analogy even further we could define a set of array flags, ALIGNED_8, ALIGNED_16, etc. (In practice only power-of-2 alignment matters, I think, so the number of flags would remain manageable?) That would make the C API easier to deal with too, no need to add PyArray_FromAnyAligned.
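A rough pure-Python sketch of what "check the alignment, copy only if needed" could look like today. The names is_aligned, aligned_empty and require_aligned are hypothetical helpers, not NumPy API, and the over-allocate-and-slice trick is a common workaround rather than how NumPy itself allocates.

```python
import numpy as np

def is_aligned(arr, alignment):
    """True if arr's first data byte sits on an `alignment`-byte boundary."""
    return arr.ctypes.data % alignment == 0

def aligned_empty(shape, dtype, alignment):
    """Hypothetical helper: uninitialized array whose data starts on a boundary."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by `alignment` bytes, then skip ahead to the next boundary.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = -buf.ctypes.data % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

def require_aligned(arr, alignment):
    """Hypothetical analogue of asarray(a, order="C") for base alignment."""
    arr = np.asarray(arr)
    if is_aligned(arr, alignment):
        return arr                      # already fine, no copy
    out = aligned_empty(arr.shape, arr.dtype, alignment)
    out[...] = arr                      # copy into aligned storage
    return out
```

Note that the result is a view into a larger byte buffer, so it only stays aligned as long as nobody slices it further, which is the same "no guarantees after indexing" situation as with order="C".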
Another possibility is an aligned datatype, basically an aligned structured array with floats/ints in chunks of the appropriate size. IIRC, gcc support for sse is something like that.
True; right now it looks like structured dtypes have no special alignment:

In [13]: np.dtype("f4,f4").alignment
Out[13]: 1

So for this approach we'd need a way to create structured dtypes with .alignment == .itemsize, and we'd need some way to request dtype-aligned memory from array allocation functions. I guess the existing NPY_ALIGNED is a good enough public interface for the latter, but AFAICT the current implementation is to just assume that whatever malloc() returns will always be ALIGNED. This is true for all base C types, but not for more exotic record types with larger alignment requirements -- that would require some fancier allocation scheme.

Not sure which interface is more useful to users. On the one hand, using funny dtypes makes regular non-SIMD access more cumbersome, and it forces your array size to be a multiple of the SIMD word size, which might be inconvenient if your code is smart enough to handle arbitrary-sized arrays with partial SIMD acceleration (i.e., using SIMD for most of the array, and then a slow path to handle any partial word at the end).

OTOH, if your code *is* that smart, you should probably just make it smart enough to handle a partial word at the beginning as well, and then you won't need any special alignment in the first place. And representing each SIMD word as a single numpy scalar is an intuitively appealing model of how SIMD works.

OTOOH, just adding a single argument to np.array() is much simpler to explain than some elaborate scheme involving the creation of special custom dtypes.

-n
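To make the "partial word at the beginning and at the end" idea concrete, here is a sketch that splits a 1-D array into an unaligned head, a word-aligned body and a leftover tail. split_for_simd is a hypothetical helper, not NumPy API, and it assumes a contiguous 1-D array whose itemsize divides the SIMD word size.

```python
import numpy as np

def split_for_simd(arr, word_bytes):
    """Return (head, body, tail) views of a 1-D contiguous array.

    body starts on a word_bytes boundary and holds a whole number of words;
    head and tail are the partial words left over for a slow path.
    """
    if arr.ndim != 1 or not arr.flags.c_contiguous:
        raise ValueError("expected a 1-D contiguous array")
    if word_bytes % arr.itemsize:
        raise ValueError("itemsize must divide the word size")
    per_word = word_bytes // arr.itemsize

    # Bytes from the start of the data to the next word boundary.
    misalign = -arr.ctypes.data % word_bytes
    if misalign % arr.itemsize:
        # No element boundary ever lands on a word boundary: all slow path.
        return arr, arr[:0], arr[:0]

    n_head = min(misalign // arr.itemsize, arr.size)
    n_body = (arr.size - n_head) // per_word * per_word
    return (arr[:n_head],
            arr[n_head:n_head + n_body],
            arr[n_head + n_body:])

# Slicing off one element shifts the data start by one float32 (4 bytes).
x = np.arange(37, dtype=np.float32)[1:]
head, body, tail = split_for_simd(x, 16)
print(head.size, body.size, tail.size)
```

With this kind of split the caller never needs aligned allocation at all, at the cost of two slow-path fragments instead of one.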