Hello,

In https://github.com/numpy/numpy/issues/5312 there's a request for an aligned allocator in NumPy (with a larger alignment than the platform's memory allocator provides by default). The reason is that on modern vectorization instruction sets, a certain alignment is required for optimal performance (even though unaligned data still works: it's just that performance is degraded... by how much will depend on the CPU micro-architecture). For example, Intel recommends a 32-byte alignment for AVX loads and stores.

In https://github.com/numpy/numpy/pull/5457 I have proposed a patch to wrap the system allocator in an aligned allocator. The proposed scheme makes the alignment configurable at runtime (through a Python API), because different platforms may have different desirable alignments, and it is not reasonable for NumPy to know about them all, nor for users to recompile NumPy each time they have a different CPU.

By always using an aligned allocator there is some overhead:

- all arrays occupy a bit more memory by a small average amount (probably 16 bytes on average on a 64-bit machine, for a guaranteed 16-byte alignment)
- array resizes can be more expensive in CPU time, when the physical start of the data changes and its alignment changes too

There is also a limitation: while the physical start of an array will always be aligned, this can be defeated by taking a view starting at a non-zero index.

(Note that to take advantage of certain instruction set features such as AVX, NumPy may need to be compiled with specific compiler flags... but NumPy's allocations also affect other packages, such as Numba, which is able to generate code at runtime.)

I would like to know if people are interested in this feature, and whether the proposed approach is acceptable.

Regards

Antoine.
> By always using an aligned allocator there is some overhead:
> - all arrays occupy a bit more memory by a small average amount (probably 16 bytes on average on a 64-bit machine, for a guaranteed 16-byte alignment)
NumPy arrays are Python objects. They have an overhead anyway, much more than this, and 16 bytes is no worse than adding a couple of pointers to the struct. In the big picture this tiny overhead does not matter.
> - array resizes can be more expensive in CPU time, when the physical start of the data changes and its alignment changes too
We are using Python. If we were worried about small inefficiencies we would not be using it. Resizing ndarrays is rare anyway; they are not used like Python lists, or as a substitute for lists. We use lists in the same way as anyone else who uses Python, so an ndarray resize can afford to be more expensive than a list append. Also, the NumPy community already expects an ndarray resize to be expensive and O(n) due to its current behavior: if an array has a view, realloc is out of the question. :-)

Sturla
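The last point can be seen directly in current NumPy: `ndarray.resize` refuses to touch a buffer that another array is viewing, precisely because realloc could move the data out from under the view. A small illustration:

```python
import numpy as np

a = np.arange(10)
v = a[2:]            # a view sharing a's buffer

# In-place resize is refused while the view exists: realloc could
# move the buffer and leave v pointing at freed memory.
try:
    a.resize(20)     # refcheck=True by default
    raised = False
except ValueError:
    raised = True
print(raised)        # True
```

`np.resize(a, 20)` (the function, not the method) sidesteps the problem by always returning a new, copied array.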