Switching default order to column-major
Hi, I found that the documented default row-major order is enforced throughout the library with a series of `order='C'` default parameters, so given this I suppose there's no way to change the default (or am I wrong?). If, hypothetically, I changed that by patching the library (replacing the 'C's with 'F's), do you think there would be any problem with other downstream libraries using numpy in my project? Do you think they assume a default-constructed array is always row-major and access the underlying data?
On Sat, Nov 11, 2023 at 8:07 AM Valerio De Benedetto <posta@debevv.com> wrote:
> [...]
Nobody expects the column major arrays.

Chuck
I think you can always use order="F" in your own code. If you patched NumPy and the downstream libraries then had to use your customized NumPy, I think you would see some breakage. Probably not a lot, since many use the Python NumPy API, which handles C or F order well. Some code does do things like call array.flat after creating an array with default arguments from a list or using "copy=True", and then expects the data to be ordered as if order="C".

Kevin
High level abstractions like .flat or boolean indexing / np.nonzero() always use C ordering regardless of the underlying data.
>>> list(np.asarray([[0, 1], [2, 3]]).flat)
[0, 1, 2, 3]
>>> list(np.asarray([[0, 1], [2, 3]], order='F').flat)
[0, 1, 2, 3]
C and Fortran ordering are really just special cases of contiguous strides. In general an array could be non-contiguous, or virtually broadcast, or have some other virtual ordering due to stride tricks, and it might not even make sense to say that it's C or Fortran ordered.

The real issue with using Fortran ordering is that you might have bad performance, because most libraries are written assuming C ordering, performing operations on what would be contiguous memory if the array were C ordered but isn't when it's Fortran ordered. For example,

In [1]: import numpy as np

In [2]: a = np.ones((100, 100, 100))  # a has C order (the default)

In [3]: %timeit np.sum(a[0])
8.57 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: a_f = np.asarray(a, order='F')

In [5]: %timeit np.sum(a_f[0])
26.3 µs ± 952 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Summing a[0] is 3 times faster for a C-ordered array because a[0] is contiguous in C order. The exact reverse timings will hold for np.sum(a[..., 0]) vs. np.sum(a_f[..., 0]). But it's typical to write code like this, and you can see that even the very basic NumPy indexing API favors C ordering by letting you write a[0] instead of a[..., 0] to get a contiguous piece of memory.

The degree to which this matters in practice will depend on the exact thing a given library is doing and also things like the size of your data relative to your CPU caches.

Aaron Meurer
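The contiguity argument in the message above can be checked directly from the strides and flags. The following is a minimal sketch that reuses the same shapes as the timing example; the variable names are only illustrative, and it assumes the default float64 dtype (8 bytes per element).

```python
import numpy as np

a = np.ones((100, 100, 100))        # C order (the default), float64
a_f = np.asarray(a, order='F')      # same values, Fortran-ordered copy

# Byte strides per axis: the last axis varies fastest in C order,
# the first axis varies fastest in Fortran order.
c_strides = a.strides               # (80000, 800, 8)
f_strides = a_f.strides             # (8, 800, 80000)

# a[0] is one contiguous block of memory in C order ...
c_slice_is_contiguous = a[0].flags['C_CONTIGUOUS']

# ... while a_f[0] is neither C- nor F-contiguous.
f_slice_c = a_f[0].flags['C_CONTIGUOUS']
f_slice_f = a_f[0].flags['F_CONTIGUOUS']

# The contiguous slice of the Fortran-ordered array is along the last axis.
f_last_axis_is_contiguous = a_f[..., 0].flags['F_CONTIGUOUS']

print(c_strides, f_strides)
print(c_slice_is_contiguous, f_slice_c, f_slice_f, f_last_axis_is_contiguous)
```

Whether the layout difference translates into a measurable slowdown still depends on the operation and the data size, as noted above.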
Hi,

On Mon, Nov 13, 2023 at 7:41 AM Aaron Meurer <asmeurer@gmail.com> wrote:
> High level abstractions like .flat or boolean indexing / np.nonzero() always use C ordering regardless of the underlying data.
>
> >>> list(np.asarray([[0, 1], [2, 3]]).flat)
> [0, 1, 2, 3]
> >>> list(np.asarray([[0, 1], [2, 3]], order='F').flat)
> [0, 1, 2, 3]
Just in case it caused others to pause as it did me, here's the Boolean indexing demonstration:
>>> import numpy as np
>>> c_arr = np.asarray([[0, 1], [2, 3]])
>>> f_arr = np.asarray([[0, 1], [2, 3]], order='F')
>>> bool_arr = np.array([[False, True], [True, False]])
>>> c_arr[bool_arr]
array([1, 2])
>>> f_arr[bool_arr]
array([1, 2])
>>> c_arr[np.array(bool_arr, order='F')]  # Indexing array order is irrelevant
array([1, 2])

>>> np.nonzero(c_arr < 3)
(array([0, 0, 1]), array([0, 1, 0]))
>>> np.nonzero(f_arr < 3)
(array([0, 0, 1]), array([0, 1, 0]))

Cheers,

Matthew
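The demonstrations above show operations that follow logical C order no matter how the data is stored. As a complementary sketch, here are two calls that do expose the memory layout, which is where code written with C order in mind could be surprised: `ravel(order='K')` and `tobytes(order='A')`. The variable names are illustrative only.

```python
import numpy as np

c_arr = np.asarray([[0, 1], [2, 3]])
f_arr = np.asarray([[0, 1], [2, 3]], order='F')

# Logical (index) order: identical for both layouts.
logical_c = list(c_arr.flat)
logical_f = list(f_arr.flat)

# Memory order: order='K' reads elements as they are laid out in memory.
mem_c = c_arr.ravel(order='K').tolist()
mem_f = f_arr.ravel(order='K').tolist()

# Raw bytes in the array's own layout (order='A' follows F order
# when the array is F-contiguous), decoded back with the same dtype.
raw_f = np.frombuffer(f_arr.tobytes(order='A'), dtype=f_arr.dtype).tolist()

print(logical_c, logical_f)   # same logical order
print(mem_c, mem_f, raw_f)    # memory order differs for the F-ordered array
```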
Few things in the Python API care about order, but there are also quite a few places that will return C-order (and are faster for C-order inputs) whether you change those defaults or not.

The main issue is that e.g. some Cython wrappers will probably assume that the newly created array is C-order. And those will just not work.

For example, I would imagine many libraries that have C/Cython wrappers have code that doesn't specify `order="C"` explicitly (why would they?) but then passes the array into a typed memoryview (if Cython) like `double[:, ::1]`, enforcing a C-contiguous memory layout for speed. Such code should normally fail gracefully, but fail it will. Also, as Aaron said, a lot of these places might not enforce it but still be speed impacted.

So yes, it would be expected to break a lot of C-interfacing code that has Python wrappers around it to normalize input.

- Sebastian
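To make the failure mode described above concrete, here is a pure-Python sketch. Both functions are hypothetical: `needs_c_contiguous` only mimics the layout check a compiled routine typed as `double[:, ::1]` would impose, and `wrapper` shows one way a Python wrapper might normalize its input with `np.ascontiguousarray`. It is not taken from any particular library.

```python
import numpy as np

def needs_c_contiguous(arr):
    """Hypothetical stand-in for a compiled routine typed as double[:, ::1]."""
    if arr.dtype != np.float64 or arr.ndim != 2 or not arr.flags['C_CONTIGUOUS']:
        raise ValueError("expected a 2-D C-contiguous float64 array")
    return float(arr.sum())

def wrapper(data):
    """Hypothetical Python wrapper that normalizes input before the call."""
    # Copies only when the input is not already C-contiguous float64.
    arr = np.ascontiguousarray(data, dtype=np.float64)
    return needs_c_contiguous(arr)

c_arr = np.ones((3, 4))
f_arr = np.ones((3, 4), order='F')

# Passing a Fortran-ordered array straight through fails the layout check.
try:
    needs_c_contiguous(f_arr)
    failed = False
except ValueError:
    failed = True

# Going through the normalizing wrapper works, at the cost of a copy.
total = wrapper(f_arr)
print(failed, total)
```

A wrapper that normalizes explicitly keeps working whatever the default order is; a wrapper that relies on newly created arrays being C-ordered is the kind that would break under a patched default.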
My Cython code and my SWIG-wrapped C++ code assume C ordering and a contiguous layout, which allows for super fast code. I guess making it agnostic to the ordering would require implementing everything twice and then switching between the two implementations based on what comes in. That is a lot of work for no gain. Rewriting it for F-ordering would also be a pain.
Thanks to everyone for your answers. I guess the conclusion is that changing the default ordering will do more harm than good. So, considering the C API, what would you advise using to copy a (probably) row-major ndarray into a data structure that is always column-major? Something like an iterator that jumps across strides in the source array in the fastest way possible?
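The thread does not answer this question for the C API. At the Python level, one possible approach is to let NumPy do the strided copy into a Fortran-ordered destination, as in the sketch below. The names `src`, `dst` and `dst2` are illustrative, and whether this is the fastest option for a given C data structure is an open assumption.

```python
import numpy as np

# A row-major (C-ordered) source array.
src = np.arange(6, dtype=np.float64).reshape(2, 3)

# Option 1: preallocate a column-major buffer and copy element-wise into it.
dst = np.empty(src.shape, dtype=src.dtype, order='F')
np.copyto(dst, src)

# Option 2: let NumPy allocate the column-major copy.
dst2 = np.asfortranarray(src)

# Same logical contents, different memory layout.
print(dst.flags['F_CONTIGUOUS'], np.array_equal(dst, src))
print(dst.ravel(order='K'))   # elements in memory (column-major) order
```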
participants (7)
- Aaron Meurer
- Charles R Harris
- Kevin Sheppard
- Matthew Brett
- Ronald van Elburg
- Sebastian Berg
- Valerio De Benedetto