storage order and concatenate
Hi, I noticed a behaviour which I found counter-intuitive at least when using concatenate. I have a function which takes a numpy array of rank 2 as input, let's say foo(in): a = N.randn((10, 2)) foo(a) To test a ctype implementation of foo against the python version, my test has something like X1 = N.linspace(-2, 2, 10)[:, N.newaxis] X2 = N.linspace(-1, 3, 10)[:, N.newaxis] a = N.concatenate(([X1, X2]), 1) which has Fortran storage (column major order), where as creating a as a = N.zeros((10, 2)) a[:,0] = N.linspace(-2, 2, 10) a[:,1] = N.linspace(-1, 3, 10) has C storage (row major order). What are the rules concerning storage with numpy ? I thought it was always C, except if stated explicitly. I can obviously understand why concatenate gives a Fortran order from an implementation point of view, but this looks kind of strange to me, David
Hi,
What are the rules concerning storage with numpy ? The rule is that a numpy array has "strides" which specify how many bytes to skip to get to the next element in the array. That's the internal model. There are no hard and fast rules about storage order. Internally, C-order is as good as Fortran-order (except the array iterator gives special preference to C-order and all functions for which
David Cournapeau wrote: the order can be specified (like zeros) default to C-order). Thus, the storage order is whatever the strides say it is. Now, there are flags that keep track of whether or not the strides agree with the 2 recognized special cases of "Fortran-order" (first-index varies the fastest) or "C-order" (last-index varies the fastest). But, this is only for convenience. Very few functions actually require a specification of storage order. Those that allow it default to "C-order". You can't think of a NumPy array has having a particular storage order unless you explicitly request it. One of the most common ways that Fortran-order arrays show up, for example is when a C-order array is transposed. A transpose operation does nothing except flip the strides (and therefore the flags) of the array. This is what is happening in concatenate (using axis=1) to give you a Fortran-order array. Bascially, code equivalent to the following is being run: concatenate([X1.T, X2.T]).T In the second example, you explicitly create the array (and therefore the strides) as C-order and then fill it (so it doesn't change on you). The first example used array calculations which don't guarantee the storage order. This is all seamless to the user until you have to interface with extension code. Ideally, you write compiled code that deals with strided arrays. If you can't, then you request an array of the required storage-order. By the way, for interfacing with ctypes, check out the ctypeslib.ndpointer class-creation function for flag checking and the require function for automatic conversion of an array to specific requirements. -Travis
Travis Oliphant wrote:
David Cournapeau wrote:
Hi,
What are the rules concerning storage with numpy ?
The rule is that a numpy array has "strides" which specify how many bytes to skip to get to the next element in the array. That's the internal model. There are no hard and fast rules about storage order. Internally, C-order is as good as Fortran-order (except the array iterator gives special preference to C-order and all functions for which the order can be specified (like zeros) default to C-order).
Ok, this is again a bad habit from matlab to think in C or F order...
Thus, the storage order is whatever the strides say it is. Now, there are flags that keep track of whether or not the strides agree with the 2 recognized special cases of "Fortran-order" (first-index varies the fastest) or "C-order" (last-index varies the fastest). But, this is only for convenience. Very few functions actually require a specification of storage order. Those that allow it default to "C-order".
You can't think of a NumPy array has having a particular storage order unless you explicitly request it. One of the most common ways that Fortran-order arrays show up, for example is when a C-order array is transposed. A transpose operation does nothing except flip the strides (and therefore the flags) of the array. This is what is happening in concatenate (using axis=1) to give you a Fortran-order array. Bascially, code equivalent to the following is being run: concatenate([X1.T, X2.T]).T
In the second example, you explicitly create the array (and therefore the strides) as C-order and then fill it (so it doesn't change on you). The first example used array calculations which don't guarantee the storage order.
This is all seamless to the user until you have to interface with extension code. Ideally, you write compiled code that deals with strided arrays. If you can't, then you request an array of the required storage-order.
By the way, for interfacing with ctypes, check out the ctypeslib.ndpointer class-creation function for flag checking and the require function for automatic conversion of an array to specific requirements.
I tried to to that at first, but I couldn't make the examples of numpy works; after having tried at home with beta versions, it looks like it is a ctype version problem, as it works with ctype 1.0 + numpy 1.0rc1. Thanks for the explanations, this answers most questions I had in mind for numpy internal layout compared to matlab which I am used to. I think this should be in the wiki somewhere; do you mind if I use your email as a basis for the tentative numpy tutorial (memory layout section, maybe) ? David
participants (2)
-
David Cournapeau
-
Travis Oliphant