Raveling, reshape order keyword unnecessarily confuses index and memory ordering
Hi,

We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.

Summary
--------------

There are two separate ideas needed to understand ordering in ravel and reshape:

Idea 1) ravel / reshape can proceed from the last axis to the first, or the first to the last. This is "ravel index ordering".
Idea 2) The physical layout of the array (on disk or in memory) can be "C" or "F" contiguous or neither. This is "memory ordering".

The index ordering is usually (but see below) orthogonal to the memory ordering.

The 'ravel' and 'reshape' commands use "C" and "F" in the sense of index ordering, and this mixes the two ideas and is confusing.

What the current situation looks like
----------------------------------------------------

Specifically, we've been rolling this around among 4 experienced numpy users and we all predicted at least one of the results below wrongly.

This was what we knew, or should have known:

In [2]: import numpy as np

In [3]: arr = np.arange(10).reshape((2, 5))

In [5]: arr.ravel()
Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

So, the 'ravel' operation unravels over the last axis (1) first, followed by axis 0.

So far so good (even if the opposite to MATLAB, Octave).

Then we found the 'order' flag to ravel:

In [10]: arr.flags
Out[10]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [11]: arr.ravel('C')
Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

But we soon got confused. How about this?

In [12]: arr_F = np.array(arr, order='F')

In [13]: arr_F.flags
Out[13]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [16]: arr_F
Out[16]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [17]: arr_F.ravel('C')
Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Right - so the flag 'C' to ravel has got nothing to do with *memory* ordering, but is to do with *index* ordering.

And in fact, we can ask for memory ordering specifically:

In [22]: arr.ravel('K')
Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [23]: arr_F.ravel('K')
Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])

In [24]: arr.ravel('A')
Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]: arr_F.ravel('A')
Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])

There are some confusions to get into with the 'order' flag to reshape as well, of the same type.

Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.

This is very confusing. We think the index ordering and memory ordering ideas need to be separated, and specifically, we should avoid using "C" and "F" to refer to index ordering.

Proposal
-------------

* Deprecate the use of "C" and "F" meaning backwards and forwards index ordering for ravel, reshape
* Prefer "Z" and "N", being graphical representations of unraveling in 2 dimensions, axis1 first and axis0 first respectively (excellent naming idea by Paul Ivanov)

What do y'all think?

Cheers,

Matthew
Paul Ivanov
JB Poline
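The two ideas in the summary can be pulled apart in a few lines. This is a minimal sketch and not part of the original message: it uses `strides` to expose the memory layout, and it wraps `np.ravel` in a hypothetical helper, `ravel_zn`, that accepts the proposed "Z" and "N" names. Neither the helper nor those order codes exist in numpy; they are assumptions made purely for illustration.

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_F = np.array(arr, order='F')      # same values, Fortran memory layout

# Memory ordering: the strides differ even though the arrays compare equal.
same_values = bool((arr == arr_F).all())
strides_differ = arr.strides != arr_F.strides

# Index ordering: 'C' and 'F' give the same answer for either memory layout.
c_same = bool((arr.ravel('C') == arr_F.ravel('C')).all())
f_same = bool((arr.ravel('F') == arr_F.ravel('F')).all())

# Hypothetical helper (NOT a numpy API): map the proposed index-order
# names onto the existing 'C' / 'F' codes.
_ZN_TO_ORDER = {'Z': 'C', 'N': 'F'}


def ravel_zn(a, order='Z'):
    """Ravel `a` using the proposed index-order names 'Z' or 'N'."""
    return np.ravel(a, order=_ZN_TO_ORDER[order])


print(ravel_zn(arr_F, 'Z'))   # last axis varies fastest
print(ravel_zn(arr_F, 'N'))   # first axis varies fastest
```

The helper only renames the existing codes, so it says nothing about how a deprecation would actually be carried out.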
On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
I always thought "F" and "C" are easy to understand; I always thought about the content and never about the memory when using them.

In my numpy htmlhelp for version 1.5, I don't have a K or A option.

>>> np.__version__
'1.5.1'
>>> np.arange(5).ravel("K")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: order not understood
>>> np.arange(5).ravel("A")
array([0, 1, 2, 3, 4])

The C, F in ravel have their twins in reshape:

>>> arr = np.arange(10).reshape(2,5, order="C").copy()
>>> arr
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> arr.ravel()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> arr = np.arange(10).reshape(2,5, order="F").copy()
>>> arr
array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])
>>> arr.ravel("F")
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

For example, we use it when we get raveled arrays from R, and F for column order and C for row order indexing are pretty obvious names when coming from another package (Matlab, R, Gauss).

Josef
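The R use case can be sketched as follows. This is an illustration only: `from_r` is a made-up stand-in for a matrix that a column-major package has flattened column by column, not data from the thread.

```python
import numpy as np

# Stand-in for a 2 x 3 matrix flattened by a column-major package such as R:
#     [[1, 3, 5],
#      [2, 4, 6]]
from_r = np.array([1, 2, 3, 4, 5, 6])

# order='F' fills the first axis fastest, which restores the column layout.
mat = from_r.reshape((2, 3), order='F')
print(mat)

# ravel('F') is the inverse operation: it stacks the columns again.
round_trip = mat.ravel('F')
print(round_trip)

# The default order ('C') stacks the rows instead.
row_stacked = mat.ravel()
print(row_stacked)
```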
On Sat, Mar 30, 2013 at 7:14 AM,
Just a quick search to get an idea, in statsmodels:

19 out of 135 ravel are ravel('F')
50 out of 270 reshapes specify: reshape.*order='F' (regular expression)

Josef
Hi,
On Sat, Mar 30, 2013 at 4:14 AM,
I can only say that 4 out of 4 experienced numpy developers found themselves unable to predict the behavior of these functions before they saw the output.

The problem is always that explaining something makes it clearer for a moment, but, for those who do not have the explanation or who have forgotten it, at least among us here, the outputs were generating groans and / or high fives as we incorrectly or correctly guessed what was going to happen.

I think the only way to find out whether this really is confusing or not, is to put someone in front of these functions without any explanation and ask them to predict what is going to come out of the various inputs and flags. Or to try and teach it, which was the problem we were having.

Cheers,

Matthew
On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
Changing the names doesn't make it easier to understand.
I think the confusion is because the new A and K refer to existing memory.

``ravel`` is just stacking columns ('F') or stacking rows ('C'); I don't remember having seen any weird cases.

------------

I always thought of "order" in array creation as the way we want to have the memory layout of the *target* array; it has nothing to do with the existing memory layout (creating a view or copy as needed).

reshape and ravel are *views* if possible; memory might just be some weird strides (and can be ignored unless you want to do some memory optimization; keeping track of the memory is difficult. I don't think I will start to use A and K after upgrading numpy.)

>>> a1 = np.ones((10,4))

not contiguous

>>> arr2 = a1[:, 2:4]
>>> arr2.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns (needs to make a copy)

>>> arr3 = arr2.ravel('F')
>>> arr3.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns or rows with reshape (I have no idea what it did with the memory)

>>> arr2.reshape(-1,1).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1,1, order='F').flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1, order='F').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

-------------------

One case where I do pay attention to memory layout is column slicing:

>>> arr = np.ones((10, 5), order='F')
>>> for i in range(1, 5): print(arr[:, :i+2].ravel('C').flags['OWNDATA'])
???
>>> for i in range(1, 5): print(arr[:, :i+2].ravel('F').flags['OWNDATA'])
???
Josef
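Whether `ravel` returned a view or a copy in the column-slicing case can be checked directly. This is a minimal sketch, assuming a numpy recent enough to provide `np.shares_memory`; it uses that function rather than the `OWNDATA` flag from the session above, and the loop variable `ncols` is introduced here for illustration.

```python
import numpy as np

arr = np.ones((10, 5), order='F')

results = []
for ncols in range(3, 6):
    cols = arr[:, :ncols]    # leading columns of an F-ordered array
    as_c = cols.ravel('C')   # row stacking: elements are not adjacent in memory
    as_f = cols.ravel('F')   # column stacking: matches the memory layout
    shares_c = np.shares_memory(as_c, arr)
    shares_f = np.shares_memory(as_f, arr)
    results.append((ncols, shares_c, shares_f))
    print(ncols, shares_c, shares_f)
```

The leading columns of an F-ordered array sit next to each other in memory, so column stacking can reuse the buffer, while row stacking has to gather the elements into a new one.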
On Sat, Mar 30, 2013 at 4:57 PM,
Example from our statistics use:
rows are observations/time periods, columns are variables/individuals.

Using "F" or "C", we can stack either by time-periods (observations) or individuals (cross-section units); that's easy to understand.

"A" and "K" are pretty useless for us, because we don't know which stacking we would get (we don't try to control the memory layout).

The only reason to use "A" or "K", in my opinion, is to use the existing memory efficiently. Since the order in the array is unpredictable, it only makes sense if we don't care about it, for example when we only have elementwise operations.

Josef
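The stacking described above can be sketched with a tiny made-up panel. The array `panel` and its values are hypothetical, chosen so that each entry encodes its own period and individual.

```python
import numpy as np

# Hypothetical panel: 3 time periods (rows) x 2 individuals (columns).
# Entry 10*t + i is the observation for period t and individual i.
panel = np.array([[0, 1],
                  [10, 11],
                  [20, 21]])

by_period = panel.ravel('C')       # stack rows: period 0, then 1, then 2
by_individual = panel.ravel('F')   # stack columns: individual 0, then 1

print(by_period)
print(by_individual)

# 'C' and 'F' depend only on index order, so a Fortran-layout copy of the
# same panel stacks identically.
panel_F = np.asfortranarray(panel)
layout_independent = bool(
    (panel_F.ravel('C') == by_period).all()
    and (panel_F.ravel('F') == by_individual).all()
)
```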
Hi,
On Sat, Mar 30, 2013 at 2:20 PM,
I disagree, I think it's confusing, but I have evidence, and that is that four out of four of us tested ourselves and got it wrong.

Perhaps we are particularly dumb or poorly informed, but I think it's rash to assert there is no problem here.

Cheers,

Matthew
On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett wrote:
> I disagree, I think it's confusing, but I have evidence, and that is that four out of four of us tested ourselves and got it wrong.
> Perhaps we are particularly dumb or poorly informed, but I think it's rash to assert there is no problem here.
I got all four correct. I think the concept --- at least for ravel --- is pretty simple: would you like to read the data off in C ordering or Fortran ordering? Since the output array is one-dimensional, its ordering is irrelevant.
I don't understand the 'Z' / 'N' suggestion at all. Are they part of some mnemonic?
I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy already suffers from too much bikeshedding with names --- I am rarely able to pull out a script I wrote using NumPy even a few years ago and have it immediately work.
Cheers,
Brad
Hi,
On Sat, Mar 30, 2013 at 4:31 PM, Bradley M. Froehle wrote:
> I got all four correct.
Then you are smarter and/or better informed than we were. I hope you didn't read my explanation before you tested yourself. Of course, if you did read my email first, I'd expect both of us to get the answer right first time. If you didn't read my email first, didn't think too hard about it, still got all the examples right, and would also get other, more confusing examples that use reshape right, then I'd add you as a data point on the other side to the four data points we got yesterday.
> I think the concept --- at least for ravel --- is pretty simple: would you like to read the data off in C ordering or Fortran ordering. Since the output array is one-dimensional, its ordering is irrelevant.
Right - hence my confidence that Josef's way of thinking of 'C' and 'F' as describing the target (output) array was not a good way to think of it in this case. It is in the case of arr.tostring() though.
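To make the "one-dimensional output" point concrete, here is a small sketch reusing the 2 x 5 array from the first message. The names `flat_c` and `flat_f` are illustrative.

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))

flat_c = arr.ravel(order='C')
flat_f = arr.ravel(order='F')

# The order keyword changes which values come out first...
print(flat_c)  # [0 1 2 3 4 5 6 7 8 9]
print(flat_f)  # [0 5 1 6 2 7 3 8 4 9]

# ...but a contiguous 1-D result is both C- and F-contiguous,
# so the keyword cannot be describing the output's memory layout.
for flat in (flat_c, flat_f):
    print(flat.flags.c_contiguous, flat.flags.f_contiguous)  # True True
```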
> I don't understand the 'Z' / 'N' suggestion at all. Are they part of some mnemonic?
Think of the way you'd read off the elements of a 2D array using reverse (last-first) index order: the path you trace looks something like a Z. Reading them off with the first axis varying fastest traces something like an N.
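One way to see the two paths is to list the (row, column) positions in the order each option visits them. This is only a sketch for a 2 x 2 array; `z_path` and `n_path` are names invented here, and `np.unravel_index` is used just to convert a flat position into indices.

```python
import numpy as np

shape = (2, 2)

def visit_order(order):
    # Convert flat positions 0..3 into (row, col) pairs for the given order.
    return [tuple(int(i) for i in np.unravel_index(n, shape, order=order))
            for n in range(4)]

# Last axis fastest: across the top row, then across the bottom row ("Z").
z_path = visit_order('C')
# First axis fastest: down the left column, then down the right column ("N").
n_path = visit_order('F')

print(z_path)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(n_path)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```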
> I'd STRONGLY advise against deprecating the 'F' and 'C' options. NumPy already suffers from too much bikeshedding with names --- I rarely am able to pull out a script I wrote using NumPy even a few years ago and have it immediately work.
I wish we could drop "bike-shedding" - it's a completely useless word because one person's bike-shedding is another person's necessary clarification. You think this clarification isn't necessary and you think this discussion is bike-shedding. I'm not suggesting dropping the 'F' and 'C', obviously - can I call that a 'straw man'? I am suggesting changing the name to something much clearer, leaving that name clearly explained in the docs, and leaving 'C' and 'F' as functional synonyms for a very long time.
Cheers,
Matthew
On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett
> I disagree, I think it's confusing, but I have evidence, and that is that four out of four of us tested ourselves and got it wrong.
> Perhaps we are particularly dumb or poorly informed, but I think it's rash to assert there is no problem here.
I think you are overcomplicating things or phrasing it as a "trick question". ravel F and C have *nothing* to do with memory layout. I think it's not confusing for beginners who have no idea about, and never think about, memory layout. I've never seen any problems with it in statsmodels, and I have seen many developers (GSOC) who are pretty new to python and numpy. (I didn't check the repo history to verify, so IIRC.)
Even if N, Z were clearer in this case (which I don't think they are, and I have no idea what they should stand for), you would have to go through every use of ``order`` in numpy to check whether it should be N or F or Z or C, and then users would have to check which order name convention is used in a specific function.
Josef
Hi,
On Sat, Mar 30, 2013 at 7:50 PM,
> I think you are overcomplicating things or phrased it as a "trick question"
I don't know what you mean by trick question - was there something over-complicated in the example? I deliberately didn't include various much more confusing examples in "reshape".
> ravel F and C have *nothing* to do with memory layout.
We do agree on this of course - but you said in an earlier mail that you thought of 'C' and 'F' as referring to target memory layout (which they don't in this case), so I think we also agree that "C" and "F" do often refer to memory layout elsewhere in numpy.
> I think it's not confusing for beginners that have no idea and never think about memory layout. I've never seen any problems with it in statsmodels and I have seen many developers (GSOC) that are pretty new to python and numpy. (I didn't check the repo history to verify, so IIRC)
Usually you don't need to know what reshape or ravel did because you are likely to reshape again and that will use the same algorithm. For example, I didn't know that ravel worked in reverse index order, started explaining it wrong, and had to check. I use ravel and reshape a lot, and have not run into this problem because either a) I didn't test my code properly or b) I did reshape after ravel / reshape and it reversed what I did first time. So, I don't think "we haven't noticed any problems" is a good argument in the face of "several experienced developers got it wrong when trying to guess what it did".
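A short sketch of why a ravel followed by a reshape tends to hide the question of which order was used: as long as both calls use the same order, the round trip restores the array. The array and the names `restored` and `mixed` are illustrative.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

# Same order in both directions: the round trip is exact either way,
# so code like this works without the author knowing which axis went first.
for order in ('C', 'F'):
    restored = arr.ravel(order=order).reshape(arr.shape, order=order)
    print(order, np.array_equal(restored, arr))  # True for both

# Different orders in each direction: the elements end up scrambled.
mixed = arr.ravel(order='F').reshape(arr.shape, order='C')
print(np.array_equal(mixed, arr))  # False
```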
> Even if N, Z were clearer in this case (which I don't think it is and which I have no idea what it should stand for), you would have to go for every use of ``order`` in numpy to check whether it should be N or F or Z or C, and then users would have to check which order name convention is used in a specific function.
Right - and this would be silly if and only if it made sense to conflate memory layout and index ordering.
Cheers,
Matthew
On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett
>> I think you are overcomplicating things or phrased it as a "trick question"
> I don't know what you mean by trick question - was there something over-complicated in the example? I deliberately didn't include various much more confusing examples in "reshape".
I meant making the "candidates" think about memory instead of just column versus row stacking. I don't think I ever get confused about reshape "F" in 2d. But when I work with 3d or larger nd-arrays, I always have to try an example to check my intuition (in general, not just for reshape).
>> ravel F and C have *nothing* to do with memory layout.
> We do agree on this of course - but you said in an earlier mail that you thought of 'C' and 'F' as referring to target memory layout (which they don't in this case) so I think we also agree that "C" and "F" do often refer to memory layout elsewhere in numpy.
I guess that wasn't so helpful. (Emphasis on *target*: there are very few places where an order keyword refers to *existing* memory layout, so I'm not tempted to think about existing memory layout when I see ``order``. Also, my examples might have confused the issue: ravel and reshape with C and F are easy to understand without ever looking at memory issues. Memory only comes into play when we want to know whether we get a view or a copy. The examples were only for the cases when I do care about this.)
>> I think it's not confusing for beginners that have no idea and never think about memory layout. I've never seen any problems with it in statsmodels and I have seen many developers (GSOC) that are pretty new to python and numpy. (I didn't check the repo history to verify, so IIRC)
> Usually you don't need to know what reshape or ravel did because you are likely to reshape again and that will use the same algorithm.
> For example, I didn't know that ravel worked in reverse index order, started explaining it wrong, and had to check. I use ravel and reshape a lot, and have not run into this problem because either a) I didn't test my code properly or b) I did reshape after ravel / reshape and it reversed what I did first time. So, I don't think "we haven't noticed any problems" is a good argument in the face of "several experienced developers got it wrong when trying to guess what it did".
What's reverse index order?
In the case of statsmodels, we do care about the stacking order. When we use reshape(..., order='F') or ravel('F'), it's only because we want to have a specific array (not memory) layout (and/or because the raveled array came from R).
(Aside: 2 cases
- for 2d parameter vectors, we ravel and reshape often, and we changed our convention to Fortran order (parameters in rows, equations in columns, IIRC). The interpretation of the results depends on which way we ravel or reshape.
- for panel data (time versus individuals), we need to build matching Kronecker product arrays, which are block-diagonal if the stacking/``order`` is the right way.
None of the cases cares about memory layout; it's just: do we stack by columns or by rows, i.e. Fortran- or C-order? Do we want this in rows or in columns?)
>> Even if N, Z were clearer in this case (which I don't think it is and which I have no idea what it should stand for), you would have to go for every use of ``order`` in numpy to check whether it should be N or F or Z or C, and then users would have to check which order name convention is used in a specific function.
> Right - and this would be silly if and only if it made sense to conflate memory layout and index ordering.
I see the two things, but never saw it as a problem.
    arr2 = np.asarray(arr1, order='F')
give me an array with Fortran memory layout, I need it (never used in statsmodels; there might be a few places where we used other ways to control the memory layout, but not much).
    arr2 = arr1.reshape(-1, 5, order='F')
unstack this array by columns, I want 5 of them.
    arr1 = arr2.ravel('F')
go back, stack them again by columns (used quite a bit, as described before).
Cheers,
Josef
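The three calls above can be run side by side on a small made-up array to see which of them touch memory layout and which touch index order. This is a sketch only; `grid` and `grid_f` are names introduced here, and `arr1` is ten consecutive integers rather than real data.

```python
import numpy as np

arr1 = np.arange(10)  # hypothetical stacked data

# Memory-layout request: same shape, same values, different storage.
grid = arr1.reshape(2, 5)
grid_f = np.asarray(grid, order='F')
print(np.array_equal(grid, grid_f))  # True
print(grid_f.flags.f_contiguous)     # True

# Index-order request: unstack by columns into 5 columns.
arr2 = arr1.reshape(-1, 5, order='F')
print(arr2)
# [[0 2 4 6 8]
#  [1 3 5 7 9]]

# Stack by columns again to get back the original vector.
print(np.array_equal(arr2.ravel('F'), arr1))  # True
```

In the first case the `order` argument leaves every `grid[i, j]` unchanged; in the other two it decides which element lands at which index.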
Hi,
On Sat, Mar 30, 2013 at 7:02 PM,
> I guess that wasn't so helpful. (emphasis on *target*, There are very few places where an order keyword refers to *existing* memory layout.
It is helpful because it shows how easy it is to get confused between memory order and index order.
> What's reverse index order?
I am not being clear, sorry about that:

    import numpy as np

    def ravel_iter_last_fastest(arr):
        res = []
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                for k in range(arr.shape[2]):
                    # Iterating over last dimension fastest
                    res.append(arr[i, j, k])
        return np.array(res)

    def ravel_iter_first_fastest(arr):
        res = []
        for k in range(arr.shape[2]):
            for j in range(arr.shape[1]):
                for i in range(arr.shape[0]):
                    # Iterating over first dimension fastest
                    res.append(arr[i, j, k])
        return np.array(res)

    a = np.arange(24).reshape((2, 3, 4))
    print(np.all(a.ravel('C') == ravel_iter_last_fastest(a)))
    print(np.all(a.ravel('F') == ravel_iter_first_fastest(a)))

By 'reverse index ordering' I mean 'ravel_iter_last_fastest' above. I guess one could argue that this was not 'reverse' but 'forward' index ordering, but I am not arguing about which is better, or those names, only that it's the order of indices that differs, not the memory layout, and that these ideas need to be kept separate.
Cheers,
Matthew
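As a complementary check, the sketch below takes the same 2 x 3 x 4 array, makes a copy with Fortran memory layout, and compares the ravel results. The name `a_f` is introduced here for illustration.

```python
import numpy as np

a = np.arange(24).reshape((2, 3, 4))  # C-contiguous in memory
a_f = np.asfortranarray(a)            # same values, F-contiguous in memory

# The memory layout of the input does not change what ravel returns.
for order in ('C', 'F'):
    print(order, np.array_equal(a.ravel(order), a_f.ravel(order)))  # True

# "First axis fastest" is the same as "last axis fastest" on the
# transposed array, which again is a statement about indices only.
print(np.array_equal(a.ravel('F'), a.T.ravel('C')))  # True
```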
On Sat, Mar 30, 2013 at 11:43 PM, Matthew Brett
> I am not being clear, sorry about that:
>
>     import numpy as np
>
>     def ravel_iter_last_fastest(arr):
>         res = []
>         for i in range(arr.shape[0]):
>             for j in range(arr.shape[1]):
>                 for k in range(arr.shape[2]):
>                     # Iterating over last dimension fastest
>                     res.append(arr[i, j, k])
>         return np.array(res)
>
>     def ravel_iter_first_fastest(arr):
>         res = []
>         for k in range(arr.shape[2]):
>             for j in range(arr.shape[1]):
>                 for i in range(arr.shape[0]):
>                     # Iterating over first dimension fastest
>                     res.append(arr[i, j, k])
>         return np.array(res)
Good example. That's just C and F order in the terminology of numpy:
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-ite... (independent of memory)
http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#nump...
I don't think we want to rename a large part of the basic terminology of numpy.
Josef
Hi,
On Sat, Mar 30, 2013 at 9:05 PM,
> good example
> that's just C and F order in the terminology of numpy http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#controlling-ite... (independent of memory) http://docs.scipy.org/doc/numpy/reference/generated/numpy.flatiter.html#nump...
> I don't think we want to rename a large part of the basic terminology of numpy
Sometimes two ideas get conflated, and it seems natural to keep them together until people get confused and you realize that there are two separate ideas. For example, here's a quote from the 'flatiter' doc:
    Iteration is done in C-contiguous style
Now - that seems really ugly to me. For example, 'contiguous' should not be in that sentence, although it's easy to see why it is, and it seems to me to be a sign of the confusion between the ideas.
Cheers,
Matthew
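A small sketch of the behaviour that docstring is describing, using an array that is deliberately not C-contiguous. The names `arr_f` and `values` are illustrative.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))
arr_f = np.asfortranarray(arr)  # F-contiguous copy with the same values

print(arr_f.flags.c_contiguous)  # False
print(arr_f.flags.f_contiguous)  # True

# .flat still visits elements with the last index varying fastest,
# even though the array is not C-contiguous in memory.
values = [int(v) for v in arr_f.flat]
print(values)  # [0, 1, 2, 3, 4, 5]
```

So the iteration order here is a property of the index traversal, independent of whether the underlying memory is contiguous in the C sense.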
Hi,
On Sat, Mar 30, 2013 at 7:02 PM,
> I meant making the "candidates" think about memory instead of just column versus row stacking.
To be specific, we were teaching about reshaping an (I, J, K, N) 4D array. It was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine. A student asked what he would get back from raveling this array: a concatenated time series, or something spatial? We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :]. He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images". Ironically, this was a Fortran-ordered array in memory, and he was wrong.
So, I think the ideas of memory ordering and index ordering are very easy to confuse, and the confusion comes up naturally.
I would like, as a teacher, to be able to say something like:
This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
It's rather easy to get something that is neither C nor F memory layout
Numpy does many memory layouts.
Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
Cheers,
Matthew
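The classroom situation can be reproduced with a tiny stand-in for the image. This is a sketch under assumed sizes (`I, J, K, N = 2, 3, 4, 5`) and with consecutive integers instead of image data; `img` and `flat` are illustrative names.

```python
import numpy as np

I, J, K, N = 2, 3, 4, 5  # made-up sizes; N plays the role of time points

# A 4D "image" stored with Fortran memory layout, as in the class example.
img = np.asfortranarray(np.arange(I * J * K * N).reshape((I, J, K, N)))
print(img.flags.f_contiguous)  # True

# Default ravel: the first N values are the time series at voxel (0, 0, 0).
flat = img.ravel()
print(np.array_equal(flat[:N], img[0, 0, 0, :]))  # True

# In memory, however, it is the FIRST axis whose neighbours are adjacent:
# stepping along axis 0 moves by exactly one element.
print(img.strides[0] == img.itemsize)  # True
```

The ravel result says nothing about storage: the time series come out first even though consecutive time points are the elements furthest apart in memory.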
On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote:
> My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
order=C: stack the last dimension, N, the time series of one 3-D pixel, then stack the time series of the next pixel... process pixels by depth and then row by row (like old TVs)
I assume you did this because your underlying array is C contiguous, so your ravel('C') is a C-contiguous view (instead of some weird strides or a copy)
I usually prefer time in the first dimension, and stack order=F, then I can start at the front, stack all time periods of the first pixel, keep going and work pixels down the columns, first page, next page, ... (and I hope I have an F-contiguous array, so my raveled array is also F-contiguous.)
(note: I'm bringing memory back in as optimization, but not to predict the stacking)
Josef
(I think brains are designed for Fortran order and C-ordering in numpy is an accident, except, reading a Western language book is neither)
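As a rough illustration of the two stacking schemes described in this message, here is a minimal sketch with time on the first axis. The sizes and variable names are hypothetical and chosen only to keep the output small.

```python
import numpy as np

# Hypothetical sizes; time is the FIRST axis here, as in the order='F' description.
N, I, J, K = 5, 2, 3, 4
arr = np.arange(N * I * J * K).reshape((N, I, J, K))

flat_f = arr.ravel(order='F')  # first axis changes fastest
flat_c = arr.ravel(order='C')  # last axis changes fastest

# order='F': all time points of pixel (0, 0, 0) come first,
# then all time points of the next pixel down the first spatial axis.
print(np.array_equal(flat_f[:N], arr[:, 0, 0, 0]))       # True
print(np.array_equal(flat_f[N:2 * N], arr[:, 1, 0, 0]))  # True

# order='C': the first K values run along the last axis instead.
print(np.array_equal(flat_c[:K], arr[0, 0, 0, :]))       # True
```

Neither result depends on how `arr` happens to be stored; only the `order` argument decides which index runs fastest.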
Cheers,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
On Sat, Mar 30, 2013 at 9:37 PM, wrote:
> But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
> order=C: stack the last dimension, N, the time series of one 3-D pixel, then stack the time series of the next pixel... process pixels by depth and then row by row (like old TVs)
> I assume you did this because your underlying array is C contiguous, so your ravel('C') is a C-contiguous view (instead of some weird strides or a copy)
Sorry - what do you mean by 'this' in 'did this'? Reshape? Why would it matter what my underlying array memory layout was?
> I usually prefer time in the first dimension, and stack order=F, then I can start at the front, stack all time periods of the first pixel, keep going and work pixels down the columns, first page, next page, ... (and I hope I have an F-contiguous array, so my raveled array is also F-contiguous.)
> (note: I'm bringing memory back in as optimization, but not to predict the stacking)
> Josef (I think brains are designed for Fortran order and C-ordering in numpy is an accident, except, reading a Western language book is neither)
Yes, I find first axis fastest changing easier to think about, and I came from MATLAB (about 8 years ago mind), so that also made it more natural.
I had (until yesterday) simply assumed that numpy unraveled that way, because it seemed more obvious to me, and knew that the unravel index order need have nothing to do with the memory order, or the fact that arrays are C contiguous by default. Not so of course. That's not my complaint as you know - it's just a convention, I guessed the convention wrong.
Cheers,
Matthew
On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote:
> But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts. All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
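The "whichever index runs fastest" view can be made concrete without mentioning memory at all. The sketch below uses `np.unravel_index` to list the index tuples in each order; the 2 x 3 shape and the variable names are illustrative assumptions.

```python
import numpy as np

shape = (2, 3)
flat_positions = np.arange(6)

# C order: the LAST index runs fastest.
c_rows, c_cols = np.unravel_index(flat_positions, shape, order='C')
c_visits = list(zip(c_rows.tolist(), c_cols.tolist()))
print(c_visits)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

# F order: the FIRST index runs fastest.
f_rows, f_cols = np.unravel_index(flat_positions, shape, order='F')
f_visits = list(zip(f_rows.tolist(), f_cols.tolist()))
print(f_visits)  # [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]

# ravel visits the elements in exactly these index sequences.
arr = np.arange(6).reshape(shape)
print([int(arr[i, j]) for i, j in c_visits] == arr.ravel(order='C').tolist())  # True
print([int(arr[i, j]) for i, j in f_visits] == arr.ravel(order='F').tolist())  # True
```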
>> order=C: stack the last dimension, N, the time series of one 3-D pixel, then stack the time series of the next pixel... process pixels by depth and then row by row (like old TVs)
>> I assume you did this because your underlying array is C contiguous, so your ravel('C') is a C-contiguous view (instead of some weird strides or a copy)
> Sorry - what do you mean by 'this' in 'did this'? Reshape? Why would it matter what my underlying array memory layout was?
'this' was: use ravel('C') and have the time series as the last index. Because if we have a few gigabytes of video recordings, we had better match the ravel order with the memory order.
I thought you picked time N in the last axis so you can have fast access to time series (assuming you didn't specify F-contiguous).
(it's not confusing: we have two orders, index/iterator and memory, and to get a nice view, the two should match)
rereading: since you had F-ordered memory, ravel('F') gives the nice view (a picture at a time instead of a timeseries at a time)
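The point about matching the ravel order to the memory order can be checked with `np.shares_memory`. This is only a sketch on a tiny made-up array; with gigabytes of data the same distinction decides whether a ravel is free or costs a full copy.

```python
import numpy as np

arr_c = np.arange(24).reshape((2, 3, 4))  # C-contiguous in memory
arr_f = np.asfortranarray(arr_c)          # same values, F-contiguous in memory

# Index order matches memory order: ravel can return a view (no copy).
print(np.shares_memory(arr_c, arr_c.ravel(order='C')))  # True
print(np.shares_memory(arr_f, arr_f.ravel(order='F')))  # True

# Index order does not match memory order: ravel has to copy.
print(np.shares_memory(arr_c, arr_c.ravel(order='F')))  # False
print(np.shares_memory(arr_f, arr_f.ravel(order='C')))  # False

# Either way, the VALUES returned depend only on the index order requested.
print(np.array_equal(arr_c.ravel(order='F'), arr_f.ravel(order='F')))  # True
```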
>> I usually prefer time in the first dimension, and stack order=F, then I can start at the front, stack all time periods of the first pixel, keep going and work pixels down the columns, first page, next page, ... (and I hope I have an F-contiguous array, so my raveled array is also F-contiguous.)
>> (note: I'm bringing memory back in as optimization, but not to predict the stacking)
>> Josef (I think brains are designed for Fortran order and C-ordering in numpy is an accident, except, reading a Western language book is neither)
> Yes, I find first axis fastest changing easier to think about, and I came from MATLAB (about 8 years ago mind), so that also made it more natural.
> I had (until yesterday) simply assumed that numpy unraveled that way, because it seemed more obvious to me, and knew that the unravel index order need have nothing to do with the memory order, or the fact that arrays are C contiguous by default. Not so of course. That's not my complaint as you know - it's just a convention, I guessed the convention wrong.
Numpy was written by C developers, and one of the first things I learned about numpy is the ``order``: Default is always C (except for linalg) and axis=None (except in scipy.stats), and dimensions disappear in reduce
Cheers,
Josef
Hi,
On Sat, Mar 30, 2013 at 10:38 PM, wrote:
> No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
> All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it
* Travis' numpy book has a very early section on this (section 2.3 - memory layout)
* I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
Cheers,
Matthew
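The two meanings of the `order` keyword mentioned in the list above can be put side by side. This is a small sketch with made-up names: in `np.array` the keyword changes only the memory layout, while in `reshape` it changes which value ends up at which index.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))

# np.array(..., order='F'): memory layout changes, indexing does not.
arr_f = np.array(arr, order='F')
print(np.array_equal(arr, arr_f))       # True: same value at every index
print(arr.strides != arr_f.strides)     # True: different layout in memory

# reshape(..., order=...): the index order changes the result itself.
r_c = arr.reshape((3, 2), order='C')
r_f = arr.reshape((3, 2), order='F')
print(r_c.tolist())  # [[0, 1], [2, 3], [4, 5]]
print(r_f.tolist())  # [[0, 4], [3, 2], [1, 5]]
```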
On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett wrote:
> Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections:
* basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
* advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
And I still think words can mean different things in different contexts (with a qualifier maybe):
* indexing in fortran order
* memory in fortran order
Disclaimer: I never tried to teach numpy, and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
> You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
> As evidence:
> * My student, without any prompting about memory layouts, is asking about it
> * Travis' numpy book has a very early section on this (section 2.3 - memory layout)
> * I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I were trying to explain ravel, reshape and indexing for the first time to a student.
> * The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (who are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
> * The current docstring of 'reshape' cannot be explained without referring to memory order.
Really? I thought reshape only refers to *index* order for "F" and "C".
I don't think I can express my preference for reshape order="F" any better than I did, so maybe it's time for some additional users/developers to chime in.
Josef
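Both positions in this exchange can be checked against a small example. The sketch below (array names are illustrative) suggests that 'C' and 'F' in reshape are pure index orders, while the 'A' option, mentioned earlier in the thread, is the one whose result depends on the memory layout of the input.

```python
import numpy as np

arr_c = np.arange(6).reshape((2, 3))  # C-contiguous in memory
arr_f = np.asfortranarray(arr_c)      # same values, F-contiguous in memory

# 'C' and 'F' are index orders: the result ignores the memory layout.
print(arr_c.reshape(6, order='C').tolist())  # [0, 1, 2, 3, 4, 5]
print(arr_f.reshape(6, order='C').tolist())  # [0, 1, 2, 3, 4, 5]
print(arr_c.reshape(6, order='F').tolist())  # [0, 3, 1, 4, 2, 5]
print(arr_f.reshape(6, order='F').tolist())  # [0, 3, 1, 4, 2, 5]

# 'A' picks the index order FROM the memory layout, so two arrays that
# compare equal element by element give different results.
print(arr_c.reshape(6, order='A').tolist())  # [0, 1, 2, 3, 4, 5]
print(arr_f.reshape(6, order='A').tolist())  # [0, 3, 1, 4, 2, 5]
```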
On Sun, Mar 31, 2013 at 10:43 PM,
Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett < matthew.brett@gmail.com> wrote:
Hi,
On Sat, Mar 30, 2013 at 9:37 PM,
wrote: On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett < matthew.brett@gmail.com> wrote:
Hi,
On Sat, Mar 30, 2013 at 7:02 PM,
wrote: > On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett < matthew.brett@gmail.com> wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>> wrote: >>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett < matthew.brett@gmail.com> >>>> wrote: >>>>> >>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>> >> wrote: >>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>> >>>> wrote: >>>>> >>>>> >>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>> >>>>> ordering. >>>>> >>>>> >>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>> >>>>> avoid >>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>> >>>>> >>>>> >>>>> Proposal >>>>> >>>>> ------------- >>>>> >>>>> >>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>> >>>>> index ordering for ravel, reshape >>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>> >>>>> in >>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>> >>>>> naming idea by Paul Ivanov) >>>>> >>>>> >>>>> >>>>> What do y'all think? >>>>> >>>> >>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>> >>>> about >>>>> >>>> the content and never about the memory when using it. >>>>> >> >>>>> >> changing the names doesn't make it easier to understand. >>>>> >> I think the confusion is because the new A and K refer to existing >>>>> >> memory >>>>> >> >>>>> >>>>> I disagree, I think it's confusing, but I have evidence, and >>>>> that four out of four of us tested ourselves and got it wrong. >>>>> >>>>> Perhaps we are particularly dumb or poorly informed, but I
>>>>> rash to assert there is no problem here. >>> >>> I think you are overcomplicating things or phrased it as a "trick question" >> >> I don't know what you mean by trick question - was there something >> over-complicated in the example? I deliberately didn't include >> various much more confusing examples in "reshape". > > I meant making the "candidates" think about memory instead of just > column versus row stacking.
To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine.
A student asked what he would get back from raveling this array, a concatenated time series, or something spatial?
We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :].
He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images".
Ironically, this was a Fortran-ordered array in memory, and he was wrong.
So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally.
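For anyone who wants to reproduce the classroom situation, here is a minimal sketch. The shape (2, 3, 4, 5) is made up to stand in for (I, J, K, N), and the values are just a counter, not real image data. It shows that the default ravel returns the time series at [0, 0, 0, :] first even though the array is Fortran-ordered in memory.

```python
import numpy as np

# Hypothetical small shape standing in for (I, J, K, N); N = 5 time points.
I, J, K, N = 2, 3, 4, 5
img = np.asfortranarray(np.arange(I * J * K * N).reshape((I, J, K, N)))

# The array is Fortran-contiguous in memory ...
print(img.flags.f_contiguous, img.flags.c_contiguous)

# ... but ravel() defaults to 'C' *index* order, so the last axis (time)
# varies fastest and the first N values are the first voxel's time series.
flat = img.ravel()
print(np.array_equal(flat[:N], img[0, 0, 0, :]))  # True

# In memory it is the *first* axis that varies fastest. order='K' follows
# the memory order, which for this array is the same as 'F' index order.
in_memory = img.ravel(order='K')
print(np.array_equal(in_memory[:I], img[:, 0, 0, 0]))  # True
```

So the student's conclusion about storage does not follow from what ravel returned.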
I would like, as a teacher, to be able to say something like:
This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
It's rather easy to get something that is neither C nor F memory layout
Numpy does many memory layouts.
Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
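A small sketch of those statements, using made-up arrays (the names c_arr, f_arr and neither are only for illustration). It builds one array in each of the three layout situations and checks that ravel gives the same answer for a given index order regardless of layout.

```python
import numpy as np

c_arr = np.arange(12).reshape((3, 4))   # C memory layout
f_arr = np.asfortranarray(c_arr)        # same values, F memory layout
neither = c_arr[:, ::2]                 # strided view: neither C nor F

# Collect (C_CONTIGUOUS, F_CONTIGUOUS) for each array.
layouts = {
    name: (a.flags.c_contiguous, a.flags.f_contiguous)
    for name, a in [("c_arr", c_arr), ("f_arr", f_arr), ("neither", neither)]
}
print(layouts)

# Index ordering does not depend on the memory layout: the same values at
# the same indices ravel to the same 1D result.
same_c = np.array_equal(c_arr.ravel('C'), f_arr.ravel('C'))
same_f = np.array_equal(c_arr.ravel('F'), f_arr.ravel('F'))
print(same_c, same_f)
```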
My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
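To make "whichever index runs fastest" concrete, here is a sketch on a 2x3 array with explicit loops. The loops are only an illustration of the two index orders, not how numpy implements ravel.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))
# [[0 1 2]
#  [3 4 5]]

# 'C' index order: the last index (j) runs fastest.
c_order = [int(arr[i, j]) for i in range(2) for j in range(3)]

# 'F' index order: the first index (i) runs fastest.
f_order = [int(arr[i, j]) for j in range(3) for i in range(2)]

print(c_order)  # [0, 1, 2, 3, 4, 5]
print(f_order)  # [0, 3, 1, 4, 2, 5]

# These match what ravel returns for the two order flags.
print(arr.ravel('C').tolist() == c_order, arr.ravel('F').tolist() == f_order)
```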
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
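As a sketch of the "advanced usage" point, np.shares_memory can be used to check whether a reshape gave back a view or a copy. The arrays are made up for illustration. The values returned are the same in every case below; only the view/copy outcome depends on the memory layout.

```python
import numpy as np

c_arr = np.arange(6).reshape((2, 3))   # C memory layout
f_arr = np.asfortranarray(c_arr)       # same values, F memory layout

# C index order on a C-layout array: no data movement needed, so a view.
view_c = c_arr.reshape(6)
# C index order on an F-layout array: elements are not evenly spaced in
# memory in that order, so numpy has to copy.
copy_f = f_arr.reshape(6)
# F index order on an F-layout array: a view again.
view_f = f_arr.reshape(6, order='F')

print(np.shares_memory(c_arr, view_c))  # True
print(np.shares_memory(f_arr, copy_f))  # False
print(np.shares_memory(f_arr, view_f))  # True
```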
And I still think words can mean different things in different contexts (with a qualifier maybe):
indexing in Fortran order
memory in Fortran order
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it
* Travis' numpy book has a very early section on this (section 2.3 - memory layout)
* I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
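The two uses of the keyword under discussion can be put side by side. This is only a sketch with a made-up 2x3 array: in np.array the 'order' argument changes the memory layout and leaves indexing alone, while in reshape 'F' and 'C' change which element lands at which index.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))

# np.array(..., order='F'): 'order' means *memory* layout here.
arr_F = np.array(arr, order='F')
same_values = np.array_equal(arr, arr_F)     # indexing is unchanged
layout_changed = arr_F.flags.f_contiguous    # but the layout is now F

# reshape(..., order=...): 'C' and 'F' mean *index* order here.
reshaped_C = arr.reshape((3, 2), order='C')
reshaped_F = arr.reshape((3, 2), order='F')
print(reshaped_C.tolist())  # [[0, 1], [2, 3], [4, 5]]
print(reshaped_F.tolist())  # [[0, 4], [3, 2], [1, 5]]

# The memory layout of the input makes no difference to the 'F' result.
layout_irrelevant = np.array_equal(arr_F.reshape((3, 2), order='F'), reshaped_F)
print(same_values, layout_changed, layout_irrelevant)
```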
I don't think I can express my preference for reshape order="F" any better than I did, so maybe it's time for some additional users/developers to chime in.
My 2cents: while I can't go back and un-read earlier emails in this thread, I don't see what's ambiguous in the case of ravel. For reshape I can see though that it's possible to interpret it in two ways. In such cases I open up IPython and play with a 2x3 array to check my understanding. That's OK, and certainly better than adding duplicate names now for C/F even if that would solve the issue (which it probably wouldn't). Therefore I'm -1 on the initial proposal. Ralf
Hi,
On Sun, Mar 31, 2013 at 1:43 PM,
On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 9:37 PM,
wrote: On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 7:02 PM,
wrote:
> I meant making the "candidates" think about memory instead of just column versus row stacking.
To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine.
A student asked what he would get back from raveling this array, a concatenated time series, or something spatial?
We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :].
He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images".
Ironically, this was a Fortran-ordered array in memory, and he was wrong.
So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally.
I would like, as a teacher, to be able to say something like:
This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
It's rather easy to get something that is neither C nor F memory layout
Numpy does many memory layouts.
Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
Right - that is what you think - but I was asking - do you agree that it's possible that that is not the best way to teach it? What evidence would you give that it was the best way to teach it?
And I still think words can mean different things in different contexts (with a qualifier maybe):
indexing in Fortran order
memory in Fortran order
Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it.
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it
* Travis' numpy book has a very early section on this (section 2.3 - memory layout)
* I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
Here's the docstring for 'reshape':

    order : {'C', 'F', 'A'}, optional
        Determines whether the array data should be viewed as in C
        (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN
        order should be preserved.

The 'A' option cannot be explained without reference to 'C' or 'F' *memory* layout - i.e. a meaning different from that of 'C' and 'F' in the indexing interpretation.

Actually, as a matter of interest - how would you explain the behavior of 'A' when the array is neither 'C' nor 'F' memory layout? Maybe that could be a good test case?

Here's the docstring for 'ravel':

    order : {'C','F', 'A', 'K'}, optional
        The elements of ``a`` are read in this order. 'C' means to view
        the elements in C (row-major) order. 'F' means to view the elements
        in Fortran (column-major) order. 'A' means to view the elements
        in 'F' order if a is Fortran contiguous, 'C' order otherwise.
        'K' means to view the elements in the order they occur in memory,
        except for reversing the data when strides are negative.
        By default, 'C' order is used.

Cheers,

Matthew
On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
Hi,
On Sun, Mar 31, 2013 at 1:43 PM,
wrote: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 9:37 PM,
wrote: On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett
wrote:
> On Sat, Mar 30, 2013 at 7:02 PM, wrote:
>> I meant making the "candidates" think about memory instead of just column versus row stacking.
> To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine.
> A student asked what he would get back from raveling this array, a concatenated time series, or something spatial?
> We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :].
> He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images".
> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
> So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally.
> I would like, as a teacher, to be able to say something like:
> This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
> This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
> It's rather easy to get something that is neither C nor F memory layout
> Numpy does many memory layouts.
> Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
> My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
Right - that is what you think - but I was asking - do you agree that it's possible that that is not the best way to teach it?
What evidence would you give that it was the best way to teach it?
And I still think words can mean different things in different contexts (with a qualifier maybe):
indexing in Fortran order
memory in Fortran order
Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it.
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it
* Travis' numpy book has a very early section on this (section 2.3 - memory layout)
* I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
Here's the docstring for 'reshape':
    order : {'C', 'F', 'A'}, optional
        Determines whether the array data should be viewed as in C
        (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN
        order should be preserved.

The 'A' option cannot be explained without reference to 'C' or 'F' *memory* layout - i.e. a meaning different from that of 'C' and 'F' in the indexing interpretation.
Actually, as a matter of interest - how would you explain the behavior of 'A' when the array is neither 'C' or 'F' memory layout? Maybe that could be a good test case?
The 'A' means C-order unless `ndarray.flags.fnc == True` (which means "fortran not C"). The detail about "not C" should not matter really for copies, for reshape it should maybe be mentioned more clearly. Though honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever does it. As for ravel... you can probably just as well use 'K' instead which is even less restrictive. - Sebastian
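The rule as stated above ('A' means C-order unless flags.fnc is True) can be checked with a short sketch. The three arrays are made up for illustration, and the check only compares 'A' against the index order that rule predicts; it says nothing about how numpy implements it internally.

```python
import numpy as np

c_arr = np.arange(6).reshape((2, 3))               # C layout
f_arr = np.asfortranarray(c_arr)                   # F layout ("fortran not C")
neither = np.arange(12).reshape((2, 6))[:, ::2]    # neither C nor F

checks = []
for a in (c_arr, f_arr, neither):
    # Predicted index order: 'F' only when the array is F- and not C-contiguous.
    expected = 'F' if a.flags.fnc else 'C'
    matches = np.array_equal(a.reshape(6, order='A'),
                             a.reshape(6, order=expected))
    checks.append((bool(a.flags.fnc), expected, matches))

print(checks)
```

For the array that is neither C nor F in memory, 'A' therefore behaves like plain 'C' index order.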
Here's the docstring for 'ravel':
    order : {'C','F', 'A', 'K'}, optional
        The elements of ``a`` are read in this order. 'C' means to view
        the elements in C (row-major) order. 'F' means to view the elements
        in Fortran (column-major) order. 'A' means to view the elements
        in 'F' order if a is Fortran contiguous, 'C' order otherwise.
        'K' means to view the elements in the order they occur in memory,
        except for reversing the data when strides are negative.
        By default, 'C' order is used.
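A sketch of the 'K' case in that docstring, using a made-up array with swapped axes so that it is neither C nor F contiguous. Here 'K' recovers the order in memory, while 'A' falls back to 'C' index order.

```python
import numpy as np

base = np.arange(24).reshape((2, 3, 4))
swapped = base.swapaxes(0, 1)   # shape (3, 2, 4), a view on the same memory

# After swapping axes the view is neither C nor F contiguous.
print(swapped.flags.c_contiguous, swapped.flags.f_contiguous)

# 'K' reads the elements in the order they occur in memory: 0, 1, 2, ...
k_order = swapped.ravel(order='K')
# 'A' only looks at the contiguity flags, so here it is the same as 'C'.
a_order = swapped.ravel(order='A')
c_order = swapped.ravel(order='C')

print(np.array_equal(k_order, np.arange(24)))  # True
print(np.array_equal(a_order, c_order))        # True
print(np.array_equal(k_order, c_order))        # False
```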
Cheers,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
On Mon, Apr 1, 2013 at 10:23 AM, Sebastian Berg
On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
Hi,
On Sun, Mar 31, 2013 at 1:43 PM,
wrote: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 9:37 PM,
wrote:
> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote:
>> On Sat, Mar 30, 2013 at 7:02 PM, wrote:
>>> I meant making the "candidates" think about memory instead of just column versus row stacking.
>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine.
>> A student asked what he would get back from raveling this array, a concatenated time series, or something spatial?
>> We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :].
>> He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images".
>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>> So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally.
>> I would like, as a teacher, to be able to say something like:
>> This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
>> This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
>> It's rather easy to get something that is neither C nor F memory layout
>> Numpy does many memory layouts.
>> Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
>> My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
> But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
Right - that is what you think - but I was asking - do you agree that it's possible that that is not the best way to teach it?
What evidence would you give that it was the best way to teach it?
And I still think words can mean different things in different contexts (with a qualifier maybe):
indexing in Fortran order
memory in Fortran order
Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it.
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it
* Travis' numpy book has a very early section on this (section 2.3 - memory layout)
* I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
Here's the docstring for 'reshape':
    order : {'C', 'F', 'A'}, optional
        Determines whether the array data should be viewed as in C
        (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN
        order should be preserved.

The 'A' option cannot be explained without reference to 'C' or 'F' *memory* layout - i.e. a meaning different from that of 'C' and 'F' in the indexing interpretation.
Actually, as a matter of interest - how would you explain the behavior of 'A' when the array is neither 'C' or 'F' memory layout? Maybe that could be a good test case?
The 'A' means C-order unless `ndarray.flags.fnc == True` (which means "fortran not C"). The detail about "not C" should not matter really for copies, for reshape it should maybe be mentioned more clearly. Though honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever does it. As for ravel... you can probably just as well use 'K' instead which is even less restrictive.
I was arguing that it is not possible to explain the docstring(s) without reference to memory order - I guess you agree. Cheers, Matthew
On Mon, Apr 1, 2013 at 3:10 PM, Matthew Brett
Hi,
On Mon, Apr 1, 2013 at 10:23 AM, Sebastian Berg
wrote: On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
Hi,
On Sun, Mar 31, 2013 at 1:43 PM,
wrote: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett
wrote:
> On Sat, Mar 30, 2013 at 9:37 PM, wrote:
>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote:
>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote:
>>>> I meant making the "candidates" think about memory instead of just column versus row stacking.
>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D array, it was an image, with time as the 4th dimension (N time points). Raveling and reshaping 3D and 4D arrays is a common thing to do in neuroimaging, as you can imagine.
>>> A student asked what he would get back from raveling this array, a concatenated time series, or something spatial?
>>> We showed (I'd worked it out by this time) that the first N values were the time series given by [0, 0, 0, :].
>>> He said - "Oh - I see - so the data is stored as a whole lot of time series one by one, I thought it would be stored as a series of images".
>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong.
>>> So, I think the idea of memory ordering and index ordering is very easy to confuse, and comes up naturally.
>>> I would like, as a teacher, to be able to say something like:
>>> This is what C memory layout is (it's the memory layout that gives arr.flags.C_CONTIGUOUS=True)
>>> This is what F memory layout is (it's the memory layout that gives arr.flags.F_CONTIGUOUS=True)
>>> It's rather easy to get something that is neither C nor F memory layout
>>> Numpy does many memory layouts.
>>> Ravel and reshape and numpy in general do not care (normally) about C or F layouts, they only care about index ordering.
>>> My point, that I'm repeating, is that my job is made harder by 'arr.ravel('F')'.
>> But once you know that ravel and reshape don't care about memory, the ravel is easy to predict (maybe not easy to visualize in 4-D):
> But this assumes that you already know that there's such a thing as memory layout, and there's such a thing as index ordering, and that 'C' and 'F' in ravel refer to index ordering. Once you have that, you're golden. I'm arguing it's markedly harder to get this distinction, and keep it in mind, and teach it, if we are using the 'C' and 'F' names for both things.
No, I think you are still missing my point. I think explaining ravel and reshape F and C is easy (kind of) because the students don't need to know at that stage about memory layouts.
All they need to know is that we look at n-dimensional objects in C-order or in F-order (whichever index runs fastest)
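The "whichever index runs fastest" description, and the 4D teaching case quoted above, can be sketched as follows. This is only an illustration under assumed sizes: the shape `(I, J, K, N)` and the variable names are made up for the example and are not the data from the class.

```python
import numpy as np

# Hypothetical small "image" of shape (I, J, K, N); N is the time axis.
I, J, K, N = 2, 3, 4, 5
img_c = np.arange(I * J * K * N).reshape((I, J, K, N))  # C-contiguous in memory
img_f = np.asfortranarray(img_c)                        # same values, F-contiguous in memory

# Default ravel (order='C'): the last index runs fastest, so the first N
# values are the time series at voxel [0, 0, 0, :].
series_from_c = img_c.ravel()[:N]

# Same index ordering, different memory layout: the result is unchanged.
series_from_f = img_f.ravel()[:N]

# order='F': the first index runs fastest, so the first I values walk axis 0.
first_axis_run = img_c.ravel(order='F')[:I]

print(series_from_c, series_from_f, first_axis_run)
```

The point of the sketch is that `order` in `ravel` selects which index varies fastest; the memory layout of the input does not change the values returned.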
Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
Right - that is what you think - but I was asking - do you agree that it's possible that that is not best way to teach it?
What evidence would you give that it was the best way to teach it?
And I still think words can mean different things in different context (with a qualifier maybe) indexing in fortran order memory in fortran order
Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it.
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it * Travis' numpy book has a very early section on this (section 2.3 - memory layout) * I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
Here's the docstring for 'reshape':
order : {'C', 'F', 'A'}, optional Determines whether the array data should be viewed as in C (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN order should be preserved.
The 'A' option cannot be explained without reference to 'C' or 'F' *memory* layout - i.e. a different meaning of the 'C' and 'F" in the indexing interpretation.
Actually, as a matter of interest - how would you explain the behavior of 'A' when the array is neither 'C' or 'F' memory layout? Maybe that could be a good test case?
The 'A' means C-order unless `ndarray.flags.fnc == True` (which means "fortran not C"). The detail about "not C" should not matter really for copies, for reshape it should maybe be mentioned more clearly. Though honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever does it. As for ravel... you can probably just as well use 'K' instead which is even less restrictive.
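A small sketch of the rule described here, using three made-up arrays: one C-contiguous, one F-contiguous, and one that is neither. It only exercises `ravel`; the array names are illustrative.

```python
import numpy as np

c_arr = np.arange(6).reshape((2, 3))              # C-contiguous
f_arr = np.asfortranarray(c_arr)                  # F-contiguous, same values
neither = np.arange(12).reshape((3, 4))[:, ::2]   # strided view: neither C nor F contiguous

# 'A' reads in Fortran index order only when flags.fnc is True
# ("Fortran contiguous and not C contiguous"); otherwise it reads in C order.
ravel_a_c = c_arr.ravel(order='A')
ravel_a_f = f_arr.ravel(order='A')
ravel_a_neither = neither.ravel(order='A')

print(c_arr.flags.fnc, f_arr.flags.fnc, neither.flags.fnc)
print(ravel_a_c, ravel_a_f, ravel_a_neither)
```

Note that `c_arr` and `f_arr` hold the same values at the same indices, yet `ravel(order='A')` returns them in a different sequence. That is the memory-dependent behaviour under discussion.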
I was arguing that it is not possible to explain the docstring(s) without reference to memory order - I guess you agree.
I was careful to always refer to the "C" and "F" options.
I've never seen a usage of "A", nor the "K" in ravel ("K" is not
available in numpy 1.5)
and I don't expect to run into a case where I need "A" or "K".
My impression is that both "A" and "K" are only good for memory
optimization, when we do *not* care (much) about the actual sequence.
(So, in my opinion, it's mostly useless to try to figure out what the
sequence is.)
So, I would categorize a question for predicting what happens with "A" or "K"
as a question to separate developers in the style of,
Do you really understand the tricky parts of numpy? or
Do you just have a working knowledge of numpy?
(I just avoid certain parts of numpy because they make my head spin.
e.g. mixing slices and fancy indexing in more than 2d ?)
I'm just against taking away the easy to understand and frequently used
(names) "F" and "C", to come back to the original question
Josef
Cheers,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
On Mon, Apr 1, 2013 at 1:34 PM,
On Mon, Apr 1, 2013 at 3:10 PM, Matthew Brett
wrote: Hi,
On Mon, Apr 1, 2013 at 10:23 AM, Sebastian Berg
wrote: On Sun, 2013-03-31 at 14:04 -0700, Matthew Brett wrote:
Hi,
On Sun, Mar 31, 2013 at 1:43 PM,
wrote: On Sun, Mar 31, 2013 at 3:54 PM, Matthew Brett
wrote: Hi,
On Sat, Mar 30, 2013 at 10:38 PM,
wrote: > On Sun, Mar 31, 2013 at 12:50 AM, Matthew Brett wrote: >> Hi, >> >> On Sat, Mar 30, 2013 at 9:37 PM, wrote: >>> On Sun, Mar 31, 2013 at 12:04 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Mar 30, 2013 at 7:02 PM, wrote: >>>>> On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> On Sat, Mar 30, 2013 at 7:50 PM, wrote: >>>>>>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle >>>>>>> wrote: >>>>>>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> On Sat, Mar 30, 2013 at 2:20 PM, wrote: >>>>>>>>> > On Sat, Mar 30, 2013 at 4:57 PM, wrote: >>>>>>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett >>>>>>>>> >> wrote: >>>>>>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM, wrote: >>>>>>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett >>>>>>>>> >>>> wrote: >>>>>>>>> >>>>> >>>>>>>>> >>>>> Ravel and reshape use the tems 'C' and 'F" in the sense of index >>>>>>>>> >>>>> ordering. >>>>>>>>> >>>>> >>>>>>>>> >>>>> This is very confusing. We think the index ordering and memory >>>>>>>>> >>>>> ordering ideas need to be separated, and specifically, we should >>>>>>>>> >>>>> avoid >>>>>>>>> >>>>> using "C" and "F" to refer to index ordering. >>>>>>>>> >>>>> >>>>>>>>> >>>>> Proposal >>>>>>>>> >>>>> ------------- >>>>>>>>> >>>>> >>>>>>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards >>>>>>>>> >>>>> index ordering for ravel, reshape >>>>>>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling >>>>>>>>> >>>>> in >>>>>>>>> >>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent >>>>>>>>> >>>>> naming idea by Paul Ivanov) >>>>>>>>> >>>>> >>>>>>>>> >>>>> What do y'all think? >>>>>>>>> >>>> >>>>>>>>> >>>> I always thought "F" and "C" are easy to understand, I always thought >>>>>>>>> >>>> about >>>>>>>>> >>>> the content and never about the memory when using it. >>>>>>>>> >> >>>>>>>>> >> changing the names doesn't make it easier to understand. 
>>>>>>>>> >> I think the confusion is because the new A and K refer to existing >>>>>>>>> >> memory >>>>>>>>> >> >>>>>>>>> >>>>>>>>> I disagree, I think it's confusing, but I have evidence, and that is >>>>>>>>> that four out of four of us tested ourselves and got it wrong. >>>>>>>>> >>>>>>>>> Perhaps we are particularly dumb or poorly informed, but I think it's >>>>>>>>> rash to assert there is no problem here. >>>>>>> >>>>>>> I think you are overcomplicating things or phrased it as a "trick question" >>>>>> >>>>>> I don't know what you mean by trick question - was there something >>>>>> over-complicated in the example? I deliberately didn't include >>>>>> various much more confusing examples in "reshape". >>>>> >>>>> I meant making the "candidates" think about memory instead of just >>>>> column versus row stacking. >>>> >>>> To be specific, we were teaching about reshaping a (I, J, K, N) 4D >>>> array, it was an image, with time as the 4th dimension (N time >>>> points). Raveling and reshaping 3D and 4D arrays is a common thing >>>> to do in neuroimaging, as you can imagine. >>>> >>>> A student asked what he would get back from raveling this array, a >>>> concatenated time series, or something spatial? >>>> >>>> We showed (I'd worked it out by this time) that the first N values >>>> were the time series given by [0, 0, 0, :]. >>>> >>>> He said - "Oh - I see - so the data is stored as a whole lot of time >>>> series one by one, I thought it would be stored as a series of >>>> images'. >>>> >>>> Ironically, this was a Fortran-ordered array in memory, and he was wrong. >>>> >>>> So, I think the idea of memory ordering and index ordering is very >>>> easy to confuse, and comes up naturally. 
>>>> >>>> I would like, as a teacher, to be able to say something like: >>>> >>>> This is what C memory layout is (it's the memory layout that gives >>>> arr.flags.C_CONTIGUOUS=True) >>>> This is what F memory layout is (it's the memory layout that gives >>>> arr.flags.F_CONTIGUOUS=True) >>>> It's rather easy to get something that is neither C or F memory layout >>>> Numpy does many memory layouts. >>>> Ravel and reshape and numpy in general do not care (normally) about C >>>> or F layouts, they only care about index ordering. >>>> >>>> My point, that I'm repeating, is that my job is made harder by >>>> 'arr.ravel('F')'. >>> >>> But once you know that ravel and reshape don't care about memory, the >>> ravel is easy to predict (maybe not easy to visualize in 4-D): >> >> But this assumes that you already know that there's such a thing as >> memory layout, and there's such a thing as index ordering, and that >> 'C' and 'F' in ravel refer to index ordering. Once you have that, >> you're golden. I'm arguing it's markedly harder to get this >> distinction, and keep it in mind, and teach it, if we are using the >> 'C' and 'F" names for both things. > > No, I think you are still missing my point. > I think explaining ravel and reshape F and C is easy (kind of) because the > students don't need to know at that stage about memory layouts. > > All they need to know is that we look at n-dimensional objects in > C-order or in F-order > (whichever index runs fastest) Would you accept that it may or may not be true that it is desirable or practical not to mention memory layouts when teaching numpy?
I think they should be in two different sections.
basic usage: ravel, reshape in pure index order, and indexing, broadcasting, ...
advanced usage: memory layout and some ability to predict when you get a view and when you get a copy.
Right - that is what you think - but I was asking - do you agree that it's possible that that is not best way to teach it?
What evidence would you give that it was the best way to teach it?
And I still think words can mean different things in different context (with a qualifier maybe) indexing in fortran order memory in fortran order
Right - but you'd probably also accept that using the same word for different and related things is likely to cause confusion? I'm sure we could come up with some experimental evidence for that if you do doubt it.
Disclaimer: I never tried to teach numpy and with GSOC students my explanations only went a little bit beyond what they needed to know for the purpose at hand (I hope)
You believe it is desirable, I believe that it is not - that teaching numpy naturally involves some discussion of memory layout.
As evidence:
* My student, without any prompting about memory layouts, is asking about it * Travis' numpy book has a very early section on this (section 2.3 - memory layout) * I often think about memory layouts, and from your discussion, you do too. It's uncommon that you don't have to teach something that experienced users think about often.
I'm mentioning memory layout because I'm talking to you. I wouldn't talk about memory layout if I would try to explain ravel, reshape and indexing for the first time to a student.
* The most common use of 'order' only refers to memory layout. For example np.array "order" doesn't refer to index ordering but to memory layout.
No, as I tried to show with the statsmodels example. I don't require GSOC students (that are relatively new to numpy) to understand much about memory layout. The only use of ``order`` in statsmodels refers to *index* order in ravel and reshape.
* The current docstring of 'reshape' cannot be explained without referring to memory order.
really ? I thought reshape only refers to *index* order for "F" and "C"
Here's the docstring for 'reshape':
order : {'C', 'F', 'A'}, optional Determines whether the array data should be viewed as in C (row-major) order, FORTRAN (column-major) order, or the C/FORTRAN order should be preserved.
The 'A' option cannot be explained without reference to 'C' or 'F' *memory* layout - i.e. a different meaning of the 'C' and 'F" in the indexing interpretation.
Actually, as a matter of interest - how would you explain the behavior of 'A' when the array is neither 'C' or 'F' memory layout? Maybe that could be a good test case?
The 'A' means C-order unless `ndarray.flags.fnc == True` (which means "fortran not C"). The detail about "not C" should not matter really for copies, for reshape it should maybe be mentioned more clearly. Though honestly, reshaping with 'A' seems so weird to me, I doubt anyone ever does it. As for ravel... you can probably just as well use 'K' instead which is even less restrictive.
I was arguing that it is not possible to explain the docstring(s) without reference to memory order - I guess you agree.
I was careful to always refer to the "C" and "F" options.
I've never seen a usage of "A", nor the "K" in ravel ("K" is not available in numpy 1.5) and I don't expect to run into a case where I need "A" or "K".
Right. I am only pointing out that one cannot explain the docstring without reference to memory order.
My impression is that both "A" and "K" are only good for memory optimization, when we do *not* care (much) about the actual sequence. (So, in my opinion, it's mostly useless to try to figure out what the sequence is.)
So, I would categorize a question for predicting what happens with "A" or "K" as a question to separate developers in the style of, Do you really understand the tricky parts of numpy? or Do you just have a working knowledge of numpy?
(I just avoid certain parts of numpy because they make my head spin. e.g. mixing slices and fancy indexing in more than 2d ?)
I'm just against taking away the easy to understand and frequently used (names) "F" and "C", to come back to the original question
I agree 'F' and 'C' are frequently used, but I estimate they are most frequently used with a different meaning.
"Easy to understand" is obviously subjective, and not much use for the discussion, hence my attempt to find some evidence on the point. 'F' and 'C' are clearly not simple, in a technical sense, because they have two different meanings. The use of C and F is of course familiar, and that gives us a bias to believe they are easy for someone else to understand. I was hoping for some attempt to get past that bias, which is obviously going to be strong. I believe that evidence on that point is your requirement that someone learning this stuff does not come across 'C' or 'F' in the sense of memory layout until they are advanced, and my earlier assertion (with some evidence) that that is neither desirable nor practical.
Cheers,
Matthew
Hi folks,
I've been teaching Python lately, have taught numpy a couple times (formally), and am preparing a lecture about it over the next couple weeks -- so I'm taking an interest here.
I've been a regular numpy user for a long time, though as it happens, rarely use ravel() (side note, what's always confused me the most is that it seems to me that ravel() _unravels_ the array - but that's a side note...)
So I ignored the first post, then fired up IPython, read the docstring, and played with ravel a bit -- it behaved EXACTLY like I expected -- at least for 2-d....
Matthew, I expect your group may have gotten tied up by the fact that you know too much! Kind of like how I have a hard time getting my iPhone to work, and my computer-illiterate wife has no problem at all.
So: yes, I do think it's a bit confusing and unfortunate that the "order" parameter has two somewhat different meanings, but they are, in fact, used fairly similarly. And while the idea of "fortran" or "C" ordering of arrays may be a foreign concept to folks that have not used fortran or C (or most critically, tried to interface the two...) it's a common enough concept that it's a reasonable shorthand.
As for "should we teach memory order at all to newbies?"
I usually do teach memory order early on, partly that's because I really like to emphasize that numpy arrays are both a really nice Python data structure and set of functions, but also a wrapper around a block of data -- for the latter, you need to talk about order. Also, even with pure-python, knowing a bit about whether arrays are contiguous or not is important (and views, and...). You can do a lot with numpy without thinking about memory order at all, but to really make it dance, you need to know about it.
In short -- I don't think the situation is too bad, and not bad enough to change any names or flags, but if someone wants to add a bit to the ravel docstring to clarify it, I'm all for it.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
Hi,
On Mon, Apr 1, 2013 at 4:51 PM, Chris Barker - NOAA Federal
Hi folks,
I've been teaching Python lately, have taught numpy a couple times (formally), and am preparing a lecture about it over the next couple weeks -- so I'm taking an interest here.
I've been a regular numpy user for a long time, though as it happens, rarely use ravel() (side note, what's always confused me the most is that it seems to me that ravel() _unravels_ the array - but that's a side note...)
So I ignored the first post, then fired up IPython, read the docstring, and played with ravel a bit -- it behaved EXACTLY like I expected -- at least for 2-d....
Matthew, I expect your group may have gotten tied up by the fact that you know too much! Kind of like how I have a hard time getting my iPhone to work, and my computer-illiterate wife has no problem at all.
Thank you for the compliment, it's more enjoyable than other potential explanations of my confusion (sigh). But, I don't think that is the explanation. First, there were three of us with different levels of experience getting confused on this. Second, I think we all agree that:
So: yes, I do think it's a bit confusing and unfortunate that the "order" parameter has two somewhat different meanings,
- so there is a good reason that we could get confused. Last, as soon as we came to the distinction between index order and memory layout, it was clear. We all agreed that this was an important distinction that would improve numpy if we made it. Before I sent the email I did wonder aloud whether people would read the email, understand the distinction, and then fail to see the problem. It is hard to imagine yourself before you understood something.
but they are, in fact, used fairly similarly. And while the idea of "fortran" or "C" ordering of arrays may be a foreign concept to folks that have not used fortran or C (or most critically, tried to interface the two...) it's a common enough concept that it's a reasonable shorthand.
As for "should we teach memory order at all to newbies?"
I usually do teach memory order early on, partly that's because I really like to emphasize that numpy arrays are both a really nice Python data structure and set of functions, but also a wrapper around a block of data -- for the latter, you need to talk about order. Also, even with pure-python, knowing a bit about whether arrays are contiguous or not is important (and views, and...). You can do a lot with numpy without thinking about memory order at all, but to really make it dance, you need to know about it.
In short -- I don't think the situation is too bad, and not bad enough to change any names or flags, but if someone wants to add a bit to the ravel docstring to clarify it, I'm all for it.
I think you agree that there is potential for confusion, and there doesn't seem any reason to continue with that confusion if we can come up with a clearer name.
So here is a compromise proposal.
How about:
* Preferring the names 'c-style' and 'f-style' for the indexing order case (ravel, reshape, flatiter)
* Leaving 'C' and 'F' as functional shortcuts, so there is no possible backwards-compatibility problem.
Would you object to that?
Cheers,
Matthew
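To make the compromise concrete, here is a minimal sketch of how the proposed names could sit alongside the existing ones. It is hypothetical throughout: NumPy's `ravel` accepts only 'C', 'F', 'A' and 'K', and both the helper `ravel_index_order` and the alias table are invented for this illustration.

```python
import numpy as np

# Hypothetical alias table: the proposed long names map onto the existing
# one-letter index orders, which stay valid as shortcuts.
_INDEX_ORDER_ALIASES = {
    'c-style': 'C',
    'f-style': 'F',
    'C': 'C',
    'F': 'F',
}

def ravel_index_order(arr, order='c-style'):
    """Ravel `arr` using an index-order name (hypothetical wrapper, not a NumPy API)."""
    try:
        numpy_order = _INDEX_ORDER_ALIASES[order]
    except KeyError:
        raise ValueError("order must be one of %r" % sorted(_INDEX_ORDER_ALIASES))
    return np.ravel(arr, order=numpy_order)

arr = np.arange(6).reshape((2, 3))
print(ravel_index_order(arr, 'c-style'))
print(ravel_index_order(arr, 'f-style'))
print(ravel_index_order(arr, 'F'))  # old spelling still works
```

The memory-dependent options 'A' and 'K' are deliberately left out of the alias table, so that the wrapper only ever talks about index ordering.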
Hi all, Since we're mentioning obvious and non-obvious naming,
I think you agree that there is potential for confusion, and there doesn't seem any reason to continue with that confusion if we can come up with a clearer name.
So here is a compromise proposal.
How about:
* Preferring the names 'c-style' and 'f-style' for the indexing order case (ravel, reshape, flatiter)
This naming scheme is obvious to those who have been coding for a long time, but it tends not to speak to anyone else. Why not use names that are a little more explicit (and, of course, keep the legacy names available), such as 'row-first' and 'column-first' (or anything else that may be more explicit)? Cheers, Éric.
* Leaving 'C' and 'F' as functional shortcuts, so there is no possible backwards-compatibility problem.
Would you object to that?
Cheers,
Matthew
Un clavier azerty en vaut deux ("An AZERTY keyboard is worth two")
Éric Depagne eric@depagne.org
On Mon, Apr 1, 2013 at 10:15 PM, Matthew Brett
Thank you for the compliment, it's more enjoyable than other potential explanations of my confusion (sigh).
But, I don't think that is the explanation.
well, the core explanation is these are difficult and intertwined concepts...And yes, better names and better docs can help.
Last, as soon as we came to the distinction between index order and memory layout, it was clear.
We all agreed that this was an important distinction that would improve numpy if we made it.
yup.
I think you agree that there is potential for confusion, and there doesn't seem any reason to continue with that confusion if we can come up with a clearer name.
well, changing an API is not to be taken lightly -- we are not discussing how we'd do it if we were to start from fresh here. So any change should make things enough better that it is worth dealing with the process of the change.
So here is a compromise proposal.
* Preferring the names 'c-style' and 'f-style' for the indexing order case (ravel, reshape, flatiter)
* Leaving 'C' and 'F' as functional shortcuts, so there is no possible backwards-compatibility problem.
seems reasonable enough -- though even with the backward compatibility, users will be faced with many, many older examples and docs that use 'C' and 'F', while the new ones refer to the new names -- might this be cause for even more confusion (at least for a few years...)
leaving me with an equivocal +0 on that ....
another thought:
"""
Definition: np.ravel(a, order='C')

A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

Parameters
----------
a : array_like
    Input array. The elements in ``a`` are read in the order specified by `order`, and packed as a 1-D array.
order : {'C','F', 'A', 'K'}, optional
    The elements of ``a`` are read in this order. 'C' means to view the elements in C (row-major) order. 'F' means to view the elements in Fortran (column-major) order. 'A' means to view the elements in 'F' order if a is Fortran contiguous, 'C' order otherwise. 'K' means to view the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, 'C' order is used.
"""
Does ravel need to support the 'A' and 'K' options? It's kind of an advanced use, and really more suited to .view(), perhaps?
What I'm getting at is that this version of ravel() conflates the two concepts: virtual ordering and memory ordering in one function -- maybe they should be considered as two different functions altogether -- I think that would make for less confusion.
Éric Depagne wrote:
'row-first' and 'column-first' (or anything else that may be more explicit) ?
I like more explicit, but 'row-first' and 'column-first' have two issues: 1) what about higher dimension arrays?, and 2) the "row" and "column" convention is only that -- a convention -- I guess it's the way numpy prints, which gives it some meaning, but there are times when arrays are ordered: (col, row), rather than (row, col) (PIL uses that format for instance)
I like the Z and N, and maybe even if they aren't used as flag names, they could be used in the docstring -- nice and ascii safe....
Nathaniel wrote:
To see this, note that semantically it would be perfectly possible for .reshape() to take *two* order= arguments: one to specify the coordinate space mapping (2), and the other to specify the desired memory layout used by the result array (1). Of course we shouldn't actually do this, because in the unlikely event that someone actually wanted both of these they could just call asarray() on the output of reshape().
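The two separate "orders" described here can be sketched like this: `reshape(..., order=...)` picks the index mapping, and a following `np.asarray(..., order=...)` picks the memory layout of the result. The array and names are illustrative only.

```python
import numpy as np

flat = np.arange(10)

# (2) Coordinate-space mapping: fill the (2, 5) result with the first index
# running fastest.
by_index = flat.reshape((2, 5), order='F')

# (1) Memory layout of the result: request a C-contiguous array holding the
# same values at the same indices.
as_c_memory = np.asarray(by_index, order='C')

print(by_index)
print(as_c_memory.flags.c_contiguous)
```

The second step changes no values and no indices; it affects only how the elements are laid out in memory.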
exactly -- my point about keeping the raveling with "virtual order" separate from raveling with memory order -- it's really not critical that you can do both with one function call.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
Hi,
On Tue, Apr 2, 2013 at 12:29 PM, Chris Barker - NOAA Federal
On Mon, Apr 1, 2013 at 10:15 PM, Matthew Brett
wrote: Thank you for the compliment, it's more enjoyable than other potential explanations of my confusion (sigh).
But, I don't think that is the explanation.
well, the core explanation is these are difficult and intertwined concepts...And yes, better names and better docs can help.
Last, as soon as we came to the distinction between index order and memory layout, it was clear.
We all agreed that this was an important distinction that would improve numpy if we made it.
yup.
I think you agree that there is potential for confusion, and there doesn't seem any reason to continue with that confusion if we can come up with a clearer name.
well, changing an API is not to be taken lightly -- we are not discussing how we'd do it if we were to start from fresh here. So any change should make things enough better that it is worth dealing with the process of the change.
Yes, for sure. I was only trying to point out that we are not talking about breaking backwards compatibility.
So here is a compromise proposal.
* Preferring the names 'c-style' and 'f-style' for the indexing order case (ravel, reshape, flatiter)
* Leaving 'C' and 'F' as functional shortcuts, so there is no possible backwards-compatibility problem.
seems reasonable enough -- though even with the backward compatibility, users will be faced with many, many older examples and docs that use 'C' and 'F', while the new ones refer to the new names -- might this be cause for even more confusion (at least for a few years...)
I doubt it would be 'even more' confusion. They would only have to read the docstrings to work out what is meant, and I believe, with better names, they'd be less likely to fall into the traps I fell into, at least.
leaving me with an equivocal +0 on that ....
another thought:
"""
Definition: np.ravel(a, order='C')

A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

Parameters
----------
a : array_like
    Input array. The elements in ``a`` are read in the order specified by `order`, and packed as a 1-D array.
order : {'C','F', 'A', 'K'}, optional
    The elements of ``a`` are read in this order. 'C' means to view the elements in C (row-major) order. 'F' means to view the elements in Fortran (column-major) order. 'A' means to view the elements in 'F' order if a is Fortran contiguous, 'C' order otherwise. 'K' means to view the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, 'C' order is used.
"""
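As a rough illustration of how 'K' in this docstring differs from the index-order options, the sketch below ravels a transposed view both ways. The arrays are made up, and the negative-stride case mentioned in the docstring is not covered.

```python
import numpy as np

base = np.arange(6).reshape((2, 3))   # C-contiguous; memory holds 0, 1, 2, 3, 4, 5
transposed = base.T                   # a view: same memory, indices swapped

# Index ordering: last index of `transposed` runs fastest.
in_index_order = transposed.ravel(order='C')

# Memory ordering: elements come out as they sit in memory.
in_memory_order = transposed.ravel(order='K')

print(in_index_order)
print(in_memory_order)
```

With 'C' the result depends only on the indices of `transposed`; with 'K' it depends on how the underlying buffer happens to be laid out. Those are the two concepts the thread is trying to keep apart.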
Does ravel need to support the 'A' and 'K' options? It's kind of an advanced use, and really more suited to .view(), perhaps?
What I'm getting at is that this version of ravel() conflates the two concepts: virtual ordering and memory ordering in one function -- maybe they should be considered as two different functions altogether -- I think that would make for less confusion.
I think it would conceal the confusion only. If we don't have 'A' and 'K' in there, it allows us to keep the dream of a world where 'C" only refers to index ordering, but *only for this docstring*. As soon as somebody does ``np.array(arr, order='C')`` they will find themselves in conceptual trouble again. Cheers, Matthew
On Tue, Apr 2, 2013 at 2:04 PM, Matthew Brett wrote:
Hi,
On Tue, Apr 2, 2013 at 12:29 PM, Chris Barker - NOAA Federal wrote:
On Mon, Apr 1, 2013 at 10:15 PM, Matthew Brett wrote:
Thank you for the compliment, it's more enjoyable than other potential explanations of my confusion (sigh).
But, I don't think that is the explanation.
well, the core explanation is these are difficult and intertwined concepts...And yes, better names and better docs can help.
Last, as soon as we came to the distinction between index order and memory layout, it was clear.
We all agreed that this was an important distinction that would improve numpy if we made it.
yup.
I think you agree that there is potential for confusion, and there doesn't seem any reason to continue with that confusion if we can come up with a clearer name.
well, changing an API is not to be taken lightly -- we are not discussing how we'd do it if we were to start from fresh here. So any change should make things enough better that it is worth dealing with the process of the change.
Yes, for sure. I was only trying to point out that we are not talking about breaking backwards compatibility.
So here is a compromise proposal.
* Preferring the names 'c-style' and 'f-style' for the indexing order case (ravel, reshape, flatiter)
* Leaving 'C' and 'F' as functional shortcuts, so there is no possible backwards-compatibility problem.
seems reasonable enough -- though even with the backward compatibility, users will be faced with many, many older examples and docs that use 'C' and 'F', while the new ones refer to the new names -- might this be cause for even more confusion (at least for a few years...)
I doubt it would be 'even more' confusion. They would only have to read the docstrings to work out what is meant, and I believe, with better names, they'd be less likely to fall into the traps I fell into, at least.
leaving me with an equivocal +0 on that ....
another thought:

"""
Definition: np.ravel(a, order='C')

A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.

Parameters
----------
a : array_like
    Input array. The elements in ``a`` are read in the order specified by `order`, and packed as a 1-D array.
order : {'C','F', 'A', 'K'}, optional
    The elements of ``a`` are read in this order. 'C' means to view the elements in C (row-major) order. 'F' means to view the elements in Fortran (column-major) order. 'A' means to view the elements in 'F' order if a is Fortran contiguous, 'C' order otherwise. 'K' means to view the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, 'C' order is used.
"""
Does ravel need to support the 'A' and 'K' options? It's kind of an advanced use, and really more suited to .view(), perhaps?
What I'm getting at is that this version of ravel() conflates the two concepts: virtual ordering and memory ordering in one function -- maybe they should be considered as two different functions altogether -- I think that would make for less confusion.
I think it would conceal the confusion only. If we don't have 'A' and 'K' in there, it allows us to keep the dream of a world where 'C' only refers to index ordering, but *only for this docstring*. As soon as somebody does ``np.array(arr, order='C')`` they will find themselves in conceptual trouble again.
I still don't see why order is not a general concept, whether it refers to memory or indexing/iterating. The qualifier can be made clear in the docstrings (or from the context).

It's all over the documentation:

we can iterate in F-order over an array that is in C-order (*), or vice-versa
(*) or just some strides
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.nditer.html#numpy....

pure shape
http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html#c...

shape and copy
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flatten.ht...

memory
http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html#c...
http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html#from-...

Josef
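As an illustration of iterating in one order over an array stored in the other, here is a small sketch using `np.nditer` (the array and variable names are made up for the example).

```python
import numpy as np

a = np.arange(6).reshape((2, 3))      # C-contiguous in memory
a_f = np.asfortranarray(a)            # same values, F-contiguous in memory

# Fortran (column-major) index order over the C-ordered array.
f_iter = [int(x) for x in np.nditer(a, order='F')]

# C (row-major) index order over the F-ordered array.
c_iter = [int(x) for x in np.nditer(a_f, order='C')]

print(f_iter)
print(c_iter)
```

In both loops the `order` argument describes the iteration order only; the memory layout of the input does not change what comes out.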
Cheers,
Matthew
On Tue, Apr 2, 2013 at 11:37 AM,
I still don't see why order is not a general concept, whether it refers to memory or indexing/iterating.
I agree -- the ordering concept is the same, it's _what_ is being ordered that's different. So I say we stick with 'C' and 'F' -- numpy users will need to figure out what it means eventually in any case.... We need some better docstrings and *maybe* renaming a keyword argument or two. Partly I say maybe because the "order" keyword in ravel() actually mixes the two concepts anyway...

-Chris
Hi,
On Tue, Apr 2, 2013 at 4:07 PM, Chris Barker - NOAA Federal
On Tue, Apr 2, 2013 at 11:37 AM,
wrote: I still don't see why order is not a general concept, whether it refers to memory or indexing/iterating.
I agree -- the ordering concept is the same, it's _what_ is being ordered that's different. So I say we stick with 'C' and 'F' -- numpy users will need to figure out what it means eventually in any case....
I'm not quite sure what you are arguing. I thought we all agreed that the index ordering idea is *orthogonal* to the memory layout idea? Not so? Cheers, Matthew
Hi,
On Sat, Mar 30, 2013 at 1:57 PM,
On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett wrote:
Hi,
On Sat, Mar 30, 2013 at 4:14 AM, wrote:
On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett wrote:
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
Summary
-------

There are two separate ideas needed to understand ordering in ravel and reshape:

Idea 1) ravel / reshape can proceed from the last axis to the first, or the first to the last. This is "ravel index ordering"
Idea 2) The physical layout of the array (on disk or in memory) can be "C" or "F" contiguous or neither. This is "memory ordering"
The index ordering is usually (but see below) orthogonal to the memory ordering.
The 'ravel' and 'reshape' commands use "C" and "F" in the sense of index ordering, and this mixes the two ideas and is confusing.
What the current situation looks like
-------------------------------------

Specifically, we've been rolling this around among 4 experienced numpy users and we all predicted at least one of the results below wrongly.
This was what we knew, or should have known:
In [2]: import numpy as np
In [3]: arr = np.arange(10).reshape((2, 5))
In [5]: arr.ravel()
Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
So, the 'ravel' operation unravels over the last axis (1) first, followed by axis 0.
So far so good (even if the opposite to MATLAB, Octave).
Then we found the 'order' flag to ravel:
In [10]: arr.flags
Out[10]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [11]: arr.ravel('C')
Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
But we soon got confused. How about this?
In [12]: arr_F = np.array(arr, order='F')
In [13]: arr_F.flags
Out[13]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [16]: arr_F
Out[16]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [17]: arr_F.ravel('C')
Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Right - so the flag 'C' to ravel, has got nothing to do with *memory* ordering, but is to do with *index* ordering.
And in fact, we can ask for memory ordering specifically:
In [22]: arr.ravel('K')
Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [23]: arr_F.ravel('K')
Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])

In [24]: arr.ravel('A')
Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]: arr_F.ravel('A')
Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
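In the transcripts above 'K' and 'A' give the same answers because both arrays are contiguous. A small sketch of a case where they differ, assuming an array that is neither C- nor F-contiguous (the array here is illustrative and not from the original session):

```python
import numpy as np

# A swapped-axes view of a C-ordered array: neither C- nor F-contiguous.
a = np.arange(12).reshape(2, 3, 2).swapaxes(1, 2)
flags = (a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])

k_order = a.ravel(order='K')   # follows the order the elements have in memory
a_order = a.ravel(order='A')   # not F-contiguous, so falls back to 'C' index order
c_order = a.ravel(order='C')   # pure index order, last axis fastest

print(flags)
print(k_order)
print(a_order)
```

So 'K' and 'A' depend on the memory layout of the input, while 'C' and 'F' never do.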
There are some confusions to get into with the 'order' flag to reshape as well, of the same type.
Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
This is very confusing. We think the index ordering and memory ordering ideas need to be separated, and specifically, we should avoid using "C" and "F" to refer to index ordering.
Proposal
--------

* Deprecate the use of "C" and "F" meaning backwards and forwards index ordering for ravel, reshape
* Prefer "Z" and "N", being graphical representations of unraveling in 2 dimensions, axis1 first and axis0 first respectively (excellent naming idea by Paul Ivanov)
What do y'all think?
Cheers,
Matthew
Paul Ivanov
JB Poline
I always thought "F" and "C" are easy to understand, I always thought about the content and never about the memory when using it.
I can only say that 4 out of 4 experienced numpy developers found themselves unable to predict the behavior of these functions before they saw the output.
The problem is always that explaining something makes it clearer for a moment, but, for those who do not have the explanation or who have forgotten it, at least among us here, the outputs were generating groans and / or high fives as we incorrectly or correctly guessed what was going to happen.
I think the only way to find out whether this really is confusing or not, is to put someone in front of these functions without any explanation and ask them to predict what is going to come out of the various inputs and flags. Or to try and teach it, which was the problem we were having.
changing the names doesn't make it easier to understand. I think the confusion is because the new A and K refer to existing memory
``ravel`` is just stacking columns ('F') or stacking rows ('C'), I don't remember having seen any weird cases. ------------
I always thought of "order" in array creation is the way we want to have the memory layout of the *target* array and has nothing to do with existing memory layout (creating view or copy as needed).
In the case of ravel of course F and C in memory aren't relevant.

'F' and 'C' don't refer to target memory layout at all in 'reshape':

In [26]: a = np.arange(10).reshape((2, 5))

In [28]: a.reshape((2, 5), order='F').flags
Out[28]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

So I think that distinction is actively confusing in this case, and more evidence that this is not the right name for what we mean.

Cheers,
Matthew
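A short sketch of what `order='F'` does change in `reshape`, namely which element ends up at which index (the target shape `(5, 2)` is chosen here for illustration):

```python
import numpy as np

a = np.arange(10).reshape((2, 5))

# order='F' reads and writes with the first axis varying fastest,
# so the elements land at different indices than with the default 'C'.
b = a.reshape((5, 2), order='F')
first_column = b[:, 0].tolist()
second_column = b[:, 1].tolist()

# Reshaping to the same shape leaves the values where they were,
# whichever index order is requested.
same = np.array_equal(a.reshape((2, 5), order='F'), a)

print(b)
print(same)
```

Nothing in this call requests a memory layout for the result; the keyword is used purely as an index order.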
On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
<snip>
What do y'all think?
Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it... Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary). Regards, Sebastian
Cheers,
Matthew
Paul Ivanov
JB Poline
Hi,
On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg
On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
<snip>
What do y'all think?
Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it...
I am not sure what you mean - at the moment there is one argument called 'order' that can refer to iteration order or memory order. Are you proposing two arguments?
Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary).
Do you really mean this? Numpy is full of 'order=' flags that refer to memory. Cheers, Matthew
On Sat, 2013-03-30 at 12:45 -0700, Matthew Brett wrote:
Hi,
On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg wrote:
On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
<snip>
What do y'all think?
Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it...
I am not sure what you mean - at the moment there is one argument called 'order' that can refer to iteration order or memory order. Are you proposing two arguments?
Yes that is what I meant. The reason that it is not convincing to me is that if I write `np.reshape(arr, ..., order='Z')`, I may be tempted to also write `np.copy(arr, order='Z')`. I don't see anything against allowing 'Z' as a more memorable 'C' (I also used to forget which was which), but I don't really see enforcing a different _value_ on the same named argument making it clearer. Renaming the argument itself would seem more sensible to me right now, but I cannot think of a decent name, so I would prefer trying to clarify the documentation if necessary.
Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary).
Do you really mean this? Numpy is full of 'order=' flags that refer to memory.
I somewhat imagined there were more iteration order flags and I basically count empty/ones/.../copy as basically one "array creation" monster...
Cheers,
Matthew
Proposal -------------
* Deprecate the use of "C" and "F" meaning backwards and forwards index ordering for ravel, reshape * Prefer "Z" and "N", being graphical representations of unraveling in 2 dimensions, axis1 first and axis0 first respectively (excellent naming idea by Paul Ivanov)
What do y'all think?
Personally I think it is clear enough and that "Z" and "N" would confuse me just as much (though I am used to the other names). Also "Z" and "N" would seem more like aliases, which would also make sense in the memory order context. If anything, I would prefer renaming the arguments iteration_order and memory_order, but it seems overdoing it... Maybe the documentation could just be checked if it is always clear though. I.e. maybe it does not use "iteration" or "memory" order consistently (though I somewhat feel it is usually clear that it must be iteration order, since no numpy function cares about the input memory order as they will just do a copy if necessary).
I have been using both C and Fortran for 25 or so years. Despite that, I have to sit and think every time I need to know which way the arrays are stored, basically by remembering that in fortran you do (I,J,*) for an assumed-size array. So I *love* the idea of 'Z' and 'N' which I understood immediately. Andrew
On Sat, Mar 30, 2013 at 2:08 AM, Matthew Brett
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
<snip>
What do y'all think?
Surely it should be "Z" and "ᴎ"? ;-)

I knew what your examples would produce, but only because I've bumped into this before. When you do reshapes of various sorts (ravel() == reshape((-1,))), then, like you say, there are two totally different sets of coordinate mapping in play:

chunk of memory <-1-> virtual array layout <-2-> new array layout
 (C pointers)   <--->   (Python indexes)   <--->  (Python indexes)

Mapping (1) is determined by the array strides, and you have to think about it when you interface with C code, but at the Python level it's pretty much irrelevant; all operations are defined at the "virtual array layout" level.

Further confusing the issue is the fact that the vast majority of legal memory<->virtual array mappings are *neither* C- nor F-ordered. Strides are very flexible.

Further further confusing the issue is that mapping (2) actually consists of two mappings: if you have an array with shape (3, 4, 5) and reshape it to (4, 15), then the way you work out the overall mapping is by first mapping the (3, 4, 5) onto a flat 1-d space with 60 elements, and then mapping *that* to the (4, 15) space.

Anyway, I agree that this is very confusing; certainly it confused me. If you bump into these two mappings just in passing, and separately, then it's very easy to miss the fact that they have nothing to do with each other. And I agree that using exactly the same terminology for both of them is part of what causes this. I even kind of like the "Z"/"N" naming scheme (I still have to look up what C/F actually mean every time, I'm ashamed to say).

But I don't see how the proposed solution helps, because the problem isn't that mapping (1) and (2) use different ordering schemes -- the column-major/row-major distinction really does apply to both equally. Using different names for those seems like it will confuse the issue further, if anything.

The problem IMHO is that sometimes "order=" is used to specify mapping (1), and sometimes it's used to specify mapping (2), when in fact these are totally orthogonal. To see this, note that semantically it would be perfectly possible for .reshape() to take *two* order= arguments: one to specify the coordinate space mapping (2), and the other to specify the desired memory layout used by the result array (1). Of course we shouldn't actually do this, because in the unlikely event that someone actually wanted both of these they could just call asarray() on the output of reshape().

Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have

  a.reshape(..., index_order="C")
  a.copy(memory_order="F")

etc.? This way if you just bumped into these while reading code, it would still be immediately obvious that they were dealing with totally different concepts. Compare to reading along without the docs and seeing

  a.reshape(..., order="Z")
  a.copy(order="C")

That'd just leave me even more baffled than the current system -- I'd start thinking that "Z" and "C" somehow were different options for the same order= option, so they must somehow mean ways of ordering elements?

-n
Hi,
On Tue, Apr 2, 2013 at 7:32 AM, Nathaniel Smith wrote:
On Sat, Mar 30, 2013 at 2:08 AM, Matthew Brett wrote:
Hi,
We were teaching today, and found ourselves getting very confused about ravel and shape in numpy.
<snip>
What do y'all think?
Surely it should be "Z" and "ᴎ"? ;-)
I knew what your examples would produce, but only because I've bumped into this before. When you do reshapes of various sorts (ravel() == reshape((-1,))), then, like you say, there are two totally different sets of coordinate mapping in play:
chunk of memory <-1-> virtual array layout <-2-> new array layout
 (C pointers)   <--->   (Python indexes)   <--->  (Python indexes)
Mapping (1) is determined by the array strides, and you have to think about it when you interface with C code, but at the Python level it's pretty much irrelevant; all operations are defined at the "virtual array layout" level.
Further confusing the issue is the fact that the vast majority of legal memory<->virtual array mappings are *neither* C- nor F-ordered. Strides are very flexible.
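A quick sketch of such a "neither" layout, using a strided view as the example (names are illustrative):

```python
import numpy as np

base = np.arange(12).reshape((3, 4))
view = base[:, ::2]     # every other column: a strided view, no copy made

neither = (view.flags['C_CONTIGUOUS'], view.flags['F_CONTIGUOUS'])
shares = np.shares_memory(base, view)

# Express the strides in units of elements rather than bytes.
strides_in_items = tuple(s // base.itemsize for s in view.strides)

print(neither, shares, strides_in_items)
```

Indexing `view` behaves like any other (3, 2) array even though its memory layout matches neither convention.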
Further further confusing the issue is that mapping (2) actually consists of two mappings: if you have an array with shape (3, 4, 5) and reshape it to (4, 15), then the way you work out the overall mapping is by first mapping the (3, 4, 5) onto a flat 1-d space with 60 elements, and then mapping *that* to the (4, 15) space.
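The two-step description of mapping (2) can be checked directly. The sketch below uses the (3, 4, 5) to (4, 15) example from the paragraph above; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(60).reshape((3, 4, 5))

for order in ('C', 'F'):
    direct = a.reshape((4, 15), order=order)
    # Step 1: map (3, 4, 5) onto a flat 1-d space with 60 elements.
    flat = a.ravel(order=order)
    # Step 2: map that flat space onto the (4, 15) space.
    two_step = flat.reshape((4, 15), order=order)
    print(order, np.array_equal(direct, two_step))
```

Both steps use the same index ordering, which is why a single `order` argument is enough for reshape.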
Anyway, I agree that this is very confusing; certainly it confused me. If you bump into these two mappings just in passing, and separately, then it's very easy to miss the fact that they have nothing to do with each other. And I agree that using exactly the same terminology for both of them is part of what causes this. I even kind of like the "Z"/"N" naming scheme (I still have to look up what C/F actually mean every time, I'm ashamed to say).
But I don't see how the proposed solution helps, because the problem isn't that mapping (1) and (2) use different ordering schemes -- the column-major/row-major distinction really does apply to both equally. Using different names for those seems like it will confuse the issue further, if anything. The problem IMHO is that sometimes "order=" is used to specify mapping (1), and sometimes it's used to specify mapping (2), when in fact these are totally orthogonal.
Yes. Of course ravel is the perfect storm because it refers to order in both senses.
To see this, note that semantically it would be perfectly possible for .reshape() to take *two* order= arguments: one to specify the coordinate space mapping (2), and the other to specify the desired memory layout used by the result array (1). Of course we shouldn't actually do this, because in the unlikely event that someone actually wanted both of these they could just call asarray() on the output of reshape().
Yes.
Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have a.reshape(..., index_order="C") a.copy(memory_order="F") etc.?
That seems like a good idea. If you are proposing it, I am "+1".
This way if you just bumped into these while reading code, it would still be immediately obvious that they were dealing with totally different concepts. Compare to reading along without the docs and seeing a.reshape(..., order="Z") a.copy(order="C") That'd just leave me even more baffled than the current system -- I'd start thinking that "Z" and "C" somehow were different options for the same order= option, so they must somehow mean ways of ordering elements?
I don't think you'd be more baffled than the current system, which, as you say, conflates two orthogonal concepts. Rather, I think it would cause the user to stop, as they should, and consider what concept order is using in this case.

I don't find it difficult to explain this:

There are two different but related concepts of 'order':

1) The memory layout of the array
2) The index ordering used to unravel the array

If you see 'Z' or 'N' for 'order' - that refers to index ordering. If you see 'C' or 'F' for order - that refers to memory layout.

Cheers, Matthew
On Tue, Apr 2, 2013 at 6:59 PM, Matthew Brett
On Tue, Apr 2, 2013 at 7:32 AM, Nathaniel Smith
wrote: Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have a.reshape(..., index_order="C") a.copy(memory_order="F") etc.?
That seems like a good idea. If you are proposing it, I am "+1".
Well, I'm just throwing it out there as an idea, but if people like it, nothing better turns up, and someone implements it, then I'm not going to say no...
This way if you just bumped into these while reading code, it would still be immediately obvious that they were dealing with totally different concepts. Compare to reading along without the docs and seeing a.reshape(..., order="Z") a.copy(order="C") That'd just leave me even more baffled than the current system -- I'd start thinking that "Z" and "C" somehow were different options for the same order= option, so they must somehow mean ways of ordering elements?
I don't think you'd be more baffled than the current system, which, as you say, conflates two orthogonal concepts. Rather, I think it would cause the user to stop, as they should, and consider what concept order is using in this case.
I don't find it difficult to explain this:
There are two different but related concepts of 'order'
1) The memory layout of the array 2) The index ordering used to unravel the array
If you see 'Z' or 'N' for 'order' - that refers to index ordering. If you see 'C' or 'F' for order - that refers to memory layout.
Sure, you can write it down like this, but compare to this system:

If you see 'Z' or 'N' for 'order' - that refers to memory ordering. If you see 'C' or 'F' for order - that refers to index layout.

Now suppose I forget which system we actually use -- how do you remember which system is which? It's totally arbitrary. Now I have even more things to remember. And I'm certainly not going to work out this distinction just from seeing these used once or twice in someone else's code.

This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".

"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.

-n
Hi,
On Tue, Apr 2, 2013 at 2:44 PM, Nathaniel Smith
On Tue, Apr 2, 2013 at 6:59 PM, Matthew Brett
wrote: On Tue, Apr 2, 2013 at 7:32 AM, Nathaniel Smith
wrote: Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have a.reshape(..., index_order="C") a.copy(memory_order="F") etc.?
That seems like a good idea. If you are proposing it, I am "+1".
Well, I'm just throwing it out there as an idea, but if people like it, nothing better turns up, and someone implements it, then I'm not going to say no...
I would certainly be happy to implement it if there was some agreement it was the right way to go.
This way if you just bumped into these while reading code, it would still be immediately obvious that they were dealing with totally different concepts. Compare to reading along without the docs and seeing a.reshape(..., order="Z") a.copy(order="C") That'd just leave me even more baffled than the current system -- I'd start thinking that "Z" and "C" somehow were different options for the same order= option, so they must somehow mean ways of ordering elements?
I don't think you'd be more baffled than the current system, which, as you say, conflates two orthogonal concepts. Rather, I think it would cause the user to stop, as they should, and consider what concept order is using in this case.
I don't find it difficult to explain this:
There are two different but related concepts of 'order'
1) The memory layout of the array 2) The index ordering used to unravel the array
If you see 'Z' or 'N' for 'order' - that refers to index ordering. If you see 'C' or 'F' for order - that refers to memory layout.
Sure, you can write it down like this, but compare to this system:
If you see 'Z' or 'N' for 'order' - that refers to memory ordering. If you see 'C' or 'F' for order - that refers to index layout.
Now suppose I forget which system we actually use -- how do you remember which system is which? It's totally arbitrary.
I don't think it is completely arbitrary, as 'Z' / 'N' come from the process of getting elements from a 2D array in a certain order, and C / F memory layouts correspond to exactly what C and Fortran do (whereas the concept of index order cannot be separated from memory order for C, Fortran).
Now I have even more things to remember. And I'm certainly not going to work out this distinction just from seeing these used once or twice in someone else's code.
The extra things you have to remember are a) that there is a distinction (and this is good) and b) which of the two things you need to distinguish is 'Z' or 'C'. I think the benefit from a) is much greater than the small load from b).
This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results. Cheers, Matthew
On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.

I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.

-n
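The "same strategy, two contexts" point can be made concrete. In this sketch (names are illustrative), `np.ravel_multi_index` computes the flat position for a given index ordering, and the same number falls out of the stride arithmetic for an array stored with the matching memory layout.

```python
import numpy as np

shape = (3, 4, 5)
idx = (2, 1, 3)

# Context 1: index ordering, i.e. position of element idx in a raveled array.
flat_c = int(np.ravel_multi_index(idx, shape, order='C'))   # 2*20 + 1*5 + 3
flat_f = int(np.ravel_multi_index(idx, shape, order='F'))   # 2 + 1*3 + 3*12

# Context 2: memory layout, i.e. element offset from the start of the buffer.
a_c = np.zeros(shape, dtype=np.int64, order='C')
a_f = np.zeros(shape, dtype=np.int64, order='F')
offset_c = sum(i * s for i, s in zip(idx, a_c.strides)) // a_c.itemsize
offset_f = sum(i * s for i, s in zip(idx, a_f.strides)) // a_f.itemsize

print(flat_c, offset_c)
print(flat_f, offset_f)
```

Within each pair the two numbers agree; what differs is whether the flat space is a raveled array or a block of memory.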
On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
And once we get into memory optimization (and avoiding copies and preserving contiguity), it is necessary to keep both orders in mind: is the memory order "F", and am I iterating/raveling in "F" order (or slicing columns)? I think having two separate keywords gives the impression we can choose two different things at the same time.

Josef
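A quick way to see why both orders matter for avoiding copies is to check whether the raveled result shares memory with the original. This is a sketch; `np.shares_memory` is assumed to be available (it is in current NumPy releases), and the variable names are illustrative.

```python
import numpy as np

arr_F = np.array(np.arange(10).reshape((2, 5)), order='F')

same = arr_F.ravel(order='F')    # index order matches the memory order
other = arr_F.ravel(order='C')   # index order differs from the memory order

print(np.shares_memory(arr_F, same))    # no copy is needed here
print(np.shares_memory(arr_F, other))   # the elements must be rearranged
```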
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Tue, Apr 2, 2013 at 7:09 PM,
On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
wrote: On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
And once we get into memory optimization (and avoiding copies and preserving contiguity), it is necessary to keep both orders in mind, is memory order in "F" and am I iterating/raveling in "F" order (or slicing columns).
I think having two separate keywords give the impression we can choose two different things at the same time.
As an aside (math): numpy.flatten made it into the Wikipedia page http://en.wikipedia.org/wiki/Vectorization_%28mathematics%29#Programming_lan... (which notes how it differs from R and Matlab/Octave, but doesn't mention that order="F" gives the same behavior as the math and the others), along with the corresponding code in statsmodels (tools for vector autoregressive models by Wes).

Josef baffled?
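For readers coming from the math side: the vec operator stacks the columns of a matrix, which corresponds to index order 'F'. A minimal sketch, with a made-up 2x2 matrix:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

vec_A = A.flatten(order='F')   # stacks columns, as vec(A) does
default = A.flatten()          # default 'C' order stacks rows instead

print(vec_A)     # [1 3 2 4]
print(default)   # [1 2 3 4]
```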
Josef
Hi,
On Tue, Apr 2, 2013 at 7:09 PM,
On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
wrote: On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
And once we get into memory optimization (and avoiding copies and preserving contiguity), it is necessary to keep both orders in mind, is memory order in "F" and am I iterating/raveling in "F" order (or slicing columns).
I think having two separate keywords give the impression we can choose two different things at the same time.
I guess it could not make sense to do this:

np.ravel(a, index_order='C', memory_order='F')

It could make sense to do this:

np.reshape(a, (3,4), index_order='F', memory_order='F')

but that just points out the inherent confusion between the uses of 'order', and in this case, the fact that you can only do:

np.reshape(a, (3, 4), index_order='F')

correctly distinguishes between the meanings.

Best, Matthew
On Tue, Apr 2, 2013 at 9:09 PM, Matthew Brett
Hi,
On Tue, Apr 2, 2013 at 7:09 PM,
wrote: On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
wrote: On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
And once we get into memory optimization (and avoiding copies and preserving contiguity), it is necessary to keep both orders in mind, is memory order in "F" and am I iterating/raveling in "F" order (or slicing columns).
I think having two separate keywords give the impression we can choose two different things at the same time.
I guess it could not make sense to do this:
np.ravel(a, index_order='C', memory_order='F')
It could make sense to do this:
np.reshape(a, (3,4), index_order='F', memory_order='F')
but that just points out the inherent confusion between the uses of 'order', and in this case, the fact that you can only do:
np.reshape(a, (3, 4), index_order='F')
correctly distinguishes between the meanings.
So, if index_order and memory_order are never in the same function, then the context should be enough. It was always enough for me.

np.reshape(a, (3,4), index_order='F', memory_order='F')

really hurts my head because you mix a function that operates on views, indexing and shapes with memory creation (or I have no idea what memory_order should do in this case).

np.asarray(a.reshape((3, 4), order="F"), order="F")

or the example here
http://docs.scipy.org/doc/numpy/reference/generated/numpy.asfortranarray.htm...
http://docs.scipy.org/doc/numpy/reference/generated/numpy.asarray.html
keeps functions with index_order and functions with memory_order nicely separated.

(It might be useful but very confusing to add memory_order to every function that creates a view if possible and a copy if necessary: "If you have to make a copy, then I want F memory order, otherwise give me a view". But I cannot find a candidate function right now, except for ravel and reshape; see the first notes in docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html )

----

a day later (haven't changed my mind):

Isn't specifying "index order" in the Parameters section enough as an explanation? Something like:

```
def ravel

Parameters
order : index order
    how the array is stacked into a 1d array. F means we stack by columns
    (fortran order, first index first), C means we stack by rows
    (c-order, last index first)
```

Most array *creation* functions explicitly mention memory layout in the docstring.

Josef
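The two-step pattern above, with index order chosen in reshape and memory layout chosen afterwards in asarray, might look like this in practice. It is a sketch with illustrative names, using only the existing `order` keyword.

```python
import numpy as np

a = np.arange(12)

# Step 1: index order. 'C' fills the last axis fastest.
b = a.reshape((3, 4), order='C')

# Step 2: memory layout. Ask for an F-contiguous result; a copy is made
# only if the input does not already have that layout.
c = np.asarray(b, order='F')

print(np.array_equal(b, c))                                   # same values
print(b.flags['C_CONTIGUOUS'], b.flags['F_CONTIGUOUS'])
print(c.flags['C_CONTIGUOUS'], c.flags['F_CONTIGUOUS'])
```

The values seen through Python indexing are identical; only the layout flags differ.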
Best,
Matthew
Hi,
On Wed, Apr 3, 2013 at 5:19 AM,
On Tue, Apr 2, 2013 at 9:09 PM, Matthew Brett
wrote: Hi,
On Tue, Apr 2, 2013 at 7:09 PM,
wrote: On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
wrote: On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
And once we get into memory optimization (and avoiding copies and preserving contiguity), it is necessary to keep both orders in mind, is memory order in "F" and am I iterating/raveling in "F" order (or slicing columns).
I think having two separate keywords give the impression we can choose two different things at the same time.
I guess it could not make sense to do this:
np.ravel(a, index_order='C', memory_order='F')
It could make sense to do this:
np.reshape(a, (3,4), index_order='F', memory_order='F')
but that just points out the inherent confusion between the uses of 'order', and in this case, the fact that you can only do:
np.reshape(a, (3, 4), index_order='F')
correctly distinguishes between the meanings.
So, if index_order and memory_order are never in the same function, then the context should be enough. It was always enough for me.
It was not enough for me or the three others who will publicly admit to the shame of finding it confusing without further thought. Again, I just can't see a reason not to separate these ideas. We are not arguing about backwards compatibility here, only about clarity. I guess you do accept that some people, other than yourself, might be less likely to get tripped up by: np.reshape(a, (3, 4), index_order='F') than np.reshape(a, (3, 4), order='F') ?
np.reshape(a, (3,4), index_order='F', memory_order='F') really hurts my head because you mix a function that operates on views, indexing and shapes with memory creation (or I have no idea what memory_order should do in this case).
Right. I think you may now be close to my own discomfort when faced with working out (fast) what: np.reshape(a, (3,4), order='F') means, given 'order' means two different things, and both might be relevant here. Or are you saying that my brain should have quickly calculated that that 'order' would be difficult to understand as memory layout and therefore rejected that and seen immediately that index order was the meaning? Speaking as a psychologist, I don't think that's the way it works. Cheers, Matthew
On Wed, Apr 3, 2013 at 11:39 AM, Matthew Brett
It was not enough for me or the three others who will publicly admit to the shame of finding it confusing without further thought.
I would submit that some of the confusion came from the fact that with ravel(), and the 'A' and 'K' flags, you are forced to figure out BOTH index_order and memory_order -- with one flag -- I know I'm still not clear what I'd get in complex situations.
Again, I just can't see a reason not to separate these ideas.
I agree, but I'd want them really separated -- ideally with a given function dealing with only one or the other, not both at once.
We are not arguing about backwards compatibility here, only about clarity.
While it could be changed while strictly maintaining backward compatibility, it is a change that would need to filter through the docs, examples, random blog posts, Stack Overflow questions, etc. Is that worth it? I'm not convinced.
Right. I think you may now be close to my own discomfort when faced with working out (fast) what:
np.reshape(a, (3,4), order='F')
I still think it's 'cause you know too much.... ;-)

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
On Wed, Apr 3, 2013 at 11:52 PM, Chris Barker - NOAA Federal < chris.barker@noaa.gov> wrote:
On Wed, Apr 3, 2013 at 11:39 AM, Matthew Brett
wrote: It was not enough for me or the three others who will publicly admit to the shame of finding it confusing without further thought.
I would submit that some of the confusion came from the fact that with ravel(), and the 'A' and 'K' flags, you are forced to figure out BOTH index_order and memory_order -- with one flag -- I know I'm still not clear what I'd get in complex situations.
Again, I just can't see a reason not to separate these ideas.
I agree, but I'd want them really separated -- ideally with a given function dealing with only one or the other, not both at once.
We are not arguing about backwards compatibility here, only about clarity.
While it could be changed while strictly maintaining backward compatibility, it is a change that would need to filter through the docs, examples, random blog posts, Stack Overflow questions, etc.
Not only that, we would then also be in the situation of having `order` *and* `xxx_order` keywords. This is also confusing, at least as much as the current situation imho. Ralf
Is that worth it? I'm not convinced
Right. I think you may now be close to my own discomfort when faced with working out (fast) what:
np.reshape(a, (3,4), order='F')
I still think it's cause you know too much.... ;-)
Hi,
On Tue, Apr 2, 2013 at 5:52 PM, Nathaniel Smith
On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Thanks - but I guess we all agree that np.array(a, order='C') and np.ravel(a, order='F') are using the term 'order' in two different and orthogonal senses, and the discussion is about whether it is possible to get confused about these two senses and, if so, what we should do about it.

Just to repeat what you're suggesting:

np.array(a, memory_order='C')
np.ravel(a, index_order='C')
np.ravel(a, index_order='K')

That makes sense to me. I guess we'd have to do something like:

def ravel(a, index_order='C', **kwargs):

where kwargs must be empty if the second arg is specified, otherwise it can contain only one key, 'order' and 'index_order'. Thus:

np.ravel(a, index_order='C')

will work for the foreseeable future.

Cheers, Matthew
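One possible reading of that signature is sketched below. None of this is an existing NumPy API: `ravel_compat` and the `index_order` keyword are hypothetical names, and the sketch assumes the intent is that the legacy `order` spelling keeps working while passing both spellings is an error. A sentinel default is used so the function can tell whether `index_order` was actually given.

```python
import numpy as np

_UNSET = object()   # sentinel: lets us detect "argument not given"

def ravel_compat(a, index_order=_UNSET, **kwargs):
    """Hypothetical wrapper: accept 'index_order', or the legacy 'order'."""
    unknown = set(kwargs) - {'order'}
    if unknown:
        raise TypeError('unexpected keyword arguments: %s' % sorted(unknown))
    if 'order' in kwargs:
        if index_order is not _UNSET:
            raise TypeError("pass either 'index_order' or 'order', not both")
        index_order = kwargs['order']
    elif index_order is _UNSET:
        index_order = 'C'
    # Delegate to the real function, which only knows the 'order' keyword.
    return np.ravel(a, order=index_order)

arr = np.arange(10).reshape((2, 5))
print(ravel_compat(arr, index_order='F'))   # new spelling
print(ravel_compat(arr, order='F'))         # legacy spelling
```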
On Tue, 2013-04-02 at 22:52 +0100, Nathaniel Smith wrote:
On Tue, Apr 2, 2013 at 10:21 PM, Matthew Brett
wrote: This is like observing that if I say "go North" then it's ambiguous about whether I want you to drive or walk, and concluding that we need new words for the directions depending on what sort of vehicle you use. So "go North" means drive North, "go htuoS" means walk North, etc. Totally silly. Makes much more sense to have one set of words for directions, and then make clear from context what the directions are used for -- "drive North", "walk North". Or "iterate C-wards", "store F-wards".
"C" and "Z" mean exactly the same thing -- they describe a way of unraveling a cube into a straight line. The difference is what we do with the resulting straight line. That's why I'm suggesting that the distinction should be made in the name of the argument.
Could you unpack that for the 'ravel' docstring? Because these options all refer to the way of unraveling and not the memory layout that results.
Z/C/row-major/whatever-you-want-to-call-it is a general strategy for converting between a 1-dim representation and an n-dim representation. In the case of memory storage, the 1-dim representation is the flat space of pointer arithmetic. In the case of ravel, the 1-dim representation is the flat space of a 1-dim indexed array. But the 1-dim-to-n-dim part is the same in both cases.
I think that's why you're seeing people baffled by your proposal -- to them the "C" refers to this general strategy, and what's different is the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Yup, that's how I think about it too... So I am against different values for the order argument. I am somewhat fine with a new name, but it seems like that may get clumsy.
But I would really love it if someone would try to make the documentation simpler! There is also never a mention of "contiguity", even though when we refer to "memory order", having a C/F contiguous array is often the reason why (in np.asarray, "contiguous='C'" would make as much sense as "order", maybe even more so). Also, 'A' often seems to be explained not quite correctly (that does not matter now, except for reshape, where its explanation is fuzzy, but it will matter more in the future, even if I don't expect 'A' to be actually used).
If there is not one yet, there should maybe be an overview in the user/reference guide of what order means and how its application to new memory differs from the way reshape, etc. use it... Then the functions using order can also reference that, plus maybe we have some place to look up what C and F are, for all of us who like to forget it...
- Sebastian
-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Yup, thats how I think about it too...
me too...
But I would really love if someone would try to make the documentation simpler!
yes, I think this is where the solution lies.
There is also never a mention of "contiguity", even though when we refer to "memory order", then having a C/F contiguous array is often the reason why
good point -- in fact, I have no idea what would happen in many of these cases for a discontiguous array (or one with arbitrarily weird strides...)
Also 'A' seems often explained not quite correctly (though that does not matter (except for reshape, where its explanation is fuzzy), it will matter more in the future -- even if I don't expect 'A' to be actually used).
I wonder about having an 'A' option in reshape at all -- what the heck does it mean? Why do we need it? Again, I come back to the fact that memory order is kind-of orthogonal to index order. So for reshape (or ravel, which is really just a special case of reshape...) the 'A' flag and 'K' flag (huh?) are pretty dangerous, and prone to error. I think of it this way:
Much of the beauty of numpy is that it presents a consistent interface to various forms of strided data -- that way, folks can write code that works the same way for any ndarray, while still being able to have internal storage be efficient for the use at hand -- i.e. C order for the common case, Fortran order for interaction with libraries that expect that order (or for algorithms that are more efficient in that order, though that's mostly external libs..), and non-contiguous data so one can work on sub-parts of arrays without copying data around.
In most places, the numpy API hides the internal memory order -- this is a good thing, most people have no need to think about it (or most code, anyway), and you can write code that works (even if not optimally) for any (strided) memory layout. All is good.
There are times when you really need to understand, or control or manipulate the memory layout, to make sure your routines are optimized, or the data is in the right form to pass off to an external lib, or to make sense of raw data read from a file, or... That's what we have .view() and friends for.
However, the 'A' and 'K' flags mix and match these concepts -- and I think that's dangerous. It would be easy for a user to use the 'A' flag, and have everything work fine and dandy with all their test cases, only to have it blow up when someone passes in a different-than-expected array. So really, they should only be used in cases where the code has checked memory order beforehand, or in a really well-defined interface where you know exactly what you're getting. In those cases, it makes the code far clearer and less error-prone to do your re-arranging of the memory in a separate step, rather than built in to a ravel() or reshape() call.
[note] -- I wrote earlier that I wasn't confused by the ravel() examples -- true for the 'C' and 'F' flags, but I'm still not at all clear what 'A' and 'K' would give me -- particularly for 'A' and reshape().
So I think the cause of the confusion here is not that we use "order" in two different contexts, nor the fact that 'C' and 'F' may not mean anything to some people, but that we are conflating two different processes in one function, and with one flag.
My (maybe) proposal: we deprecate the 'A' and 'K' flags in ravel() and reshape() (maybe even deprecate ravel() -- does it add anything to reshape?). If not deprecate, at least encourage people in the docs not to use them, and rather do their memory-structure manipulations with .view or stride manipulation, or...
I'm still trying to figure out when you'd want the 'A' flag -- it seems at the end of your operation you will want:
The resulting array to be a particular shape, with the elements in a particular order
and
You _may_ want the in-memory layout a certain way.
but 'A' can't ensure both of those.
-Chris
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
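The "works in the test cases, breaks on a different-than-expected array" worry can be sketched as follows. `flatten_any` is a hypothetical helper written only for this illustration; the two inputs hold identical values and differ only in memory layout:

```python
import numpy as np

def flatten_any(x):
    # Hypothetical helper that relies on order='A'.
    return x.ravel(order='A')

c_arr = np.arange(6).reshape(2, 3)   # C-contiguous input
f_arr = np.asfortranarray(c_arr)     # same values, F-contiguous layout

same_values = np.array_equal(c_arr, f_arr)
from_c = flatten_any(c_arr)
from_f = flatten_any(f_arr)

print(same_values)
print(from_c)
print(from_f)
```

The two arrays compare equal element by element, yet the helper returns the elements in a different sequence for each, so a caller that only ever tested with C-contiguous input would not see the difference.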
On Wed, 2013-04-03 at 08:52 -0700, Chris Barker - NOAA Federal wrote:
On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
wrote: the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Yup, thats how I think about it too...
me too...
But I would really love if someone would try to make the documentation simpler!
yes, I think this is where the solution lies.
There is also never a mention of "contiguity", even though when we refer to "memory order", then having a C/F contiguous array is often the reason why
good point -- in fact, I have no idea what would happen in many of these cases for a discontiguous array (or one with arbitrarily weird strides...)
Also 'A' seems often explained not quite correctly (though that does not matter (except for reshape, where its explanation is fuzzy), it will matter more in the future -- even if I don't expect 'A' to be actually used).
I wonder about having a 'A' option in reshape at all -- what the heck does it mean? why do we need it? Again, I come back to the fact that memory order is kind-of orthogonal to index order. So for reshape (or ravel, which is really just a special case of reshape...) the 'A' flag and 'K' flag (huh?) is pretty dangerous, and prone to error. I think of it this way:
Actually 'K' + reshape is not even implemented sensibly and in current master I changed it to an error. I would not even know how to define it, and even if you find a definition I cannot imagine it being useful... Deprecating 'A' for reshape would seem OK to me since I doubt anyone actually uses it. It is currently equivalent to `'F' if input.flags.fnc else 'C'` (fnc means "fortran not c"), and as such is shaky business. I just realized that 'A' is a bit funny. Basically it means anything (Anyorder), including discontinuous memory chunks for np.array with copy=False. But if you do a copy (or reshape), lacking a more free way to do it, it means `'F' if input.flags.fnc else 'C'` again. Not sure about the history, but it seems to me 'K' basically supersedes 'A' for most stuff and its usage as Fortran or C, is more an accident because it is the simplest way to implement "I don't care". The use of 'K' is very sensible for copies of course. 'K' actually does make some sense for ravel, since if you don't care, it has the best chance of no copy. 'A' for ravel could/should in my opinion be deprecated just like for reshape, since it is pretty unpredictable.
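A small check of the equivalence described above, written as a sketch rather than a statement about the implementation. It assumes three example inputs (C-contiguous, F-contiguous, and a strided slice that is neither) and compares `order='A'` against `'F' if arr.flags.fnc else 'C'` for both ravel and reshape; the names `cases` and `results` are illustrative:

```python
import numpy as np

base = np.arange(24).reshape(4, 6)
cases = {
    "C-contiguous": base,
    "F-contiguous": np.asfortranarray(base),
    "non-contiguous": base[::2, ::2],
}

results = {}
for name, arr in cases.items():
    # fnc is True only for arrays that are F-contiguous and not C-contiguous.
    expected = 'F' if arr.flags.fnc else 'C'
    ravel_same = np.array_equal(arr.ravel(order='A'),
                                arr.ravel(order=expected))
    reshape_same = np.array_equal(arr.reshape(-1, order='A'),
                                  arr.reshape(-1, order=expected))
    results[name] = (expected, ravel_same, reshape_same)
    print(name, expected, ravel_same, reshape_same)
```

For the non-contiguous slice `fnc` is False, so 'A' falls back to C-style index order there, which is part of what makes it hard to predict.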
Much of the beauty of numpy is that it presents a consistent interface to various forms of strided data -- that way, folks can write code that works the same way for any ndarray, while still being able to have internal storage be efficient for the use at hand -- i.e. C order for the common case, Fortran order for interaction with libraries that expect that order (or for algorithms that are more efficient in that order, though that's mostly external libs..), and non-contiguous data so one can work on sub-parts of arrays without copying data around.
In most places, the numpy API hides the internal memory order -- this is a good thing, most people have no need to think about it (or most code, anyway), and you can write code that works (even if not optimally) for any (strided) memory layout. All is good.
There are times when you really need to understand, or control or manipulate the memory layout, to make sure your routines are optimized, or the data is in the right form to pass of to an external lib, or to make sense of raw data read from a file, or... That's what we have .view() and friends for.
Yeah, I somewhat dislike the fact that "view" only works right for (roughly) C-contiguous arrays; that's another one of those old traps that is difficult to impossible to get rid of. Maybe some or all of the view usages should be superseded by a new command...
Regards,
Sebastian
Hi,
On Wed, Apr 3, 2013 at 8:52 AM, Chris Barker - NOAA Federal
On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
wrote: the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Yup, thats how I think about it too...
me too...
But I would really love if someone would try to make the documentation simpler!
yes, I think this is where the solution lies.
No question that better docs would be an improvement, let's all agree on that.
We all agree that 'order' is used with two different and orthogonal meanings in numpy.
I think we are now more or less agreeing that:
np.reshape(a, (3, 4), index_order='F')
is at least as clear as:
np.reshape(a, (3, 4), order='F')
Do I have that right so far?
Cheers,
Matthew
Hi,
On Wed, Apr 3, 2013 at 11:44 AM, Matthew Brett
Hi,
On Wed, Apr 3, 2013 at 8:52 AM, Chris Barker - NOAA Federal
wrote: On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
wrote: the context where it gets applied. So giving the same strategy two different names is silly; if anything it's the contexts that should have different names.
Yup, thats how I think about it too...
me too...
But I would really love if someone would try to make the documentation simpler!
yes, I think this is where the solution lies.
No question that better docs would be an improvement, let's all agree on that.
We all agree that 'order' is used with two different and orthogonal meanings in numpy.
I think we are now more or less agreeing that:
np.reshape(a, (3, 4), index_order='F')
is at least as clear as:
np.reshape(a, (3, 4), order='F')
I believe our job here is to come to some consensus.
In that spirit, I think we do agree on these statements above.
Now we have the cost / benefit.
Benefit : Some people may find it easier to understand numpy when these constructs are separated.
Cost : There might be some confusion because we have changed the default keywords.
Benefit
-----------
What proportion of people would find it easier to understand with the order constructs separated? Clearly Chris and Josef and Sebastian - you estimate, I think, no change in your understanding, because your understanding was near complete already.
At least I, Paul Ivanov and JB Poline found the current state strikingly confusing. I think we have other votes for that position here. It's difficult to estimate the proportions now because my original email and the subsequent discussion are based on the distinction already being made. So, it is hard for us to be objective about whether a new user is likely to get confused. At least it seems reasonable to say that some moderate proportion of users will get confused.
In that situation, it seems to me the long-term benefit of separating these ideas is relatively high. The benefit will continue over the long term.
Cost
-------
The ravel docstring would look something like this:
index_order : {'C','F', 'A', 'K'}, optional ... This keyword used to be called simply 'order', and you can also use the keyword 'order' to specify index_order (this parameter).
The problem would then be that, for a while, there will be older code and docs using 'order' instead of 'index_order'. I think this would not cause much trouble. Reading the docstring will explain the change. The old code will continue to work.
This cost will decrease to zero over time.
So, if we are planning for the long-term for numpy, I believe the benefit of the change considerably outweighs the cost.
I'm happy to do the code changes, so that's not an issue.
Cheers,
Matthew
On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett
We all agree that 'order' is used with two different and orthogonal meanings in numpy.
well, not entirely orthogonal -- they are the same concept, used in different contexts, so there is some benefit to their having similarity. So I'd advocate for using the same flag names in any case -- i.e. "C" and "F" in both cases.
I think we are now more or less agreeing that:
np.reshape(a, (3, 4), index_order='F')
is at least as clear as:
np.reshape(a, (3, 4), order='F')
sure. The trick is:
np.reshape(a, (3, 4), index_order='A')
which is mingling index_order and memory order......
I believe our job here is to come to some consensus.
yup.
In that spirit, I think we do agree on these statements above.
with the caveats I just added...
Now we have the cost / benefit.
Benefit : Some people may find it easier to understand numpy when these constructs are separated.
Cost : There might be some confusion because we have changed the default keywords.
Benefit -----------
What proportion of people would find it easier to understand with the order constructs separated?
It's not just numbers -- it's depth of confusion -- if, once you "get" it, you remember it for the rest of your numpy use, then it's no big deal. However, if you need to re-think and test every time you re-visit reshape or ravel, then there's a significant benefit.
We are talking about "separating the concepts", but I think it takes more than a keyword change to do that -- the 'A' and 'K' flags mingle the concepts, and are going to be confusing with new keywords -- maybe even more so (it says index_order, but the docstring talks about memory order).
Does anyone think we should deprecate the 'A' and 'K' flags?
Before you answer that -- does anyone see a use case for the 'A' and 'K' flags that can't be reasonably easily accomplished with .view() or asarray() or ???
If we get rid of the 'A' and 'K' flags, I think the docstring will be more clear, and there may be less need for two names for the different "order" concepts (though we could change the flags and the keywords...)
The ravel docstring would looks something like this:
index_order : {'C','F', 'A', 'K'}, optional ... This keyword used to be called simply 'order', and you can also use the keyword 'order' to specify index_order (this parameter).
The problem would then be that, for a while, there will be older code and docs using 'order' instead of 'index_order'. I think this would not cause much trouble. Reading the docstring will explain the change. The old code will continue to work.
not a killer, I agree.
-Chris
On Thu, Apr 4, 2013 at 12:21 PM, Chris Barker - NOAA Federal
On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett
wrote: We all agree that 'order' is used with two different and orthogonal meanings in numpy.
well, not entirely orthogonal -- they are the same concept, used in different contexts, so there is some benefit to their having similarity. So I'd advocate for using the same flag names in any case -- i.e. "C" and "F" in both cases.
I think we are now more or less agreeing that:
np.reshape(a, (3, 4), index_order='F')
is at least as clear as:
np.reshape(a, (3, 4), order='F')
sure.
The trick is:
np.reshape(a, (3, 4), index_order='A')
which is mingling index_order and memory order......
I believe our job here is to come to some consensus.
yup.
In that spirit, I think we do agree on these statements above.
with the caveats I just added...
Now we have the cost / benefit.
Benefit : Some people may find it easier to understand numpy when these constructs are separated.
Cost : There might be some confusion because we have changed the default keywords.
Benefit -----------
What proportion of people would find it easier to understand with the order constructs separated?
It's not just numbers -- it's depth of confusion -- if, once you "get" it, you remember it for the rest of your numpy use, then it's no big deal. However, if you need to re-think and test every time you re-visit reshape or ravel, then there's a significant benefit.
I would also add: If you need it, it's easy to find and understand, even if it's not completely "obvious" just reading the current docstring. ("Proof": I haven't seen anyone having problems with "column-stacking" in statsmodels.)
We are talking about "separating the concepts", but I think it takes more than a keyword change to do that -- the 'A' and 'K' flags mingle the concepts, and are going to be confusing with new keywords -- maybe even more so (it says index_order, but the docstring talks about memory order)
Does anyone think we should deprecate the 'A' and 'K' flags?
Before you answer that -- does anyone see a use case for the 'A' and 'K' flags that can't be reasonably easily accomplished with .view() or asarray() or ???
What order does a[a>2] use to create the returned 1-D array?
I didn't know, don't remember if I ever knew, and I had to try it out.
How do you find a docstring for this? http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html?highlight=ord...
However, I never needed to know and never cared:
a[a>2] = 5
a[a>2] = b[a>2]
Now, after this thread, I know about "K", and there might be cases where it would be appropriate to minimize copying memory, as Sebastian said, when (index) order doesn't matter. (Although I'm still using an older numpy, and won't have it for a while.)
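A quick way to try the question out, as a sketch with made-up example values (`a_c`, `a_f`, `picked_c` and `picked_f` are illustrative names). NumPy's indexing documentation describes the search order for boolean indexing as row-major, C-style, and the check below compares a C-ordered and an F-ordered copy of the same data:

```python
import numpy as np

a_c = np.array([[1, 5, 2],
                [7, 0, 9]])          # C-contiguous
a_f = np.asfortranarray(a_c)         # same values, F-contiguous

# Boolean-mask selection on each layout.
picked_c = a_c[a_c > 2]
picked_f = a_f[a_f > 2]

print(picked_c)
print(picked_f)
```

Both selections come back in the same C-style index order here, which is why assignments such as `a[a>2] = b[a>2]` line up regardless of how `a` and `b` are stored.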
if we get rid of the 'A' and 'K' flags, I think the docstring will be more clear, and there may be less need for two names for the different "order" concepts (though we could change the flags and the keywords...)
The ravel docstring would looks something like this:
index_order : {'C','F', 'A', 'K'}, optional ... This keyword used to be called simply 'order', and you can also use the keyword 'order' to specify index_order (this parameter).
The problem would then be that, for a while, there will be older code and docs using 'order' instead of 'index_order'. I think this would not cause much trouble. Reading the docstring will explain the change. The old code will continue to work.
not a killer, I agree.
not a killer, but not worth the effort either, I still think. As I tried to explain, order is consistently used in the documentation both introduction and in many functions, as general concept with two levels of application. Either you have to rewrite it everywhere, or you get inconsistency. Newbie: "Why are they talking suddenly about index_order, did I miss something, which other orders are there?" I think adding a section to explain order more explicitly (Sebastian above) and improving the docstrings would be very helpful, but changing the name of the keyword is secondary. (and will mainly help as a reminder for users that are focused on memory, and not on the values in their arrays.) Josef ----------------------
>>> aa.shape
(5, 5)
>>> aa.var()
340.0
>>> np.all(aa.ravel("A") == aa.ravel("C"))
True
>>> np.all(aa.ravel("A") == aa.ravel("F"))
True
>>> np.all(aa.ravel("C") == aa.ravel("F"))
True
On Thu, Apr 4, 2013 at 11:26 AM,
Before you answer that -- does anyone see a use case for the 'A' and 'K' flags that can't be reasonably easily accomplished with .view() or asarray() or ???
What order does a[a>2] use to create the returned 1-D array? ... However, I never needed to know and never cared a[a>2] = 5 a[a>2] = b[a>2]
Now, after this thread, I know about "K",
does that use case use ravel() or reshape() under the hood?
and there might be cases where it would be appropriate to minimize copying memory,
hmm -- yes, that makes sense, and perhaps compelling enough to keep them around (at least with perhaps better docs).
-Chris
On Thu, Apr 4, 2013 at 5:54 PM, Chris Barker - NOAA Federal
On Thu, Apr 4, 2013 at 11:26 AM,
wrote: Before you answer that -- does anyone see a use case for the 'A' and 'K' flags that can't be reasonably easily accomplished with .view() or asarray() or ???
What order does a[a>2] use to create the returned 1-D array? ... However, I never needed to know and never cared a[a>2] = 5 a[a>2] = b[a>2]
Now, after this thread, I know about "K",
does that use case use ravel() or reshape() under the hood?
only ravel has "K" as far as I saw in the current documentation.
An example for ravel("K") would be if axis=None in functions and we only have elementwise or reduce operations. All the code I've seen uses just ravel() in this case; instead, ravel("K") would have a better chance to avoid array copying:
if axis is None:
    x = x.ravel("K")
    return ((x - x.mean(0))**2).sum(0)
but it's dangerous because, if there is a second array, it might not ravel("K") the same way:
x.ravel("K") - y.ravel("K")
sounds fun
Similarly, if x[mask] wouldn't select a fixed "order", then
a[a>2] = b[a>2]
would also be fun
fun := find the bug that I have hidden in this code
The only reason to use reshape with "A" that I can think of is if the array (matrix) is symmetric, or if it's a square picture and we never care whether it's upright or sideways.
reshape(.., order="A") and ravel("A") should roundtrip, I guess.
Josef
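The two halves of this point can be sketched side by side. `sum_sq_dev` is a hypothetical reduction written for the illustration, and `x` and `y` are assumed to hold the same values in C and Fortran layout respectively:

```python
import numpy as np

def sum_sq_dev(x):
    # Order-independent reduction, so flattening with order='K' is harmless.
    flat = x.ravel(order='K')
    return ((flat - flat.mean()) ** 2).sum()

x = np.arange(6.0).reshape(2, 3)     # C-contiguous
y = np.asfortranarray(x)             # same values, F-contiguous

safe_x = sum_sq_dev(x)
safe_y = sum_sq_dev(y)

# Elementwise arithmetic on the arrays themselves pairs elements by index...
by_index = x - y
# ...but pairing the 'K'-raveled versions pairs them by memory position.
by_memory = x.ravel(order='K') - y.ravel(order='K')

print(safe_x, safe_y)
print(by_index)
print(by_memory)
```

The single-array reduction agrees for both layouts, while the two-array difference is all zeros only when the elements are paired by index.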
Hi,
On Thu, Apr 4, 2013 at 9:21 AM, Chris Barker - NOAA Federal
On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett
wrote: We all agree that 'order' is used with two different and orthogonal meanings in numpy.
Brief thank you for your helpful and thoughtful discussion.
well, not entirely orthogonal -- they are the same concept, used in different contexts,
Here's a further clarification, in the hope that it is helpful:
Input and output index orderings are orthogonal - I can read the data with C index ordering and return an array that is index ordered any-old-how.
F and C are used in the sense of F contiguous and C contiguous - where contiguous is not the same concept as index ordering.
So I think it's hard to say these concepts are not orthogonal, simply in the technical sense that order='F' could mean:
* read my data using F-style index ordering
* return my data in an array using F-style index ordering
* (related to above) return my data in F-contiguous memory layout
so there is some benefit to their having similarity.
Would you agree with the stuff above? If you do - do you agree that not separating these ideas could be confusing? Cheers, Matthew
Hi,
On Thu, Apr 4, 2013 at 11:45 AM, Matthew Brett
Hi,
On Thu, Apr 4, 2013 at 9:21 AM, Chris Barker - NOAA Federal
wrote: On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett
wrote: We all agree that 'order' is used with two different and orthogonal meanings in numpy.
Brief thank you for your helpful and thoughtful discussion.
well, not entirely orthogonal -- they are the same concept, used in different contexts,
Here's a further clarification, in the hope that it is helpful:
Input and output index orderings are orthogonal - I can read the data with C index ordering and return an array that is index ordered any-old-how.
F and C are used in the sense of F contiguous and C contiguous - where contiguous is not the same concept as index ordering.
So I think it's hard to say these concepts are not orthogonal, simply in the technical sense that order='F' could mean:
* read my data using F-style index ordering
* return my data in an array using F-style index ordering
* (related to above) return my data in F-contiguous memory layout
Sorry, this is not well put and is likely to increase confusion rather than decrease it. I'll try again if I may.
What do we mean by 'Fortran' 'order'?
Two things :
* np.array(a, order='F') - Fortran contiguous : the array memory is contiguous, the strides vector is strictly increasing
* np.ravel(a, order='F') - first-to-last index ordering used to recover values from the array
They are related in the sense that Fortran contiguous layout in memory means that returning the elements as stored in memory gives the same answer as first-to-last index ordering. They are different in the sense that first-to-last index ordering applies to any memory layout - it is orthogonal to memory layout. In particular 'contiguous' has no meaning for first-to-last or last-to-first index ordering.
So - to restate in other words - this :
np.reshape(a, (3, 4), order='F')
could reasonably mean one of two orthogonal things:
1) Retrieve data from the array using first-to-last indexing, return any memory layout you like
2) Retrieve data from the array using the default last-to-first index ordering, and return memory in F-contiguous layout
Cheers,
Matthew
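The two senses can be observed separately in a short sketch. The example array, the explicit `int64` dtype (chosen so the strides are the same on every platform) and the variable names are assumptions made for the illustration:

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)

# Sense 1: np.array(..., order='F') changes the memory layout only.
a_f = np.array(a, order='F')
strides_c, strides_f = a.strides, a_f.strides
same_by_index = np.array_equal(a, a_f)

# Sense 2: ravel(order='F') is an index traversal and ignores the layout.
walk_from_c = np.ravel(a, order='F')
walk_from_f = np.ravel(a_f, order='F')

# reshape(order='F') fills the new shape in first-to-last index order.
r = np.reshape(np.arange(12), (3, 4), order='F')

print(strides_c, strides_f, same_by_index)
print(walk_from_c, walk_from_f)
print(r)
```

The strides differ between the two copies while indexing and the 'F' traversal give identical results, which suggests that what `order='F'` does in ravel and reshape corresponds to reading 1) above rather than reading 2).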
On Thu, Apr 4, 2013 at 3:40 PM, Matthew Brett
<snip>
no to interpretation 2)
reshape and ravel (in contrast to flatten) just return a view (if possible) (possibly with some strange strides)
docstring: "numpy.reshape(a, newshape, order='C') Gives a new shape to an array without changing its data"
functions that return views versus functions that create new arrays
Josef
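A short sketch of the view-versus-copy distinction made above. It uses np.shares_memory, which was added to numpy after this thread was written, so it assumes a reasonably recent numpy; the array is only an example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)    # C-contiguous

r = a.ravel()       # returns a view when one is possible
f = a.flatten()     # always returns a copy
print(np.shares_memory(a, r))      # view onto a's data
print(np.shares_memory(a, f))      # independent copy

# With order='F' on a C-contiguous 2-D array no view can express the
# requested element order, so ravel has to copy.
rf = a.ravel(order='F')
print(np.shares_memory(a, rf))

# a.T is F-contiguous, so the same request on a.T can be a view again.
rt = a.T.ravel(order='F')
print(np.shares_memory(a, rt))
```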
Cheers,
Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi,
On Thu, Apr 4, 2013 at 12:54 PM,
<snip>
'No' meaning what? That it is not possible that it could mean that? Obviously we're not arguing about whether it does mean that, we're arguing about whether such an interpretation would make sense. Cheers, Matthew
On Thu, Apr 4, 2013 at 4:02 PM, Matthew Brett
<snip>
'No' means: I don't think it makes sense given the current behavior of numpy with respect to functions that are designed to return views (and copy memory only if there is no way to make a view)
One objective of functions that create views is *not* to change the underlying memory. So in most cases, requesting a specific contiguity (memory order) for a new array, when you actually want a view with strides, doesn't sound like an obvious explanation for "order".
---
slightly more difficult:
order = "I don't care" (aka. order="K") means: "I want a view in whichever order of the values, but please try harder not to copy any memory"
This also doesn't refer to the memory of a *new* array, if it is really necessary to copy.
Josef
Hi,
On Thu, Apr 4, 2013 at 1:33 PM,
<snip>
OK - so no-one is suggesting that it is a good option, only that the concept makes sense. As I was saying before - for most of us it is still possible to get confused between two different meanings of the same word even if one of the meanings would (for complicated reasons) be less likely than the other. Cheers, Matthew
Catching up with numpy 1.6
'No' means: I don't think it makes sense given the current behavior of numpy with respect to functions that are designed to return views (and copy memory only if there is no way to make a view)
One objective of functions that create views is *not* to change the underlying memory. So in most cases, requesting a specific contiguity (memory order) for a new array, when you actually want a view with strides, doesn't sound like an obvious explanation for "order".
why I'm baffled:
To me views are just a specific way of looking at an existing array, or parts of it, similar to an iterator but with an n-dimensional shape. ravel is just like calling list(iterator), the iterator determines how we read the existing array. So, asking about the output memory order made no sense to me. What's the output of an iterator?
I (and statsmodels) are still on numpy 1.5 but not for much longer. So I'm trying to read up
http://docs.scipy.org/doc/numpy/reference/arrays.nditer.html#single-array-it...
explains the case for "K" : for elementwise operations just run the fastest way through the array
The old flat and flatiter were always c-order.
>>> a = np.arange(4*5).reshape(4,5)
>>> b = np.array(a, order='F')
>>> np.fromiter(np.nditer(b, order='K'), int)
array([ 0, 5, 10, 15, 1, 6, 11, 16, 2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14, 19])
>>> np.fromiter(np.nditer(a, order='K'), int)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Is ravel('K') good for anything ?
>>> def f(x):
...     '''A function that only works in 1d'''
...     if x.ndim > 1: raise ValueError
...     return np.round(np.piecewise(x, [x < 0, x >= 0], [lambda x: np.sqrt(-x), lambda x: np.sqrt(x)]))
>>> b = np.array(np.arange(4*5.).reshape(4,5), order='F')
>>> b
array([[ 0., 1., 2., 3., 4.],
       [ 5., 6., 7., 8., 9.],
       [ 10., 11., 12., 13., 14.],
       [ 15., 16., 17., 18., 19.]])
>>> f(b[:,:2])
Traceback (most recent call last):
  File " ", line 1, in <module>
    f(b[:,:2])
  File " ", line 2, in f
    if x.ndim > 1: raise ValueError
ValueError
ravel and reshape with 'K' doesn't roundtrip
>>> (b.ravel('K')).reshape(b.shape, order='K')
array([[ 0., 5., 10., 15., 1.],
       [ 6., 11., 16., 2., 7.],
       [ 12., 17., 3., 8., 13.],
       [ 18., 4., 9., 14., 19.]])
but we can do inplace transformations with it
>>> e = b[:,:2].ravel()
>>> e.flags.owndata
True
>>> e = b[:,:2].ravel('K')
>>> e.flags.owndata
False
>>> e[:] = f(e)
>>> b
array([[ 0., 1., 2., 3., 4.],
       [ 2., 2., 7., 8., 9.],
       [ 3., 3., 12., 13., 14.],
       [ 4., 4., 17., 18., 19.]])
>>> e[:] = f(e)
>>> b
array([[ 0., 1., 2., 3., 4.],
       [ 1., 1., 7., 8., 9.],
       [ 2., 2., 12., 13., 14.],
       [ 2., 2., 17., 18., 19.]])
(A few hours of experimenting is more that I wanted to know, 99.5% of my cases are order='C' or order='F') nditer has also an interesting section on Iterator-Allocated Output Arrays Josef I found the scissors
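For anyone wanting to repeat the in-place experiment above on a current numpy, here is a reduced sketch. It keeps Josef's array but replaces his f with a plain assignment, and it uses np.shares_memory (newer than this thread) instead of flags.owndata; both substitutions are mine, not part of the original message.

```python
import numpy as np

b = np.array(np.arange(4 * 5.).reshape(4, 5), order='F')
sub = b[:, :2]      # first two columns: one contiguous block in F layout

c_flat = sub.ravel()            # C index order cannot be a view here
k_flat = sub.ravel(order='K')   # memory order, so a view is possible

print(np.shares_memory(b, c_flat))
print(np.shares_memory(b, k_flat))

# Writing through the 'K' view changes b in place; the copy would not.
k_flat[:] = -1.0
print(b)
```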
On Thu, 2013-04-04 at 12:40 -0700, Matthew Brett wrote:
Hi,
<snip>
So - to restate in other words - this :
np.reshape(a, (3, 4), order='F')
could reasonably mean one of two orthogonal things
1) Retrieve data from the array using first-to-last indexing, return any memory layout you like
2) Retrieve data from the array using the default last-to-first index ordering, and return memory in F-contiguous layout
Yes, it could mean both. I am simply not sure if it helps enough to warrant the trouble. So if it still interests someone, I feel the docs are more important, but I am neutral to changing this.
I don't quite see a big gain, so I am just worried that it bugs a lot of people either because of changing or because of having to remember the different name (you can argue that is good, but if it bugs most maybe it does not help either).
As to being confused. Did anyone ever see a np.reshape(arr, ..., order='F') and then continuing assuming the result is F-contiguous (when the original arr is not known to be contiguous)? If that actually created a real bug somewhere, that might actually convince me that it is worth it to walk through trouble and complaints. I guess I just don't believe it really happens in the real world.
- Sebastian
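To make the hypothetical mistake concrete, here is a deliberately simple sketch of code that would go wrong if it assumed F-contiguity after reshape(..., order='F'). The same-shape reshape is chosen because it is certain to return a view; np.asfortranarray is one existing way to ask for the layout explicitly.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)      # C-contiguous input

# order='F' here only selects first-to-last index ordering. Reshaping to
# the same shape returns a view, so the memory layout is still C.
r = np.reshape(arr, (3, 4), order='F')
print(r.flags.f_contiguous)

# If later code really needs Fortran layout, request it explicitly.
r_f = np.asfortranarray(r)
print(r_f.flags.f_contiguous)
print(np.array_equal(r, r_f))
```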
Hi,
On Thu, Apr 4, 2013 at 1:53 PM, Sebastian Berg
<snip>
Yes, it could mean both. I am simply not sure if it helps enough to warrant the trouble. So if it still interests someone, I feel the docs are more important, but I am neutral to changing this.
I don't think the docs enter the discussion, because we all agree that changing the docs is a good idea.
I don't quite see a big gain, so I am just worried that it bugs a lot of people either because of changing or because of having to remember the different name (you can argue that is good, but if it bugs most maybe it does not help either).
As to being confused. Did anyone ever see a np.reshape(arr, ..., order='F') and then continuing assuming the result is F-contiguous (when the original arr is not known to be contiguous)? If that actually created a real bug somewhere, that might actually convince me that it is worth it to walk through trouble and complaints. I guess I just don't believe it really happens in the real world.
There are two aspects here;
1) Making numpy easier to understand and teach.
2) Avoiding bugs
I'm thinking primarily of the first. I would hate to teach the thing in the current state. As I've said many times before, I found it very confusing, others have said so too. The more confusing it is, the more likely people will make mistakes.
Cheers,
Matthew
Hi,
On Thu, Apr 4, 2013 at 9:21 AM, Chris Barker - NOAA Federal
On Wed, Apr 3, 2013 at 6:13 PM, Matthew Brett
wrote: We all agree that 'order' is used with two different and orthogonal meanings in numpy.
well, not entirely orthogonal -- they are the same concept, used in different contexts, so there is some benefit to their having similarity. So I'd advocate for using the same flag names in any case -- i.e. "C" and "F" in both cases.
I think we are now more or less agreeing that:
np.reshape(a, (3, 4), index_order='F')
is at least as clear as:
np.reshape(a, (3, 4), order='F')
sure.
The trick is:
np.reshape(a, (3, 4), index_order='A')
which is mingling index_order and memory order......
I believe our job here is to come to some consensus.
yup.
In that spirit, I think we do agree on these statements above.
with the caveats I just added...
Now we have the cost / benefit.
Benefit : Some people may find it easier to understand numpy when these constructs are separated.
Cost : There might be some confusion because we have changed the default keywords.
Benefit
-----------
What proportion of people would find it easier to understand with the order constructs separated?
It's not just numbers -- it's depth of confusion -- if, once you "get" it, you remember it for the rest of your numpy use, then it's not a big deal. However, if you need to re-think and test every time you re-visit reshape or ravel, then there's a significant benefit.
We are talking about "separating the concepts", but I think it takes more than a keyword change to do that -- the 'A' and 'K' flags mingle the concepts, and are going to be confusing with new keywords -- maybe even more so (it says index_order, but the docstring talks about memory order)
Does anyone think we should deprecate the 'A' and 'K' flags?
Would you consider moving this one to another thread? Cheers, Matthew
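A small sketch of how the 'A' flag mingles the two ideas: with order='A', the index ordering that reshape uses depends on the memory layout of its input, so two arrays that compare equal can reshape differently. The arrays are only examples.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)     # C-contiguous
a_f = np.asfortranarray(a)         # same values, Fortran-contiguous
print(np.array_equal(a, a_f))

# order='A' means F-style index ordering if the input is Fortran
# contiguous in memory, and C-style index ordering otherwise.
from_c = np.reshape(a, (3, 2), order='A')
from_f = np.reshape(a_f, (3, 2), order='A')
print(from_c)
print(from_f)
```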
Hi,
On Tue, Apr 2, 2013 at 4:32 AM, Nathaniel Smith
Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have
a.reshape(..., index_order="C")
a.copy(memory_order="F")
etc.?
I'd like to propose this instead:
a.reshape(..., order="C")
a.copy(layout="F")
This fits well with the terms we've been using during the discussion. It reduces the changes to only one of the two meanings.
Thinking about it, I feel that this would have been considerably clearer to me as I learned numpy.
Cheers,
Matthew
Hey On Thu, 2013-04-04 at 14:20 -0700, Matthew Brett wrote:
<snip>
I actually like this, makes the point clearer that it has to do with memory layout and implies contiguity, plus it is short and from the numpy perspective copy, etc. are the ones that add additional info to "order" and not reshape (because IMO memory order is something new users should not worry about at first). A and K orders will still have their quirks with np.array and copy=True/False, but for many functions they are esoteric anyway.
It will be one hell of a deprecation though, but I am +0.5 for adding an alias for now (maybe someone knows an even better name?), but I think that in this case, it probably really is better to wait with actual deprecation warnings for a few versions, since it touches a *lot* of code. Plus I think at the point of starting deprecation warnings (and best earlier) numpy should provide an automatic fixer script...
The only counter point that remains for me is the difficulty of deprecation, since I think the new name idea is very clean. And this is unfortunately even more invasive than the index_order proposal.
Fun point at the end: ndarray.tostring takes an order argument, which is correct as "order" but has a lot in common with "layout" :). (that is not an issue IMO, but for me it is a reason to prefer the layout proposal over the index_order one).
Regards,
Sebastian
Hi,
On Fri, Apr 5, 2013 at 2:20 AM, Sebastian Berg
<snip>
I completely agree that we'd have to be gentle with the change. The problem we'd want to avoid is people innocently using 'layout' and finding to their annoyance that the code doesn't work with other people's numpy.
How about:
Step 1: 'order' remains as named keyword, layout added as alias, comment on the lines of "layout will become the default keyword for this option in later versions of numpy; please consider updating any code that does not need to remain backwards compatible".
Step 2: default keyword becomes 'layout' with 'order' as alias, comment like "order is an alias for 'layout' to maintain backwards compatibility with numpy <= 1.7.1; please update any code that does not need to maintain backwards compatibility with these numpy versions".
Step 3: Add deprecation warning for 'order', "order will be removed as an alias in future versions of numpy"
Step 4: (distant future) Remove alias
?
Cheers,
Matthew
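For illustration only, here is roughly what Step 3 could look like for a single function, written as a pure-Python wrapper. The name copy_with_layout and the 'layout' keyword are hypothetical; nothing like this exists in numpy, and the real change would have to be made inside numpy's own functions.

```python
import warnings
import numpy as np

_UNSET = object()   # sentinel, so we can tell which keyword was passed


def copy_with_layout(a, layout=_UNSET, order=_UNSET):
    """Hypothetical wrapper: copy `a`, preferring 'layout' over 'order'."""
    if order is not _UNSET:
        if layout is not _UNSET:
            raise TypeError("pass either 'layout' or 'order', not both")
        # Step 3 of the proposal: the old name still works, but warns.
        warnings.warn(
            "'order' will be removed as an alias for 'layout' in a "
            "future version",
            DeprecationWarning,
            stacklevel=2,
        )
        layout = order
    if layout is _UNSET:
        layout = 'C'
    # The existing API still spells the memory layout option 'order'.
    return np.array(a, order=layout, copy=True)


x = np.arange(6).reshape(2, 3)
new_style = copy_with_layout(x, layout='F')
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = copy_with_layout(x, order='F')
print(new_style.flags.f_contiguous, old_style.flags.f_contiguous)
print([str(w.message) for w in caught])
```

Steps 1 and 2 would be the same wrapper without the warning, differing only in which name the documentation presents as primary.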
On Fri, Apr 5, 2013 at 5:13 PM, Matthew Brett
<snip>
A very strong -1 from me. Now we're talking about deprecation warnings and a backwards compatibility break after all. I thought we agreed that this was a very bad idea, so why are you proposing it now?
Here's how I see it: deprecation of "order" is a no go. Therefore we have two choices here:
1. Simply document the current "order" keyword better and leave it at that.
2. Add a "layout" (or "index_order") keyword, and live with both "order" and "layout" keywords forever.
(2) is at least as confusing as (1), more work and poor design. Therefore I propose to go with (1).
Ralf
Hi,
On Fri, Apr 5, 2013 at 3:09 PM, Ralf Gommers
<snip>
You are saying that deprecation of 'order' at any stage in the next 10 years of numpy's lifetime is a no go? I think that is short-sighted and I think it will damage numpy.
Believe me, I have as much investment in backward compatibility as you do. All the three libraries that I spend a long time maintaining need to test against old numpy versions - but - for heaven's sake - only back to numpy 1.2 or numpy 1.3. We don't support Python 2.5 any more, and I don't think we need to maintain compatibility with Numeric either.
If you are saying that we need to maintain compatibility for 10 years at a stretch, then we will have to accept that numpy will gradually decay into a legacy library, because it is certain that, if we stay static, someone else with more ambition will do a better job. There is a cost to being averse to any change at all, no matter how gradually it is managed.
Best,
Matthew
On Fri, Apr 5, 2013 at 9:21 PM, Matthew Brett wrote:
Hi,
On Fri, Apr 5, 2013 at 3:09 PM, Ralf Gommers wrote:
On Fri, Apr 5, 2013 at 5:13 PM, Matthew Brett wrote:
Hi,
On Fri, Apr 5, 2013 at 2:20 AM, Sebastian Berg wrote:
Hey
On Thu, 2013-04-04 at 14:20 -0700, Matthew Brett wrote:
Hi,
On Tue, Apr 2, 2013 at 4:32 AM, Nathaniel Smith
<snip>
Maybe we should go through and rename "order" to something more descriptive in each case, so we'd have a.reshape(..., index_order="C") a.copy(memory_order="F") etc.?
I'd like to propose this instead:
a.reshape(..., order="C") a.copy(layout="F")
I actually like this, makes the point clearer that it has to do with memory layout and implies contiguity, plus it is short and from the numpy perspective copy, etc. are the ones that add additional info to "order" and not reshape (because IMO memory order is something new users should not worry about at first). A and K orders will still have their quirks with np.array and copy=True/False, but for many functions they are esoteric anyway.
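The distinction drawn above can be checked directly. The following is only an illustrative sketch using the existing `order` keyword (no `layout` keyword exists in numpy; the variable names are made up for this example): for `copy`, `order` picks the memory layout of the result, while for `ravel` it picks the index traversal order regardless of how the input is stored.

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous
arr_f = arr.copy(order="F")           # same values, Fortran memory layout

# copy(order=...) changes the memory layout, not the values or indexing.
layout_is_f = arr_f.flags["F_CONTIGUOUS"] and not arr_f.flags["C_CONTIGUOUS"]
same_values = np.array_equal(arr, arr_f)

# ravel(order=...) chooses the index traversal order; the result does not
# depend on whether the input is C- or F-contiguous in memory.
c_from_c = arr.ravel(order="C")
c_from_f = arr_f.ravel(order="C")
f_from_c = arr.ravel(order="F")
f_from_f = arr_f.ravel(order="F")

print(c_from_c, c_from_f)
print(f_from_c, f_from_f)
```

Both "C" ravels print the same sequence, and so do both "F" ravels, which is the sense in which the keyword means two different things in `copy` and in `ravel`/`reshape`.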
It will be one hell of a deprecation though, but I am +0.5 for adding an alias for now (maybe someone knows an even better name?), but I think that in this case, it probably really is better to wait with actual deprecation warnings for a few versions, since it touches a *lot* of code. Plus I think at the point of starting deprecation warnings (and best earlier) numpy should provide an automatic fixer script...
The only counterpoint that remains for me is the difficulty of deprecation, since I think the new name idea is very clean. And this is unfortunately even more invasive than the index_order proposal.
I completely agree that we'd have to be gentle with the change. The problem we'd want to avoid is people innocently using 'layout' and finding to their annoyance that the code doesn't work with other people's numpy.
How about:
Step 1: 'order' remains as named keyword, 'layout' added as alias, comment along the lines of "layout will become the default keyword for this option in later versions of numpy; please consider updating any code that does not need to remain backwards compatible".
Step 2: default keyword becomes 'layout' with 'order' as alias, comment like "order is an alias for 'layout' to maintain backwards compatibility with numpy <= 1.7.1; please update any code that does not need to maintain backwards compatibility with these numpy versions".
Step 3: Add deprecation warning for 'order', "order will be removed as an alias in future versions of numpy"
Step 4: (distant future) Remove alias
?
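For concreteness, the alias-then-deprecate steps could look roughly like the sketch below, shown at the "Step 3" stage. This is a hypothetical wrapper written only for illustration: `copy_with_layout` and its `layout` keyword are assumptions of this example and are not part of numpy.

```python
import warnings

import numpy as np


def copy_with_layout(a, layout=None, order=None):
    """Hypothetical helper (not a numpy API): copy `a` with a given memory
    layout, accepting 'order' as a deprecated alias for 'layout'."""
    if order is not None:
        if layout is not None:
            raise TypeError("pass either 'layout' or 'order', not both")
        # Step 3 of the proposal: the old name still works, but warns.
        warnings.warn(
            "'order' is a deprecated alias for 'layout'",
            DeprecationWarning,
            stacklevel=2,
        )
        layout = order
    if layout is None:
        layout = "C"  # same default as ndarray.copy
    return np.asarray(a).copy(order=layout)


a = np.arange(6).reshape((2, 3))

with warnings.catch_warnings(record=True) as caught_new:
    warnings.simplefilter("always")
    new_style = copy_with_layout(a, layout="F")   # no warning expected

with warnings.catch_warnings(record=True) as caught_old:
    warnings.simplefilter("always")
    old_style = copy_with_layout(a, order="F")    # warns, same result
```

Step 4 would amount to deleting the `order` branch, which is the backwards compatibility break objected to in the reply that follows.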
A very strong -1 from me. Now we're talking about deprecation warnings and a backwards compatibility break after all. I thought we agreed that this was a very bad idea, so why are you proposing it now?
Here's how I see it: deprecation of "order" is a no go. Therefore we have two choices here:
1. Simply document the current "order" keyword better and leave it at that.
2. Add a "layout" (or "index_order") keyword, and live with both "order" and "layout" keywords forever.
(2) is at least as confusing as (1), more work and poor design. Therefore I propose to go with (1).
You are saying that deprecation of 'order' at any stage in the next 10 years of numpy's lifetime is a no go?
For something like this? Yes.
I think that is short-sighted and I think it will damage numpy.
It will damage numpy to be conservative and not change a name for a little bit of clarity for some people that avoids reading the docs maybe a little more carefully? There's a lot of things that can damage numpy, but this isn't even close in my book. Too few developers, continuous backwards compatibility issues, faster alternative libraries surpassing numpy - that's the kind of thing that causes damage.
Believe me, I have as much investment in backward compatibility as you do. All the three libraries that I spend a long time maintaining need to test against old numpy versions - but - for heaven's sake - only back to numpy 1.2 or numpy 1.3. We don't support Python 2.5 any more, and I don't think we need to maintain compatibility with Numeric either.
Really? This is from 3 months ago: http://article.gmane.org/gmane.comp.python.numeric.general/52632. It's now 2013, we are probably dropping numarray compat in 1.8. Not exactly 10 years, but of the same order.
If you are saying that we need to maintain compatibility for 10 years at a stretch, then we will have to accept that numpy will gradually decay into a legacy library, because it is certain that, if we stay static, someone else with more ambition will do a better job.
There is a cost to being averse to any change at all, no matter how gradually it is managed.
It's a cost/benefit trade-off, yes. Breaking backwards compatibility for a big step forward is sometimes necessary, in order to avoid decay as you say. You seem to have lost sight of the little thing you're arguing for though. There simply is no big step forward here. Ralf
Hi, On Friday, April 5, 2013 at 12:09 PM, Ralf Gommers wrote:
On Fri, Apr 5, 2013 at 5:13 PM, Matthew Brett
wrote: How about:
Step 1: 'order' remains as named keyword, 'layout' added as alias, comment along the lines of "layout will become the default keyword for this option in later versions of numpy; please consider updating any code that does not need to remain backwards compatible".
Step 2: default keyword becomes 'layout' with 'order' as alias, comment like "order is an alias for 'layout' to maintain backwards compatibility with numpy <= 1.7.1; please update any code that does not need to maintain backwards compatibility with these numpy versions".
Step 3: Add deprecation warning for 'order', "order will be removed as an alias in future versions of numpy"
Step 4: (distant future) Remove alias
?
A very strong -1 from me. Now we're talking about deprecation warnings and a backwards compatibility break after all. I thought we agreed that this was a very bad idea, so why are you proposing it now?
Here's how I see it: deprecation of "order" is a no go. Therefore we have two choices here:
1. Simply document the current "order" keyword better and leave it at that.
2. Add a "layout" (or "index_order") keyword, and live with both "order" and "layout" keywords forever.
(2) is at least as confusing as (1), more work and poor design. Therefore I propose to go with (1).
I agree with Ralf. It's not worth breaking backwards compatibility or supporting two flags (with only further potential for confusion). If we were designing a system from scratch, I concede that it _might_ have been better to use 'layout' instead of 'order'…. but that decision has already been made.
This proposal fails the cost/benefit analysis, being too expensive for too little benefit. Regards, Brad
participants (9)
- Andrew Jaffe
- Bradley M. Froehle
- Chris Barker - NOAA Federal
- josef.pktd@gmail.com
- Matthew Brett
- Nathaniel Smith
- Ralf Gommers
- Sebastian Berg
- Éric Depagne