[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett at gmail.com
Sat Mar 30 15:45:52 EDT 2013


Hi,

On Sat, Mar 30, 2013 at 11:55 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Fri, 2013-03-29 at 19:08 -0700, Matthew Brett wrote:
>> Hi,
>>
>> We were teaching today, and found ourselves getting very confused
>> about ravel and shape in numpy.
>>
>> Summary
>> --------------
>>
>> There are two separate ideas needed to understand ordering in ravel and reshape:
>>
>> Idea 1) ravel / reshape can proceed from the last axis to the first,
>> or the first to the last.  This is "ravel index ordering".
>> Idea 2) The physical layout of the array (on disk or in memory) can be
>> "C" or "F" contiguous or neither.  This is "memory ordering".
>>
>> The index ordering is usually (but see below) orthogonal to the memory ordering.
>>
>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>> index ordering, and this mixes the two ideas and is confusing.
>>
>> What the current situation looks like
>> ----------------------------------------------------
>>
>> Specifically, we've been rolling this around among 4 experienced numpy
>> users, and we all predicted at least one of the results below wrongly.
>>
>> This was what we knew, or should have known:
>>
>> In [2]: import numpy as np
>>
>> In [3]: arr = np.arange(10).reshape((2, 5))
>>
>> In [5]: arr.ravel()
>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>
>> So, the 'ravel' operation unravels over the last axis (1) first,
>> followed by axis 0.
>>
>> So far so good (even if it is the opposite of MATLAB and Octave).
>>
>> Then we found the 'order' flag to ravel:
>>
>> In [10]: arr.flags
>> Out[10]:
>>   C_CONTIGUOUS : True
>>   F_CONTIGUOUS : False
>>   OWNDATA : False
>>   WRITEABLE : True
>>   ALIGNED : True
>>   UPDATEIFCOPY : False
>>
>> In [11]: arr.ravel('C')
>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>
>> But we soon got confused.  How about this?
>>
>> In [12]: arr_F = np.array(arr, order='F')
>>
>> In [13]: arr_F.flags
>> Out[13]:
>>   C_CONTIGUOUS : False
>>   F_CONTIGUOUS : True
>>   OWNDATA : True
>>   WRITEABLE : True
>>   ALIGNED : True
>>   UPDATEIFCOPY : False
>>
>> In [16]: arr_F
>> Out[16]:
>> array([[0, 1, 2, 3, 4],
>>        [5, 6, 7, 8, 9]])
>>
>> In [17]: arr_F.ravel('C')
>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>
>> Right - so the flag 'C' to ravel has nothing to do with *memory*
>> ordering; it is to do with *index* ordering.
>>
>> And in fact, we can ask for memory ordering specifically:
>>
>> In [22]: arr.ravel('K')
>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>
>> In [23]: arr_F.ravel('K')
>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>
>> In [24]: arr.ravel('A')
>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>
>> In [25]: arr_F.ravel('A')
>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>
>> There are some confusions to get into with the 'order' flag to reshape
>> as well, of the same type.
>>
>> Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
>>
>> This is very confusing.  We think the index ordering and memory
>> ordering ideas need to be separated, and specifically, we should avoid
>> using "C" and "F" to refer to index ordering.
>>
>> Proposal
>> -------------
>>
>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>> index ordering for ravel, reshape
>> * Prefer "Z" and "N", being graphical representations of unraveling in
>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>> naming idea by Paul Ivanov)
>>
>> What do y'all think?
>>
>
> Personally I think it is clear enough and that "Z" and "N" would confuse
> me just as much (though I am used to the other names). Also "Z" and "N"
> would seem more like aliases, which would also make sense in the memory
> order context.
> If anything, I would prefer renaming the arguments iteration_order and
> memory_order, but it seems overdoing it...

I am not sure what you mean - at the moment there is one argument
called 'order' that can refer to iteration order or memory order.  Are
you proposing two arguments?
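
To make the question concrete, here is a minimal sketch of the single
'order' argument doing both jobs. It reuses the arr / arr_F arrays from
the session quoted above (built here with np.asfortranarray rather than
np.array(..., order='F'), which should be equivalent for this purpose).
The asserted values are the ones from the quoted session or follow
directly from it:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous in memory
arr_F = np.asfortranarray(arr)        # same values, F-contiguous in memory

# 'C' and 'F' select an *index* ordering: the result does not depend
# on how the input happens to be laid out in memory.
f_index = [0, 5, 1, 6, 2, 7, 3, 8, 4, 9]
assert arr.ravel('F').tolist() == f_index
assert arr_F.ravel('F').tolist() == f_index

# 'K' follows the *memory* layout, so the same argument now gives
# different answers for two arrays that compare equal elementwise.
assert arr.ravel('K').tolist() == list(range(10))
assert arr_F.ravel('K').tolist() == f_index

# reshape behaves like ravel here: 'F' is an index ordering, so the
# memory layout of the input does not change the values returned.
same_reshape = np.array_equal(arr.reshape((5, 2), order='F'),
                              arr_F.reshape((5, 2), order='F'))
assert same_reshape
```

So one keyword accepts values of both kinds, which is the mixing of
ideas the original message complains about.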

> Maybe the documentation could just be checked if it is always clear
> though. I.e. maybe it does not use "iteration" or "memory" order
> consistently (though I somewhat feel it is usually clear that it must be
> iteration order, since no numpy function cares about the input memory
> order as they will just do a copy if necessary).

Do you really mean this?  Numpy is full of 'order=' flags that refer to memory.
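
As an illustration of 'order=' referring to memory, a short sketch
using three routines that take the keyword (np.zeros, np.array and
ndarray.copy). The variable names are mine; the checks only look at the
contiguity flags and at ordinary indexing:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))

# In creation and copy routines, 'order' picks the memory layout
# of the result.
zeros_F = np.zeros((2, 5), order='F')
arr_F = np.array(arr, order='F')
copy_F = arr.copy(order='F')

for a in (zeros_F, arr_F, copy_F):
    assert a.flags.f_contiguous and not a.flags.c_contiguous

# Indexing is unaffected: the F-ordered copy holds the same values
# at the same indices as the C-ordered original.
assert np.array_equal(arr, arr_F)
assert arr_F[1, 2] == arr[1, 2] == 7
```

In these calls 'F' says nothing about the order in which indices are
traversed, only about where the elements sit in memory.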

Cheers,

Matthew


