[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Matthew Brett matthew.brett at gmail.com
Sat Mar 30 18:19:33 EDT 2013


Hi,

On Sat, Mar 30, 2013 at 1:57 PM,  <josef.pktd at gmail.com> wrote:
> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>> Hi,
>>
>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We were teaching today, and found ourselves getting very confused
>>>> about ravel and shape in numpy.
>>>>
>>>> Summary
>>>> --------------
>>>>
>>>> There are two separate ideas needed to understand ordering in ravel and reshape:
>>>>
>>>> Idea 1) ravel / reshape can proceed from the last axis to the first,
>>>> or the first to the last.  This is "ravel index ordering".
>>>> Idea 2) The physical layout of the array (on disk or in memory) can be
>>>> "C" or "F" contiguous or neither.  This is "memory ordering".
>>>>
>>>> The index ordering is usually (but see below) orthogonal to the memory ordering.
>>>>
>>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>>>> index ordering, and this mixes the two ideas and is confusing.
>>>>
>>>> What the current situation looks like
>>>> ----------------------------------------------------
>>>>
>>>> Specifically, we've been rolling this around among 4 experienced numpy
>>>> users, and we all predicted at least one of the results below wrongly.
>>>>
>>>> This was what we knew, or should have known:
>>>>
>>>> In [2]: import numpy as np
>>>>
>>>> In [3]: arr = np.arange(10).reshape((2, 5))
>>>>
>>>> In [5]: arr.ravel()
>>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>
>>>> So, the 'ravel' operation unravels over the last axis (1) first,
>>>> followed by axis 0.
>>>>
>>>> So far so good (even if the opposite to MATLAB, Octave).
>>>>
>>>> Then we found the 'order' flag to ravel:
>>>>
>>>> In [10]: arr.flags
>>>> Out[10]:
>>>>   C_CONTIGUOUS : True
>>>>   F_CONTIGUOUS : False
>>>>   OWNDATA : False
>>>>   WRITEABLE : True
>>>>   ALIGNED : True
>>>>   UPDATEIFCOPY : False
>>>>
>>>> In [11]: arr.ravel('C')
>>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>
>>>> But we soon got confused.  How about this?
>>>>
>>>> In [12]: arr_F = np.array(arr, order='F')
>>>>
>>>> In [13]: arr_F.flags
>>>> Out[13]:
>>>>   C_CONTIGUOUS : False
>>>>   F_CONTIGUOUS : True
>>>>   OWNDATA : True
>>>>   WRITEABLE : True
>>>>   ALIGNED : True
>>>>   UPDATEIFCOPY : False
>>>>
>>>> In [16]: arr_F
>>>> Out[16]:
>>>> array([[0, 1, 2, 3, 4],
>>>>        [5, 6, 7, 8, 9]])
>>>>
>>>> In [17]: arr_F.ravel('C')
>>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>
>>>> Right - so the flag 'C' to ravel has got nothing to do with *memory*
>>>> ordering, but is to do with *index* ordering.
>>>>
>>>> And in fact, we can ask for memory ordering specifically:
>>>>
>>>> In [22]: arr.ravel('K')
>>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>
>>>> In [23]: arr_F.ravel('K')
>>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>
>>>> In [24]: arr.ravel('A')
>>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>>
>>>> In [25]: arr_F.ravel('A')
>>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>>
>>>> There are some confusions to get into with the 'order' flag to reshape
>>>> as well, of the same type.
>>>>
>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
>>>>
>>>> This is very confusing.  We think the index ordering and memory
>>>> ordering ideas need to be separated, and specifically, we should avoid
>>>> using "C" and "F" to refer to index ordering.
>>>>
>>>> Proposal
>>>> -------------
>>>>
>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>> index ordering for ravel, reshape
>>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>> naming idea by Paul Ivanov)
>>>>
>>>> What do y'all think?
>>>>
>>>> Cheers,
>>>>
>>>> Matthew
>>>> Paul Ivanov
>>>> JB Poline
>>>
>>>
>>>
>>> I always thought "F" and "C" were easy to understand; I always thought about
>>> the content and never about the memory when using them.
>>
>> I can only say that 4 out of 4 experienced numpy developers found
>> themselves unable to predict the behavior of these functions before
>> they saw the output.
>>
>> The problem is always that explaining something makes it clearer for a
>> moment, but, for those who do not have the explanation or who have
>> forgotten it, at least among us here, the outputs were generating
>> groans and / or high fives as we incorrectly or correctly guessed what
>> was going to happen.
>>
>> I think the only way to find out whether this really is confusing or
>> not, is to put someone in front of these functions without any
>> explanation and ask them to predict what is going to come out of the
>> various inputs and flags.   Or to try and teach it, which was the
>> problem we were having.
>
> Changing the names doesn't make it easier to understand.
> I think the confusion is because the new A and K refer to existing memory
> layout.
>
>
> ``ravel`` is just stacking columns ('F') or stacking rows ('C'), I
> don't remember having seen any weird cases.
> ------------
>
> I always thought of "order" in array creation as the way we want to
> have the memory layout of the *target* array, with nothing to do
> with existing memory layout (creating view or copy as needed).

In the case of ravel of course F and C in memory aren't relevant.
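
A minimal sketch of that point, reusing the arr / arr_F arrays from the
session quoted above.  np.asfortranarray is used here as a stand-in for
np.array(arr, order='F'), and the result names are only for illustration.
'C' and 'F' should give the same values whatever the memory layout; only
'K' follows the layout:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous in memory
arr_F = np.asfortranarray(arr)        # same values, F-contiguous in memory

# 'C' and 'F' select the index order only, so both layouts agree.
c_from_c, c_from_f = arr.ravel('C'), arr_F.ravel('C')
f_from_c, f_from_f = arr.ravel('F'), arr_F.ravel('F')

# 'K' follows the memory layout, so here the two layouts disagree.
k_from_c, k_from_f = arr.ravel('K'), arr_F.ravel('K')

print(c_from_c, c_from_f)   # last axis first, for both arrays
print(f_from_c, f_from_f)   # first axis first, for both arrays
print(k_from_c, k_from_f)   # order in which the elements sit in memory
```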

'F' and 'C' don't refer to target memory layout at all in 'reshape':

In [26]: a = np.arange(10).reshape((2, 5))

In [28]: a.reshape((2, 5), order='F').flags
Out[28]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

So I think that distinction is actively confusing in this case, and more
evidence that this is not the right name for what we mean.
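
As a rough sketch of what order='F' does mean for reshape (the names
same, b and b_from_F are hypothetical, chosen for this example): with an
unchanged shape nothing moves, and with a new shape the elements are read
and placed first-axis-first, whatever the memory layout of the input:

```python
import numpy as np

a = np.arange(10).reshape((2, 5))     # C-contiguous in memory

# Same shape: order='F' does not turn the result into an F-contiguous array.
same = a.reshape((2, 5), order='F')
print(same.flags['C_CONTIGUOUS'], same.flags['F_CONTIGUOUS'])

# New shape: elements are taken first-axis-first (0, 5, 1, 6, ...) and
# placed into the result first-axis-first as well.
b = a.reshape((5, 2), order='F')
print(b)

# Starting from an F-contiguous copy of the same values gives the same
# answer, because 'F' here is about index order, not memory order.
b_from_F = np.asfortranarray(a).reshape((5, 2), order='F')
print(np.array_equal(b, b_from_F))
```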

Cheers,

Matthew


