[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Sat Mar 30 16:57:36 EDT 2013

On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> We were teaching today, and found ourselves getting very confused
>>> about ravel and shape in numpy.
>>>
>>> Summary
>>> --------------
>>>
>>> There are two separate ideas needed to understand ordering in ravel and reshape:
>>>
>>> Idea 1): ravel / reshape can proceed from the last axis to the first,
>>> or the first to the last.  This is "ravel index ordering"
>>> Idea 2) The physical layout of the array (on disk or in memory) can be
>>> "C" or "F" contiguous or neither.
>>> This is "memory ordering"
>>>
>>> The index ordering is usually (but see below) orthogonal to the memory ordering.
>>>
>>> The 'ravel' and 'reshape' commands use "C" and "F" in the sense of
>>> index ordering, and this mixes the two ideas and is confusing.
>>>
>>> What the current situation looks like
>>> ----------------------------------------------------
>>>
>>> Specifically, we've been rolling this around among 4 experienced numpy
>>> users, and we all predicted at least one of the results below wrongly.
>>>
>>> This was what we knew, or should have known:
>>>
>>> In [2]: import numpy as np
>>>
>>> In [3]: arr = np.arange(10).reshape((2, 5))
>>>
>>> In [5]: arr.ravel()
>>> Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> So, the 'ravel' operation unravels over the last axis (1) first,
>>> followed by axis 0.
>>>
>>> So far so good (even if the opposite to MATLAB, Octave).
>>>
>>> Then we found the 'order' flag to ravel:
>>>
>>> In [10]: arr.flags
>>> Out[10]:
>>>   C_CONTIGUOUS : True
>>>   F_CONTIGUOUS : False
>>>   OWNDATA : False
>>>   WRITEABLE : True
>>>   ALIGNED : True
>>>   UPDATEIFCOPY : False
>>>
>>> In [11]: arr.ravel('C')
>>> Out[11]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> But we soon got confused.  How about this?
>>>
>>> In [12]: arr_F = np.array(arr, order='F')
>>>
>>> In [13]: arr_F.flags
>>> Out[13]:
>>>   C_CONTIGUOUS : False
>>>   F_CONTIGUOUS : True
>>>   OWNDATA : True
>>>   WRITEABLE : True
>>>   ALIGNED : True
>>>   UPDATEIFCOPY : False
>>>
>>> In [16]: arr_F
>>> Out[16]:
>>> array([[0, 1, 2, 3, 4],
>>>        [5, 6, 7, 8, 9]])
>>>
>>> In [17]: arr_F.ravel('C')
>>> Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> Right - so the flag 'C' to ravel, has got nothing to do with *memory*
>>> ordering, but is to do with *index* ordering.
>>>
>>> And in fact, we can ask for memory ordering specifically:
>>>
>>> In [22]: arr.ravel('K')
>>> Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> In [23]: arr_F.ravel('K')
>>> Out[23]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>
>>> In [24]: arr.ravel('A')
>>> Out[24]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
>>> In [25]: arr_F.ravel('A')
>>> Out[25]: array([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])
>>>
>>> There are some confusions to get into with the 'order' flag to reshape
>>> as well, of the same type.
>>>
>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index ordering.
>>>
>>> This is very confusing.  We think the index ordering and memory
>>> ordering ideas need to be separated, and specifically, we should avoid
>>> using "C" and "F" to refer to index ordering.
>>>
>>> Proposal
>>> -------------
>>>
>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>> index ordering for ravel, reshape
>>> * Prefer "Z" and "N", being graphical representations of unraveling in
>>> 2 dimensions, axis1 first and axis0 first respectively (excellent
>>> naming idea by Paul Ivanov)
>>>
>>> What do y'all think?
>>>
>>> Cheers,
>>>
>>> Matthew
>>> Paul Ivanov
>>> JB Poline
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>>
>> I always thought "F" and "C" are easy to understand, I always thought about
>> the content and never about the memory when using it.
>
> I can only say that 4 out of 4 experienced numpy developers found
> themselves unable to predict the behavior of these functions before
> they saw the output.
>
> The problem is always that explaining something makes it clearer for a
> moment, but, for those who do not have the explanation or who have
> forgotten it, at least among us here, the outputs were generating
> groans and / or high fives as we incorrectly or correctly guessed what
> was going to happen.
>
> I think the only way to find out whether this really is confusing or
> not, is to put someone in front of these functions without any
> explanation and ask them to predict what is going to come out of the
> various inputs and flags.   Or to try and teach it, which was the
> problem we were having.

Changing the names doesn't make it easier to understand.
I think the confusion is because the new A and K refer to the existing
memory layout.

``ravel`` is just stacking columns ('F') or stacking rows ('C'); I
don't remember having seen any weird cases.
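
A minimal sketch of that reading, assuming a recent NumPy on Python 3
(not the versions used in this thread). The variable names are only
illustrative. It checks that 'C' stacks rows and 'F' stacks columns
whether the input is C- or F-contiguous in memory:

```python
import numpy as np

arr = np.arange(10).reshape((2, 5))   # C-contiguous in memory
arr_F = np.array(arr, order='F')      # same values, Fortran memory layout

# 'C' stacks rows, 'F' stacks columns; the result depends only on the
# values and the requested index order, not on the input's memory layout.
rows_from_C = arr.ravel('C')
cols_from_C = arr.ravel('F')
rows_from_F = arr_F.ravel('C')
cols_from_F = arr_F.ravel('F')

print(rows_from_C, cols_from_C)
print(np.array_equal(rows_from_C, rows_from_F),
      np.array_equal(cols_from_C, cols_from_F))
```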
------------

I always thought of "order" in array creation as the memory layout we
want for the *target* array; it has nothing to do with the existing
memory layout (a view or copy is created as needed).
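
As a hedged illustration of "order" describing the target array,
assuming a recent NumPy on Python 3 (``src`` and ``target`` are
illustrative names):

```python
import numpy as np

src = np.arange(6).reshape((2, 3))    # C-contiguous source
target = np.array(src, order='F')     # ask for Fortran layout in the result

# The values and indexing are unchanged; only the strides and
# contiguity flags of the new array differ from the source.
same_values = np.array_equal(src, target)
print(same_values)
print(src.flags['C_CONTIGUOUS'], target.flags['F_CONTIGUOUS'])
print(src.strides, target.strides)
```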

reshape and ravel are *views* if possible; the memory might just have
some weird strides
(and can be ignored unless you want to do some memory optimization;
keeping track of the memory is difficult.
I don't think I will start to use A and K after upgrading numpy.)

>>> a1 = np.ones((10,4))

not contiguous

>>> arr2 = a1[:, 2:4]
>>> arr2.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns (needs to make a copy)

>>> arr3 = arr2.ravel('F')
>>> arr3.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

stack columns or rows with reshape

(I have no idea what it did with the memory)

>>> arr2.reshape(-1,1).flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1,1, order='F').flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

>>> arr2.reshape(-1, order='F').flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
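
One way to probe what reshape did with the memory, sketched under the
assumption of a NumPy recent enough to provide ``np.shares_memory``
(it is not in the version used above). If the reshaped arrays do not
share memory with ``a1``, then reshape had to copy the non-contiguous
slice, and ``OWNDATA : False`` would only mean the result is a view of
that temporary copy:

```python
import numpy as np

a1 = np.ones((10, 4))
arr2 = a1[:, 2:4]                        # neither C- nor F-contiguous

col_C = arr2.reshape(-1, 1)              # rows stacked
col_F = arr2.reshape(-1, 1, order='F')   # columns stacked

# False here means the data was copied out of a1's buffer.
shares_C = np.shares_memory(col_C, a1)
shares_F = np.shares_memory(col_F, a1)
print(shares_C, shares_F)
```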

-------------------

one case where I do pay attention to memory layout is column slicing

>>> arr = np.ones((10, 5), order='F')
>>> for i in range(1, 5): print arr[:, :i+2].ravel('C').flags['OWNDATA']
???
>>> for i in range(1,5): print arr[:, :i+2].ravel('F').flags['OWNDATA']
???
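
A Python 3 sketch of the same experiment, assuming a recent NumPy; the
list names are illustrative. Leading column slices of an F-ordered
array stay F-contiguous, so under these assumptions ravel('F') should
be able to return views, while ravel('C') should have to copy:

```python
import numpy as np

arr = np.ones((10, 5), order='F')
slices = [arr[:, :i + 2] for i in range(1, 5)]

# OWNDATA is True when ravel had to allocate a new buffer (a copy).
owns_C = [s.ravel('C').flags['OWNDATA'] for s in slices]
owns_F = [s.ravel('F').flags['OWNDATA'] for s in slices]

# Cross-check with np.shares_memory: views share arr's buffer.
shares_C = [np.shares_memory(s.ravel('C'), arr) for s in slices]
shares_F = [np.shares_memory(s.ravel('F'), arr) for s in slices]

print(owns_C, owns_F)
print(shares_C, shares_F)
```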

Josef

>
> Cheers,
>
> Matthew