[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

josef.pktd at gmail.com josef.pktd at gmail.com
Sat Mar 30 22:02:42 EDT 2013


On Sat, Mar 30, 2013 at 8:29 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Sat, Mar 30, 2013 at 7:50 PM,  <josef.pktd at gmail.com> wrote:
>> On Sat, Mar 30, 2013 at 7:31 PM, Bradley M. Froehle
>> <brad.froehle at gmail.com> wrote:
>>> On Sat, Mar 30, 2013 at 3:21 PM, Matthew Brett <matthew.brett at gmail.com>
>>> wrote:
>>>>
>>>> On Sat, Mar 30, 2013 at 2:20 PM,  <josef.pktd at gmail.com> wrote:
>>>> > On Sat, Mar 30, 2013 at 4:57 PM,  <josef.pktd at gmail.com> wrote:
>>>> >> On Sat, Mar 30, 2013 at 3:51 PM, Matthew Brett
>>>> >> <matthew.brett at gmail.com> wrote:
>>>> >>> On Sat, Mar 30, 2013 at 4:14 AM,  <josef.pktd at gmail.com> wrote:
>>>> >>>> On Fri, Mar 29, 2013 at 10:08 PM, Matthew Brett
>>>> >>>> <matthew.brett at gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> Ravel and reshape use the terms 'C' and 'F' in the sense of index
>>>> >>>>> ordering.
>>>> >>>>>
>>>> >>>>> This is very confusing.  We think the index ordering and memory
>>>> >>>>> ordering ideas need to be separated, and specifically, we should
>>>> >>>>> avoid using "C" and "F" to refer to index ordering.
>>>> >>>>>
>>>> >>>>> Proposal
>>>> >>>>> -------------
>>>> >>>>>
>>>> >>>>> * Deprecate the use of "C" and "F" meaning backwards and forwards
>>>> >>>>> index ordering for ravel, reshape
>>>> >>>>> * Prefer "Z" and "N", being graphical representations of unraveling
>>>> >>>>> in 2 dimensions, axis1 first and axis0 first respectively (excellent
>>>> >>>>> naming idea by Paul Ivanov)
>>>> >>>>>
>>>> >>>>> What do y'all think?
>>>> >>>>
>>>> >>>> I always thought "F" and "C" were easy to understand; I always thought
>>>> >>>> about the content and never about the memory when using them.
>>>> >>
>>>> >> Changing the names doesn't make it easier to understand.
>>>> >> I think the confusion is because the new A and K refer to existing
>>>> >> memory.
>>>> >>
>>>>
>>>> I disagree. I think it's confusing, and I have evidence: four out of
>>>> four of us tested ourselves and got it wrong.
>>>>
>>>> Perhaps we are particularly dumb or poorly informed, but I think it's
>>>> rash to assert there is no problem here.
>>
>> I think you are overcomplicating things or phrasing it as a "trick question".
>
> I don't know what you mean by trick question - was there something
> over-complicated in the example?  I deliberately didn't include
> various much more confusing examples in "reshape".

I meant making the "candidates" think about memory instead of just
column versus row stacking.
I don't think I ever get confused about reshape "F" in 2d.
But when I work with 3d or higher-dimensional arrays, I always have to
try an example to check my intuition (in general, not just for reshape).
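
A minimal sketch of the column-versus-row stacking reading, added for
illustration (the array names are made up and not from this thread). It
suggests that what ravel returns depends only on the index order asked
for, not on how the input happens to be stored, and it shows the kind of
3d case that is worth checking by hand:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]

by_rows = a.ravel(order='C')            # last index varies fastest
by_cols = a.ravel(order='F')            # first index varies fastest
print(by_rows)                          # [0 1 2 3 4 5]
print(by_cols)                          # [0 3 1 4 2 5]

# Same values stored with Fortran memory layout: the raveled result is unchanged.
a_f = np.asfortranarray(a)
same_result = np.array_equal(a_f.ravel(order='F'), by_cols)
print(same_result)                      # True

# 3d: 'F' walks the first axis fastest, then the second, then the third.
b = np.arange(24).reshape(2, 3, 4)
first_four = b.ravel(order='F')[:4]
print(first_four)                       # [ 0 12  4 16]
```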

>
>> ravel F and C have *nothing* to do with memory layout.
>
> We do agree on this of course - but you said in an earlier mail that
> you thought of 'C" and 'F' as referring to target memory layout (which
> they don't in this case) so I think we also agree that "C" and "F" do
> often refer to memory layout elsewhere in numpy.

I guess that wasn't so helpful.
(Emphasis on *target*. There are very few places where an order
keyword refers to *existing* memory layout, so I'm not tempted to
think about existing memory layout when I see ``order``.

Also, my examples might have confused the issue:
ravel and reshape with C and F are easy to understand without
ever looking at memory issues.

Memory only comes into play when we want to know whether we
get a view or a copy. The examples were only for the cases when I
do care about this.
)
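
A small sketch of the view-versus-copy point, added for illustration
(variable names are hypothetical). It assumes np.shares_memory is
available, which holds for reasonably recent NumPy releases. The raveled
values are the same for both inputs; only whether the result shares
memory with its input differs:

```python
import numpy as np

a_c = np.arange(6).reshape(2, 3)        # C-contiguous
a_f = np.asfortranarray(a_c)            # same values, Fortran-contiguous

# The content of the raveled array does not depend on the input's layout.
same_values = np.array_equal(a_c.ravel(order='F'), a_f.ravel(order='F'))

# Whether ravel can return a view does depend on the existing layout.
c_ravel_c_is_view = np.shares_memory(a_c, a_c.ravel(order='C'))   # True
c_ravel_f_is_view = np.shares_memory(a_c, a_c.ravel(order='F'))   # False
f_ravel_f_is_view = np.shares_memory(a_f, a_f.ravel(order='F'))   # True
f_ravel_c_is_view = np.shares_memory(a_f, a_f.ravel(order='C'))   # False

print(same_values, c_ravel_c_is_view, c_ravel_f_is_view,
      f_ravel_f_is_view, f_ravel_c_is_view)
```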

>
>> I think it's not confusing for beginners that have no idea and never think
>> about memory layout.
>> I've never seen any problems with it in statsmodels and I have seen
>> many developers (GSOC) that are pretty new to python and numpy.
>> (I didn't check the repo history to verify, so IIRC)
>
> Usually you don't need to know what reshape or ravel did because you
> are likely to reshape again and that will use the same algorithm.
>
> For example, I didn't know that ravel worked in reverse index
> order, started explaining it wrong, and had to check. I use ravel and
> reshape a lot, and have not run into this problem because either a) I
> didn't test my code properly or b) I did reshape after ravel / reshape
> and it reversed what I did first time.  So, I don't think "we
> haven't noticed any problems" is a good argument in the face of
> "several experienced developers got it wrong when trying to guess what
> it did".

What's reverse index order?

In the case of statsmodels, we do care about the stacking order. When
we use reshape(..., order='F') or ravel('F'), it's only because we
want to have a specific array (not memory) layout (and/or because the
raveled array came from R).

(Aside: 2 cases
- For 2d parameter vectors, we ravel and reshape often, and we changed
our convention to Fortran order (parameters in rows, equations in columns, IIRC).
The interpretation of the results depends on which way we ravel or reshape.

- For panel data (time versus individuals), we need to build matching
Kronecker product arrays, which are block-diagonal if the stacking/``order``
is the right way.

None of the cases cares about memory layout, it's just:
    Do we stack by columns or by rows, i.e. Fortran- or C-order?
    Do we want this in rows or in columns?
)
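
A rough sketch of how the stacking order and the Kronecker product have
to match, added for illustration. This is not statsmodels code; the
sizes and the names T, N, k, X, B, Y are all hypothetical. With
column stacking (order='F') the matching big design matrix is
block-diagonal; with row stacking (order='C') the factors of the
Kronecker product swap and the block-diagonal structure is lost:

```python
import numpy as np

T, N, k = 4, 3, 2                       # hypothetical: periods, individuals, regressors
X = np.arange(T * k, dtype=float).reshape(T, k)         # same regressors for everyone
B = np.arange(1, k * N + 1, dtype=float).reshape(k, N)  # parameters in rows,
                                                        # equations in columns
Y = X @ B                               # (T, N): time in rows, individuals in columns

# Column stacking: one individual after another, matching kron(I_N, X).
big_f = np.kron(np.eye(N), X)           # shape (N*T, N*k), block-diagonal
ok_f = np.allclose(big_f @ B.ravel(order='F'), Y.ravel(order='F'))

# Row stacking: one time period after another, matching kron(X, I_N).
big_c = np.kron(X, np.eye(N))           # shape (T*N, k*N), not block-diagonal
ok_c = np.allclose(big_c @ B.ravel(order='C'), Y.ravel(order='C'))

# The upper-right block of big_f is zero, as expected for a block-diagonal matrix.
off_block_is_zero = not big_f[:T, k:].any()
print(ok_f, ok_c, off_block_is_zero)    # True True True
```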


>
>> Even if N, Z were clearer in this case (which I don't think they are, and
>> I have no idea what they should stand for), you would have to go through
>> every use of ``order`` in numpy to check whether it should be N or F or Z
>> or C, and then users would have to check which order name convention is
>> used in a specific function.
>
> Right - and this would be silly if and only if it made sense to
> conflate memory layout and index ordering.

I see the two things, but I never saw this as a problem.

arr2 = np.asarray(arr1, order='F')
    give me an array with Fortran memory layout, I need it
(never used in statsmodels,
there might be a few places where we used other ways to control
the memory layout, but not much.)

arr2 = arr1.reshape(-1, 5, order='F')
    unstack this array by columns, I want 5 of them
arr1 = arr2.ravel('F')
    go back, stack them again by columns
(used quite a bit as described before)
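
The two uses above, written out as a runnable sketch for illustration
(the concrete arrays are made up). The first part only rearranges
values by index order and round-trips; the second part only changes how
the same values are stored:

```python
import numpy as np

# Index ordering: unstack by columns, then stack again by columns.
arr1 = np.arange(10)
arr2 = arr1.reshape(-1, 5, order='F')   # 5 columns, filled column by column
print(arr2)
# [[0 2 4 6 8]
#  [1 3 5 7 9]]
round_trip = arr2.ravel(order='F')
print(np.array_equal(round_trip, arr1)) # True

# Memory layout: same values and shape, different storage order.
c = np.arange(6).reshape(2, 3)
f = np.asarray(c, order='F')
print(np.array_equal(c, f))             # True
print(c.flags['F_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])   # False True
```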

Cheers,

Josef

>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


