[Numpy-discussion] C vs. Fortran order -- misleading documentation?

Tue Jun 8 14:16:15 EDT 2010

On 06/08/2010 05:50 AM, Charles R Harris wrote:
>
>
> On Tue, Jun 8, 2010 at 9:39 AM, David Goldsmith
> <d.l.goldsmith at gmail.com> wrote:
>
>     On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant
>     <MaxPlanck at seznam.cz> wrote:
>
>
>          > > Correct me if I am wrong, but the paragraph
>          > >
>          > > Note to those used to IDL or Fortran memory order as it relates to
>          > > indexing. Numpy uses C-order indexing. That means that the last index
>          > > usually (see xxx for exceptions) represents the most rapidly changing
>          > > memory location, unlike Fortran or IDL, where the first index
>          > > represents the most rapidly changing location in memory. This
>          > > difference represents a great potential for confusion.
>          > >
>          > > in
>          > >
>          > > http://docs.scipy.org/doc/numpy/user/basics.indexing.html
>          > >
>          > > is quite misleading, as C-order means that the last index changes
>          > > rapidly, not the memory location.
>          > >
>          > >
>          > Any index can change rapidly, depending on whether it is in an inner
>          > loop or not. The important distinction between C and Fortran order is
>          > how indices translate to memory locations. The documentation seems
>          > correct to me, although it might make more sense to say the last index
>          > addresses a contiguous range of memory. Of course, with modern
>          > processors, actual physical memory can be mapped all over the place.
>          >
>          > Chuck
>
>         To me, saying that the last index represents the most rapidly changing
>         memory location means that if I change the last index, the memory
>         location changes a lot, which is not true for C-order. So for C-order,
>         supposing one scans the memory linearly (the desired scenario), it is
>         the last *index* that changes most rapidly.
>
>         The inverted picture looks like this: For C-order, changing the first
>         index leads to the most rapid jump in *memory*.
>
>         I still have the feeling the doc is very misleading on this important
>         issue.
>
>         Pavel
>
>
>     The distinction between your two perspectives is that one is using
>     for-loop traversal of indices, the other is using pointer-increment
>     traversal of memory; from each of your perspectives, your
>     conclusions are "correct," but my inclination is that the
>     pointer-increment traversal of memory perspective is closer to the
>     "spirit" of the docstring, no?
>
>
> I think the confusion is in "most rapidly changing memory location",
> which is kind of ambiguous because a change in the indices is always a
> change in memory location if one hasn't used index tricks and such. So
> from a time perspective it means nothing, while from a memory
> perspective the largest address changes come from the leftmost indices.

Exactly.  Rate of change with respect to what, or as you do what?

I suggest something like the following wording, if you don't mind the 
verbosity as a means of conjuring up an image (although putting in 
diagrams would make it even clearer--undoubtedly there are already good 
illustrations somewhere on the web):

------------

Note to those used to Matlab, IDL, or Fortran memory order as it relates 
to indexing. Numpy uses C-order indexing by default, although a numpy 
array can be designated as using Fortran order. [With C-order, 
sequential memory locations are accessed by incrementing the last 
index.]  For a two-dimensional array, think of it as a table.  With 
C-order indexing the table is stored as a series of rows, so that one is 
reading from left to right, incrementing the column (last) index, and 
jumping ahead in memory to the next row by incrementing the row (first) 
index. With Fortran order, the table is stored as a series of columns, 
so one reads memory sequentially from top to bottom, incrementing the 
first index, and jumps ahead in memory to the next column by 
incrementing the last index.

One more difference to be aware of: numpy, like Python, C, and IDL, uses 
zero-based indexing; Matlab and Fortran (by default) start from one.

-----------------

If you want to keep it short, the key wording is in the sentence in 
brackets, and you can chop out the table illustration.
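
To make the bracketed sentence concrete, here is a minimal sketch (not 
proposed doc text). It assumes a 2x3 array of 8-byte integers, which is 
why the dtype is set explicitly; the variable names are only for 
illustration. The strides give the number of bytes skipped in memory 
when each index is incremented by one.

```python
import numpy as np

# Same 2x3 table of values, stored in the two memory layouts.
# dtype is fixed to 8-byte integers so the strides are predictable.
c = np.arange(6, dtype=np.int64).reshape(2, 3)  # C order (numpy default)
f = np.asfortranarray(c)                        # Fortran order copy

# Bytes stepped in memory per unit increment of (first, last) index.
c_strides = c.strides  # last index moves 8 bytes, first index 24
f_strides = f.strides  # first index moves 8 bytes, last index 16

# order="K" reads the elements in the order they sit in memory.
c_memory = c.ravel(order="K").tolist()  # rows laid end to end
f_memory = f.ravel(order="K").tolist()  # columns laid end to end

print(c_strides, c_memory)
print(f_strides, f_memory)
```

Indexing is identical for both arrays (c[1, 2] and f[1, 2] are the same 
value); only the mapping from indices to memory offsets differs.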

Eric

>
> Chuck