[Numpy-discussion] C vs. Fortran order -- misleading documentation?

Tue Jun 8 15:05:57 EDT 2010

On 8 June 2010 14:16, Eric Firing <efiring at hawaii.edu> wrote:
> On 06/08/2010 05:50 AM, Charles R Harris wrote:
>>
>>
>> On Tue, Jun 8, 2010 at 9:39 AM, David Goldsmith <d.l.goldsmith at gmail.com> wrote:
>>
>>     On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant <MaxPlanck at seznam.cz> wrote:
>>
>>
>>          > > Correct me if I am wrong, but the paragraph
>>          > >
>>          > > Note to those used to IDL or Fortran memory order as it
>>          > > relates to indexing. Numpy uses C-order indexing. That
>>          > > means that the last index usually (see xxx for
>>          > > exceptions) represents the most rapidly changing memory
>>          > > location, unlike Fortran or IDL, where the first index
>>          > > represents the most rapidly changing location in memory.
>>          > > This difference represents a great potential for
>>          > > confusion.
>>          > >
>>          > > in
>>          > >
>>          > > http://docs.scipy.org/doc/numpy/user/basics.indexing.html
>>          > >
>>          > > is quite misleading, as C-order means that the last index
>>          > > changes rapidly, not the memory location.
>>          > >
>>          > >
>>          > Any index can change rapidly, depending on whether it is in
>>          > an inner loop or not. The important distinction between C
>>          > and Fortran order is how indices translate to memory
>>          > locations. The documentation seems correct to me, although
>>          > it might make more sense to say the last index addresses a
>>          > contiguous range of memory. Of course, with modern
>>          > processors, actual physical memory can be mapped all over
>>          > the place.
>>          >
>>          > Chuck
>>
>>         To me, saying that the last index represents the most rapidly
>>         changing memory location means that if I change the last
>>         index, the memory location changes a lot, which is not true
>>         for C-order. So for C-order, supposing one scans the memory
>>         linearly (the desired scenario), it is the last *index* that
>>         changes most rapidly.
>>
>>         The inverted picture looks like this: for C-order, changing
>>         the first index leads to the largest jump in *memory*.
>>
>>         I still have the feeling the doc is very misleading on this
>>         important issue.
>>
>>         Pavel
>>
>>
>>     The distinction between your two perspectives is that one is using
>>     for-loop traversal of indices, the other is using pointer-increment
>>     traversal of memory; from each of your perspectives, your
>>     conclusions are "correct," but my inclination is that the
>>     pointer-increment traversal of memory perspective is closer to the
>>     "spirit" of the docstring, no?
>>
>>
>> I think the confusion is in "most rapidly changing memory location",
>> which is kind of ambiguous because a change in the indices is always a
>> change in memory location if one hasn't used index tricks and such. So
>> from a time perspective it means nothing, while from a memory
>> perspective the largest address changes come from the leftmost indices.
>
> Exactly.  Rate of change with respect to what, or as you do what?
>
> I suggest something like the following wording, if you don't mind the
> verbosity as a means of conjuring up an image (although putting in
> diagrams would make it even clearer--undoubtedly there are already good
> illustrations somewhere on the web):
>
> ------------
>
> Note to those used to Matlab, IDL, or Fortran memory order as it relates
> to indexing. Numpy uses C-order indexing by default, although a numpy
> array can be designated as using Fortran order. [With C-order,
> sequential memory locations are accessed by incrementing the last
> index.]  For a two-dimensional array, think of it as a table.  With
> C-order indexing the table is stored as a series of rows, so that one is
> reading from left to right, incrementing the column (last) index, and
> jumping ahead in memory to the next row by incrementing the row (first)
> index. With Fortran order, the table is stored as a series of columns,
> so one reads memory sequentially from top to bottom, incrementing the
> first index, and jumps ahead in memory to the next column by
> incrementing the last index.
>
> One more difference to be aware of: numpy, like Python, C, and IDL,
> uses zero-based indexing; Matlab and Fortran start from one.
>
> -----------------
>
> If you want to keep it short, the key wording is in the sentence in
> brackets, and you can chop out the table illustration.

I'd just like to point out a few warnings to keep in mind while
rewriting this section:

Numpy arrays can have any configuration of memory strides, including
some that are zero; C and Fortran contiguous arrays are simply those
that have special arrangements of the strides. The actual stride
values are normally almost irrelevant to Python code.
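
As a rough illustration of the stride point (a sketch only, assuming
numpy is importable as np and 8-byte float64 elements; the array names
are made up for the example), the strides attribute shows how far in
bytes each index steps through memory:

```python
import numpy as np

# Hypothetical example arrays; float64 elements occupy 8 bytes each.
c = np.zeros((3, 4), dtype=np.float64)             # default: C order
f = np.zeros((3, 4), dtype=np.float64, order='F')  # Fortran order

# Strides give the byte step taken when each index is incremented by one.
print(c.strides)  # (32, 8): the last index moves to the adjacent element
print(f.strides)  # (8, 24): the first index moves to the adjacent element

# A transposed view reuses the same memory with the strides swapped.
t = c.T
print(t.strides, t.flags['F_CONTIGUOUS'])  # (8, 32) True

# Slicing with a step gives a layout that is neither C nor Fortran contiguous.
s = c[:, ::2]
print(s.strides, s.flags['C_CONTIGUOUS'], s.flags['F_CONTIGUOUS'])

# Broadcasting produces a zero stride: every "row" refers to the same memory.
b = np.broadcast_to(np.arange(4.0), (3, 4))
print(b.strides)  # (0, 8)
```

All five arrays index the same way from Python; only the strides differ.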

There is a second meaning of C and Fortran order: when you are
reshaping an array, you can specify one order or the other. The
reshaping operation then behaves logically as if the input and output
arrays are in the requested order, regardless of what the actual
memory layout is.
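
A small sketch of this second meaning (assuming numpy imported as np;
the variable names are only illustrative). The order argument controls
the logical order in which elements are read and written, independent
of how the input happens to be stored:

```python
import numpy as np

x = np.arange(6)

# Same six values, filled into a 2x3 array in the two logical orders.
c = x.reshape((2, 3), order='C')  # last index varies fastest
f = x.reshape((2, 3), order='F')  # first index varies fastest
print(c.tolist())  # [[0, 1, 2], [3, 4, 5]]
print(f.tolist())  # [[0, 2, 4], [1, 3, 5]]

# The memory layout of the input does not affect the result: a
# Fortran-contiguous copy of c flattens to the same sequence as c itself.
c_stored_f = np.asfortranarray(c)
print(c_stored_f.reshape(6, order='C').tolist())  # [0, 1, 2, 3, 4, 5]

# Flattening c in Fortran order walks down the columns instead.
print(c.ravel(order='F').tolist())  # [0, 3, 1, 4, 2, 5]
```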

Perhaps one could rewrite the text as something like:

Different programming languages have different ways of laying out
multidimensional arrays. In C, such an array must be a contiguous
block of memory, and as one advances through that memory the last
index changes most rapidly, with the earlier indices increasing only
when the later indices have reached the ends of their respective
dimensions. Fortran arrays must also be contiguous blocks of memory,
but in their case it is the first index that changes most rapidly as
one advances through the block. Numpy arrays may have many different
memory layouts, represented by a 'stride' for each dimension, but new
numpy arrays are by default constructed in the C order. They can also
readily be constructed directly in Fortran order by passing additional
arguments to many array-construction functions. Most code using numpy
need not concern itself with the memory layout of the arrays it uses,
but conversion functions are available should a user need to, say,
pass a multidimensional array to a function written in C or Fortran.
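
To make the proposed wording concrete, here is a hedged sketch
(assuming numpy imported as np; shape and variable names are invented
for the example). It first lists which indices are visited as one
advances through a contiguous block in each order, then constructs and
converts arrays in both layouts:

```python
import numpy as np

shape = (2, 3)
offsets = np.arange(6)  # element positions, walking through memory in order

# Indices visited when the block is laid out in C order vs. Fortran order.
c_first, c_last = np.unravel_index(offsets, shape, order='C')
f_first, f_last = np.unravel_index(offsets, shape, order='F')
print(c_first, c_last)  # [0 0 0 1 1 1] [0 1 2 0 1 2]  (last index fastest)
print(f_first, f_last)  # [0 1 0 1 0 1] [0 0 1 1 2 2]  (first index fastest)

# New arrays are C order by default; order='F' requests Fortran order.
a = np.ones(shape)
f = np.ones(shape, order='F')
print(a.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])  # True True

# Conversion functions return an array with the requested layout,
# copying only when the input does not already have it.
a_for_fortran = np.asfortranarray(a)  # e.g. before calling Fortran code
f_for_c = np.ascontiguousarray(f)     # e.g. before calling C code
print(a_for_fortran.flags['F_CONTIGUOUS'], f_for_c.flags['C_CONTIGUOUS'])
```

The converted arrays compare equal to the originals element by element;
only the arrangement in memory changes.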

Anne

> Eric
>
>
>>
>> Chuck
>>
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion