[Numpy-discussion] C vs. Fortran order -- misleading documentation?

Tue Jun 8 16:20:10 EDT 2010

On Tue, Jun 8, 2010 at 12:05 PM, Anne Archibald
<aarchiba at physics.mcgill.ca> wrote:

> On 8 June 2010 14:16, Eric Firing <efiring at hawaii.edu> wrote:
> > On 06/08/2010 05:50 AM, Charles R Harris wrote:
> >>
> >>
> >> On Tue, Jun 8, 2010 at 9:39 AM, David Goldsmith
> >> <d.l.goldsmith at gmail.com> wrote:
> >>
> >>     On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant
> >>     <MaxPlanck at seznam.cz> wrote:
> >>
> >>
> >>          > > Correct me if I am wrong, but the paragraph
> >>          > >
> >>          > > Note to those used to IDL or Fortran memory order as it
> >>          > > relates to indexing. Numpy uses C-order indexing. That means
> >>          > > that the last index usually (see xxx for exceptions)
> >>          > > represents the most rapidly changing memory location, unlike
> >>          > > Fortran or IDL, where the first index represents the most
> >>          > > rapidly changing location in memory. This difference
> >>          > > represents a great potential for confusion.
> >>          > >
> >>          > > in
> >>          > >
> >>          > > http://docs.scipy.org/doc/numpy/user/basics.indexing.html
> >>          > >
> >>          > > is quite misleading, as C-order means that the last index
> >>          > > changes rapidly, not the memory location.
> >>          > >
> >>          > >
> >>          > Any index can change rapidly, depending on whether it is in an
> >>          > inner loop or not. The important distinction between C and
> >>          > Fortran order is how indices translate to memory locations. The
> >>          > documentation seems correct to me, although it might make more
> >>          > sense to say the last index addresses a contiguous range of
> >>          > memory. Of course, with modern processors, actual physical
> >>          > memory can be mapped all over the place.
> >>          >
> >>          > Chuck
> >>
> >>         To me, saying that the last index represents the most rapidly
> >>         changing memory location means that if I change the last index,
> >>         the memory location changes a lot, which is not true for C-order.
> >>         So for C-order, supposing one scans the memory linearly (the
> >>         desired scenario), it is the last *index* that changes most
> >>         rapidly.
> >>
> >>         The inverted picture looks like this: For C-order, changing the
> >>         first index leads to the most rapid jump in *memory*.
> >>
> >>         I still have the feeling the doc is very misleading on this
> >>         important issue.
> >>
> >>         Pavel
> >>
> >>
> >>     The distinction between your two perspectives is that one is using
> >>     for-loop traversal of indices, the other is using pointer-increment
> >>     traversal of memory; from each of your perspectives, your
> >>     conclusions are "correct," but my inclination is that the
> >>     pointer-increment traversal of memory perspective is closer to the
> >>     "spirit" of the docstring, no?
> >>
> >>
> >> I think the confusion is in "most rapidly changing memory location",
> >> which is kind of ambiguous because a change in the indices is always a
> >> change in memory location if one hasn't used index tricks and such. So
> >> from a time perspective it means nothing, while from a memory
> >> perspective the largest address changes come from the leftmost indices.
> >
> > Exactly.  Rate of change with respect to what, or as you do what?
> >
> > I suggest something like the following wording, if you don't mind the
> > verbosity as a means of conjuring up an image (although putting in
> > diagrams would make it even clearer--undoubtedly there are already good
> > illustrations somewhere on the web):
> >
> > ------------
> >
> > Note to those used to Matlab, IDL, or Fortran memory order as it relates
> > to indexing. Numpy uses C-order indexing by default, although a numpy
> > array can be designated as using Fortran order. [With C-order,
> > sequential memory locations are accessed by incrementing the last
> > index.]  For a two-dimensional array, think of it as a table.  With
> > C-order indexing the table is stored as a series of rows, so that one is
> > reading from left to right, incrementing the column (last) index, and
> > jumping ahead in memory to the next row by incrementing the row (first)
> > index. With Fortran order, the table is stored as a series of columns,
> > so one reads memory sequentially from top to bottom, incrementing the
> > first index, and jumps ahead in memory to the next column by
> > incrementing the last index.
> >
> > One more difference to be aware of: numpy, like python, C, and IDL, uses
> > zero-based indexing; Matlab and Fortran start from one.
> >
> > -----------------
> >
> > If you want to keep it short, the key wording is in the sentence in
> > brackets, and you can chop out the table illustration.
>
> I'd just like to point out a few warnings to keep in mind while
> rewriting this section:
>
> Numpy arrays can have any configuration of memory strides, including
> some that are zero; C and Fortran contiguous arrays are simply those
> that have special arrangements of the strides. The actual stride
> values are normally almost irrelevant to python code.
>
> There is a second meaning of C and Fortran order: when you are
> reshaping an array, you can specify one order or the other. The
> reshaping operation then behaves logically as if the input and output
> arrays are in the requested order, regardless of what the actual
> memory layout is.
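
A minimal sketch of this second meaning, assuming a small 2x3 integer array (the variable names are illustrative, not from the NumPy docs). It shows that the `order` argument of `reshape` selects a logical reading order, so the result does not depend on how the input happens to be stored:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]], C layout by default
f = np.asfortranarray(a)         # same values, stored column by column

# order="C": read and fill with the last index varying fastest.
c_read = a.reshape(3, 2, order="C")
c_read_from_f = f.reshape(3, 2, order="C")

# order="F": read and fill with the first index varying fastest.
f_read = a.reshape(3, 2, order="F")
f_read_from_f = f.reshape(3, 2, order="F")

print(c_read.tolist())   # [[0, 1], [2, 3], [4, 5]]
print(f_read.tolist())   # [[0, 4], [3, 2], [1, 5]]

# The memory layout of the input makes no difference to the values.
print(np.array_equal(c_read, c_read_from_f))   # True
print(np.array_equal(f_read, f_read_from_f))   # True
```
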
>
> Perhaps one could rewrite the text as something like:
>
> Different programming languages have different ways of laying out
> multidimensional arrays. In C, such an array must be a contiguous
> block of memory, and as one advances through that memory the last
> index changes most rapidly, with the earlier indices increasing only
> when the later indices have reached the ends of their respective
> dimensions. Fortran arrays must also be contiguous blocks of memory,
> but in their case it is the first index that changes most rapidly as
> one advances through the block. Numpy arrays may have many different
> memory layouts, represented by a 'stride' for each dimension, but new
> numpy arrays are by default constructed in the C order. They can also
> readily be constructed directly in Fortran order by passing additional
> arguments to many array-construction functions. Most code using numpy
> need not concern itself with the memory layout of the arrays it uses,
> but conversion functions are available should a user need to, say,
> pass a multidimensional array to a function written in C or Fortran.
>
> Anne
>
> > Eric
>

So far I don't think any of these address Chuck's original comment that one
can access memory in any order whatsoever using for loops (e.g., in a 3-D
array, one could increment the middle index most frequently), which renders
all of the above less relevant (not incorrect, of course, just less relevant).
This is why I still see the issue as related to the manner in which one is
"scrolling" through the array, and the "incrementing the pointer" idea as the
one more relevant to the document in question.
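
A rough sketch of the two traversals, assuming a 2x3x4 C-ordered array of 8-byte integers (the `offset` helper is hypothetical, written only for this illustration). A loop may vary any index fastest, while stepping through memory one element at a time fixes the order in which the indices turn over:

```python
import numpy as np

a = np.arange(24, dtype=np.int64).reshape(2, 3, 4)   # C order by default
step = [s // a.itemsize for s in a.strides]          # strides in elements
print(step)   # [12, 4, 1]

def offset(i, j, k):
    """Element offset of a[i, j, k] from the start of the buffer."""
    return i * step[0] + j * step[1] + k * step[2]

# Loop traversal: the middle index is varied fastest here, so consecutive
# accesses land step[1] elements apart in memory.
middle_fastest = [offset(0, j, 0) for j in range(3)]
print(middle_fastest)   # [0, 4, 8]

# Pointer-increment traversal: advance one element at a time and ask which
# index tuple that is. The last index turns over fastest, like an odometer.
first_five = [
    tuple(int(x) for x in np.unravel_index(n, a.shape)) for n in range(5)
]
print(first_five)   # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 0, 3), (0, 1, 0)]
```

Because `np.arange` fills the buffer sequentially, `a[i, j, k]` equals `offset(i, j, k)` in this example, which makes the mapping easy to check by eye.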

That said, perhaps this modification of Anne's wording is sufficient (my
changes enclosed by underscores):

Different programming languages have different ways of laying out
multidimensional arrays. In C, such an array must be a contiguous
block of memory, and as one advances through that memory _sequentially_, the
last index changes most rapidly, with the earlier indices increasing only
when the later indices have reached the ends of their respective
dimensions _(like an automobile's odometer)_. Fortran arrays must also be
contiguous blocks of memory, but in their case it is the first index that
changes most rapidly as one advances through the block _sequentially (the
opposite of an automobile odometer)_. Numpy arrays may have many different
memory layouts, represented by a 'stride' for each dimension, but _*new
numpy arrays are by default constructed in the C order*_ [emphasis to be
added]. They can also
readily be constructed directly in Fortran order by passing additional
arguments to many array-construction functions. Most code using numpy
need not concern itself with the memory layout of the arrays it uses,
but conversion functions are available should a user need to, say,
pass a multidimensional array to a function written in C or Fortran.
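
A short sketch of what the wording above describes, assuming 2x3 arrays of 8-byte floats (the shapes and variable names are illustrative only). It uses the `order` argument of `np.zeros` as one example of "additional arguments", and `np.asfortranarray` / `np.ascontiguousarray` as examples of the conversion functions:

```python
import numpy as np

c = np.zeros((2, 3), dtype=np.float64)              # default: C order
f = np.zeros((2, 3), dtype=np.float64, order="F")   # Fortran order on request

print(c.strides)   # (24, 8): the last index steps by one 8-byte element
print(f.strides)   # (8, 16): the first index steps by one 8-byte element

# A transpose only swaps the strides; it is a view on the same memory.
t = c.T
print(t.flags["F_CONTIGUOUS"])   # True

# Conversion functions, e.g. before calling into C or Fortran code.
g = np.asfortranarray(c)      # copy laid out column by column
h = np.ascontiguousarray(f)   # copy laid out row by row
print(g.flags["F_CONTIGUOUS"], h.flags["C_CONTIGUOUS"])   # True True
```
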

DG