[Numpy-discussion] C vs. Fortran order -- misleading documentation?

Tue Jun 8 17:34:09 EDT 2010

On 8 June 2010 17:17, David Goldsmith <d.l.goldsmith at gmail.com> wrote:
> On Tue, Jun 8, 2010 at 1:56 PM, Benjamin Root <ben.root at ou.edu> wrote:
>>
>> On Tue, Jun 8, 2010 at 1:36 PM, Eric Firing <efiring at hawaii.edu> wrote:
>>>
>>> On 06/08/2010 08:16 AM, Eric Firing wrote:
>>> > On 06/08/2010 05:50 AM, Charles R Harris wrote:
>>> >>
>>> >> On Tue, Jun 8, 2010 at 9:39 AM, David
>>> >> Goldsmith<d.l.goldsmith at gmail.com
>>> >> <mailto:d.l.goldsmith at gmail.com>>  wrote:
>>> >>
>>> >>      On Tue, Jun 8, 2010 at 8:27 AM, Pavel Bazant<MaxPlanck at seznam.cz
>>> >>      <mailto:MaxPlanck at seznam.cz>>  wrote:
>>> >>
>>> >>           >  >  Correct me if I am wrong, but the paragraph
>>> >>           >  >
>>> >>           >  >  Note to those used to IDL or Fortran memory order as
>>> >> it
>>> >>          relates to
>>> >>           >  >  indexing. Numpy uses C-order indexing. That means that
>>> >> the
>>> >>          last index
>>> >>           >  >  usually (see xxx for exceptions) represents the most
>>> >>          rapidly changing memory
>>> >>           >  >  location, unlike Fortran or IDL, where the first index
>>> >>          represents the most
>>> >>           >  >  rapidly changing location in memory. This difference
>>> >>          represents a great
>>> >>           >  >  potential for confusion.
>>> >>           >  >
>>> >>           >  >  in
>>> >>           >  >
>>> >>           >  >
>>> >>  http://docs.scipy.org/doc/numpy/user/basics.indexing.html
>>> >>           >  >
>>> >>           >  >  is quite misleading, as C-order means that the last
>>> >> index
>>> >>          changes rapidly,
>>> >>           >  >  not the
>>> >>           >  >  memory location.
>>> >>           >  >
>>> >>           >  >
>>> >>           >  Any index can change rapidly, depending on whether is in
>>> >> an
>>> >>          inner loop or
>>> >>           >  not. The important distinction between C and Fortran
>>> >> order is
>>> >>          how indices
>>> >>           >  translate to memory locations. The documentation seems
>>> >>          correct to me,
>>> >>           >  although it might make more sense to say the last index
>>> >>          addresses a
>>> >>           >  contiguous range of memory. Of course, with modern
>>> >>          processors, actual
>>> >>           >  physical memory can be mapped all over the place.
>>> >>           >
>>> >>           >  Chuck
>>> >>
>>> >>          To me, saying that the last index represents the most rapidly
>>> >>          changing memory
>>> >>          location means that if I change the last index, the memory
>>> >>          location changes
>>> >>          a lot, which is not true for C-order. So for C-order,
>>> >> supposed
>>> >>          one scans the memory
>>> >>          linearly (the desired scenario),  it is the last *index* that
>>> >>          changes most rapidly.
>>> >>
>>> >>          The inverted picture looks like this: For C-order,  changing
>>> >> the
>>> >>          first index
>>> >>          leads to the most rapid jump in *memory*.
>>> >>
>>> >>          Still have the feeling the doc is very misleading at this
>>> >>          important issue.
>>> >>
>>> >>          Pavel
>>> >>
>>> >>
>>> >>      The distinction between your two perspectives is that one is
>>> >> using
>>> >>      for-loop traversal of indices, the other is using
>>> >> pointer-increment
>>> >>      traversal of memory; from each of your perspectives, your
>>> >>      conclusions are "correct," but my inclination is that the
>>> >>      pointer-increment traversal of memory perspective is closer to
>>> >> the
>>> >>      "spirit" of the docstring, no?
>>> >>
>>> >>
>>> >> I think the confusion is in "most rapidly changing memory location",
>>> >> which is kind of ambiguous because a change in the indices is always a
>>> >> change in memory location if one hasn't used index tricks and such. So
>>> >> from a time perspective it means nothing, while from a memory
>>> >> perspective the largest address changes come from the leftmost
>>> >> indices.
>>> >
>>> > Exactly.  Rate of change with respect to what, or as you do what?
>>> >
>>> > I suggest something like the following wording, if you don't mind the
>>> > verbosity as a means of conjuring up an image (although putting in
>>> > diagrams would make it even clearer--undoubtedly there are already good
>>> > illustrations somewhere on the web):
>>> >
>>> > ------------
>>> >
>>> > Note to those used to Matlab, IDL, or Fortran memory order as it
>>> > relates
>>> > to indexing. Numpy uses C-order indexing by default, although a numpy
>>> > array can be designated as using Fortran order. [With C-order,
>>> > sequential memory locations are accessed by incrementing the last
>>>
>>> Maybe change "sequential" to "contiguous".
>>>
>> I was thinking maybe "subsequent" might be a better word.
>
> IMV, contiguous has more of a "physical" connotation.  (That just isn't
> valid in Numpy, correct?)  So I'd prefer subsequent as an alternative to
> sequential.
>>
>> In the end, we need to communicate this clearly.  No matter which
>> language, I have always found it difficult to get new programmers to
>> understand the importance of knowing the difference between row-major and
>> column-major.  A "thick" paragraph isn't going to help to get the idea
>> across to a person who doesn't even know that a problem exists.
>>
>> Maybe a car analogy would be good here...
>>
>> Maybe if one imagine city streets (where many of the streets are one-way),
>> and need to drop off mail at each address.  Would it be more efficient to go
>> up and back a street or to drop off mail at the first address of the street
>> and then move on to the first address of the next street?
>
> But the issue isn't one of efficiency, it's merely an arbitrarily chosen
> convention.  (Does anyone know the history of the choices for FORTRAN and C,
> esp. why K&R chose the opposite of what was already in common usage in
> FORTRAN?  Just curious?)

This is speculation, not knowledge, but it's worth pointing out that
there are actually two ways to represent a multidimensional array in
C: as a block of memory with appropriate type definitions, or as an
array of pointers to subarrays. This latter approach is generally not
used for numerical work, but is potentially useful for other
applications. More relevantly, it already has a natural syntax;
a[2][3][5] naturally follows the chain of pointers and gives you what
you want; it also forces your last index to change most rapidly as you
walk through memory. So it would be very odd if multidimensional
arrays defined without pointers but using the same syntax were indexed
the other way around. (Let's ignore abominations like 5[3[2[a]]].)

Anne

> Does the OP have an opinion on the various alternatives offered so far?
>
> DG
>>
>> Just my two cents...
>>
>> Ben Root
>>
>>>
>>> > index.]  For a two-dimensional array, think if it as a table.  With
>>> > C-order indexing the table is stored as a series of rows, so that one
>>> > is
>>> > reading from left to right, incrementing the column (last) index, and
>>> > jumping ahead in memory to the next row by incrementing the row (first)
>>> > index. With Fortran order, the table is stored as a series of columns,
>>> > so one reads memory sequentially from top to bottom, incrementing the
>>> > first index, and jumps ahead in memory to the next column by
>>> > incrementing the last index.
>>> >
>>> > One more difference to be aware of: numpy, like python and C, uses
>>> > zero-based indexing; Matlab, [IDL???], and Fortran start from one.
>>> >
>>> > -----------------
>>> >
>>> > If you want to keep it short, the key wording is in the sentence in
>>> > brackets, and you can chop out the table illustration.
>>> >
>>> > Eric
>>> >
>>> >
>>> >>
>>> >> Chuck
>>> >>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> NumPy-Discussion mailing list
>>> >> NumPy-Discussion at scipy.org
>>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>> >
>>> > _______________________________________________
>>> > NumPy-Discussion mailing list
>>> > NumPy-Discussion at scipy.org
>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>
> --
> Mathematician: noun, someone who disavows certainty when their uncertainty
> set is non-empty, even if that set has measure zero.
>
> Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
> lies, prevents mankind from committing a general suicide.  (As interpreted
> by Robert Graves)
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>