[Numpy-discussion] Change default order to Fortran order

Jason Newton nevion at gmail.com
Sun Aug 2 22:55:53 EDT 2015


Just chiming in with my 2 cents, in direct response to your points...

   - Image oriented processing is most typically done with row-major
   storage layout.  From hardware to general software implementations.
   - Well really think of it as [slice,] row, column (logical)... you don't
   actually have to be concerned about the layout unless you want higher
   performance - in which case for a better access pattern you process a
   fundamental image-line at a time.  I also find it helps me avoid bugs with
   xyz semantics by working with rows and columns only and remembering x=col,
   y = row.
   - I'm most familiar with having slice first like the above.
   - ITK is stored as row-major actually, but it's index type has
   dimensions specified as column,row, slice .  Matlab does alot of things
   column order and thus acts different from implementations which can result
   in different outputs, but matlab seems perfectly happy living on an island
   where it's the only implementation providing a specific answer given a
   specific input.
   - Numpy is 0 based...?

Good luck keeping it all sane though,

-Jason

On Sun, Aug 2, 2015 at 7:16 PM, Kang Wang <kwang24 at wisc.edu> wrote:

> Thank you all for replying and providing useful insights and suggestions.
>
> The reasons I really want to use column-major are:
>
>    - I am image-oriented user (not matrix-oriented, as explained in
>    http://docs.scipy.org/doc/numpy/reference/internals.html#multidimensional-array-indexing-order-issues
>    )
>    - I am so used to read/write "I(x, y, z)" in textbook and code, and it
>    is very likely that if the environment (row-major environment) forces me to
>    write I(z, y, x),  I will write a bug if I am not 100% focused. When this
>    happens, it is difficult to debug, because everything compile and build
>    fine. You will see run time error. Depending on environment, you may get
>    useful error message (i.e. index out of range), but sometimes you just get
>    bad image results.
>    - It actually has not too much to do with the actual data layout in
>    memory. In imaging processing, especially medical imaging where I am
>    working in, if you have a 3D image, everyone will agree that in memory, the
>    X index is the fasted changing index, and the Z dimension (we often call it
>    the "slice" dimension) has the largest stride in memory. So, if
>    data layout is like this in memory, and image-oriented users are so used to
>    read/write "I(x,y,z)", the only storage order that makes sense is
>    column-major
>    - I also write code in MATLAB and C/C++. In MATLAB, matrix is
>    column-major array. In C/C++, we often use ITK, which is also column-major (
>    http://www.itk.org/Doxygen/html/classitk_1_1Image.html). I really
>    prefer always read/write column-major code to minimize coding bugs related
>    to storage order.
>    - I also prefer index to be 0-based; however, there is nothing I can
>    do about it for MATLAB (which is 1-based).
>
> I can see that my original thought about "modifying NumPy source and
> re-compile" is probably a bad idea. The suggestions about using
> "fortran_zeros = partial(np.zeros(order='F'))" is probably the best way so
> far, in my opinion, and I am going to give it a try.
>
> Again, thank you all for replying.
>
> Kang
>
> On 08/02/15, *Nathaniel Smith * <njs at pobox.com> wrote:
>
> On Aug 2, 2015 7:30 AM, "Sturla Molden" <sturla.molden at gmail.com> wrote:
> >
> > On 02/08/15 15:55, Kang Wang wrote:
> >
> > > Can anyone provide any insight/help?
> >
> > There is no "default order". There was before, but now all operators
> > control the order of their return arrays from the order of their input
> > array.
>
> This is... overoptimistic. I would not rely on this in code that I wrote.
>
> It's true that many numpy operations do preserve the input order. But
> there are also many operations that don't. And which are which often
> changes between releases. (Not on purpose usually, but it's an easy bug to
> introduce. And sometimes it is done intentionally, e.g. to make functions
> faster. It sucks to have to make a function slower for everyone because
> someone somewhere is depending on memory layout default details.) And there
> are operations where it's not even clear what preserving order means
> (indexing a C array with a Fortran array, add(C, fortran), ...), and even
> lots of operations that intrinsically break contiguity/ordering (transpose,
> diagonal, slicing, swapaxes, ...), so you will end up with mixed orderings
> one way or another in any non-trivial program.
>
> Instead, it's better to explicitly specify order= just at the places where
> you care. That way your code is *locally* correct (you can check it will
> work by just reading the one function). The alternative is to try and
> enforce a *global* property on your program ("everyone everywhere is very
> careful to only use contiguity-preserving operations", where "everyone"
> includes third party libraries like numpy and others). In software design,
> local invariants invariants are always better than global invariants -- the
> most well known example is local variables versus global variables, but the
> principle is much broader.
>
> -n
>
> --
> *Kang Wang, Ph.D.*
> 1111 Highland Ave., Room 1113
> Madison, WI 53705-2275
> ----------------------------------------
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150802/68191f12/attachment.html>


More information about the NumPy-Discussion mailing list