[Numpy-discussion] Raveling, reshape order keyword unnecessarily confuses index and memory ordering

Chris Barker - NOAA Federal chris.barker at noaa.gov
Wed Apr 3 11:52:47 EDT 2013


On Wed, Apr 3, 2013 at 6:24 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
>> the context where it gets applied. So giving the same strategy two
>> different names is silly; if anything it's the contexts that should
>> have different names.
>>
>
> Yup, thats how I think about it too...

me too...

> But I would really love if someone would try to make the documentation
> simpler!

yes, I think this is where the solution lies.

> There is also never a mention of "contiguity", even though when
> we refer to "memory order", then having a C/F contiguous array is often
> the reason why

good point -- in fact, I have no idea what would happen in many of
these cases for a discontiguous array (or one with arbitrarily weird
strides...)

>  Also 'A' seems often explained not
> quite correctly (though that does not matter (except for reshape, where
> its explanation is fuzzy), it will matter more in the future -- even if
> I don't expect 'A' to be actually used).

I wonder about having an 'A' option in reshape at all -- what the heck
does it mean? Why do we need it? Again, I come back to the fact that
memory order is kind-of orthogonal to index order. So for reshape (or
ravel, which is really just a special case of reshape...) the 'A' flag
and 'K' flag (huh?) are pretty dangerous, and prone to error. I think
of it this way:

Much of the beauty of numpy is that it presents a consistent interface
to various forms of strided data -- that way, folks can write code
that works the same way for any ndarray, while still being able to
have internal storage be efficient for the use at hand -- i.e. C order
for the common case, Fortran order for interaction with libraries that
expect that order (or for algorithms that are more efficient in that
order, though that's mostly external libs..), and non-contiguous data
so one can work on sub-parts of arrays without copying data around.
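
To make the "consistent interface" point concrete, here is a minimal
sketch (the array values and the row_sums helper are made up for
illustration). The same code gives the same answer for a C-ordered
array, a Fortran-ordered array, and a non-contiguous view:

```python
import numpy as np

base = np.arange(12).reshape(2, 6)
nc = base[:, ::2]              # non-contiguous view: every other column
c = np.ascontiguousarray(nc)   # same values, C-ordered copy
f = np.asfortranarray(nc)      # same values, Fortran-ordered copy

def row_sums(a):
    # Hypothetical user code: it never looks at the memory layout.
    return a.sum(axis=1)

for a in (c, f, nc):
    print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'], row_sums(a))
# True False [ 6 24]
# False True [ 6 24]
# False False [ 6 24]
```

Only the flags differ; indexing and arithmetic see identical arrays.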

In most places, the numpy API hides the internal memory order -- this
is a good thing, most people have no need to think about it (or most
code, anyway), and you can write code that works (even if not
optimally) for any (strided) memory layout. All is good.

There are times when you really need to understand, or control or
manipulate the memory layout, to make sure your routines are
optimized, or the data is in the right form to pass off to an external
lib, or to make sense of raw data read from a file, or... That's what
we have .view() and friends for.

However, the 'A' and 'K' flags mix and match these concepts -- and I
think that's dangerous. It would be easy for a user to use the 'A'
flag, and have everything work fine and dandy with all their test
cases, only to have it blow up when someone passes in a
different-than-expected array. So really, they should only be used in
cases where the code has checked memory order beforehand, or in a
really well-defined interface where you know exactly what you're
getting. In those cases, it makes the code far clearer and less
error-prone to do your re-arranging of the memory in a separate step,
rather than built in to a ravel() or reshape() call.
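
A small sketch of that failure mode, assuming a hypothetical helper
(flatten_a) that was only ever tested with C-ordered input, next to a
two-step alternative (flatten_explicit, also a made-up name) that
keeps the layout handling separate from the flattening:

```python
import numpy as np

def flatten_a(arr):
    # Hypothetical helper: element order depends on arr's memory layout.
    return arr.ravel(order='A')

def flatten_explicit(arr):
    # Step 1: settle the memory layout (copies only if needed).
    arr = np.ascontiguousarray(arr)
    # Step 2: flatten in C index order, whatever the caller passed in.
    return arr.reshape(-1)

c = np.arange(6).reshape(2, 3)   # what the test cases used
f = np.asfortranarray(c)         # same values, Fortran-ordered

print(flatten_a(c))         # [0 1 2 3 4 5]
print(flatten_a(f))         # [0 3 1 4 2 5]
print(flatten_explicit(c))  # [0 1 2 3 4 5]
print(flatten_explicit(f))  # [0 1 2 3 4 5]
```

c and f compare equal element by element, yet flatten_a returns
different results for them.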

[note] -- I wrote earlier that I wasn't confused by the ravel()
examples -- true for the 'C' and 'F' flags, but I'm still not at all
clear what 'A' and 'K' would give me -- particularly for 'A' and
reshape()
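
For reference, a sketch of what these flags appear to do on a tiny
array, going by the documented behaviour: 'K' in ravel() reads
elements in the order they sit in memory, and 'A' in reshape() uses
Fortran-like index order only when the input is Fortran-contiguous.
The arrays are made up for illustration:

```python
import numpy as np

c = np.arange(6).reshape(2, 3)   # C-contiguous
f = np.asfortranarray(c)         # same values, Fortran-contiguous
t = c.T                          # transposed view sharing c's memory

# ravel: 'C' follows index order, 'K' follows memory order.
t_c = t.ravel(order='C')
t_k = t.ravel(order='K')
print(t_c)   # [0 3 1 4 2 5]
print(t_k)   # [0 1 2 3 4 5]

# reshape with 'A': equal inputs, different results.
r_c = c.reshape((3, 2), order='A')
r_f = f.reshape((3, 2), order='A')
print(r_c.tolist())   # [[0, 1], [2, 3], [4, 5]]
print(r_f.tolist())   # [[0, 4], [3, 2], [1, 5]]
```

So with 'A', the values you get back depend on a property of the input
that ordinary indexing never shows you.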

So I think the cause of the confusion here is not that we use "order"
in two different contexts, nor the fact that 'C' and 'F' may not mean
anything to some people, but that we are conflating two different
processes in one function, and with one flag.

My (maybe) proposal: we deprecate the 'A' and 'K' flags in ravel() and
reshape() (and maybe even deprecate ravel() -- does it add anything to
reshape?). If not deprecate, at least encourage people in the docs not
to use them, and rather do their memory-structure manipulations with
.view or stride manipulation, or...

I'm still trying to figure out when you'd want the 'A' flag -- it
seems at the end of your operation you will want:

The resulting array to be a particular shape, with the elements in a
particular order

and

You _may_ want the in-memory layout a certain way.

but 'A' can't ensure both of those.
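
One way to get both, sketched with a hypothetical helper
(reshape_with_layout is not a NumPy function): fix the element order
with an explicit index order first, then fix the memory layout as a
separate step.

```python
import numpy as np

def reshape_with_layout(arr, shape):
    # Step 1: element order is set by C index order,
    # independent of how arr happens to be stored.
    out = arr.reshape(shape, order='C')
    # Step 2: choose the in-memory layout; indexed values are unchanged.
    return np.asfortranarray(out)

c = np.arange(6).reshape(2, 3)
f = np.asfortranarray(c)

out_c = reshape_with_layout(c, (3, 2))
out_f = reshape_with_layout(f, (3, 2))
print(out_c.tolist())                # [[0, 1], [2, 3], [4, 5]]
print(out_f.tolist())                # [[0, 1], [2, 3], [4, 5]]
print(out_c.flags['F_CONTIGUOUS'],
      out_f.flags['F_CONTIGUOUS'])   # True True
```

The target layout here (Fortran) is just an assumption for the
example; np.ascontiguousarray would do the same job for C layout.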

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


