The NumPy Fortran-ordering quiz

A. M. Archibald peridot.faceted at gmail.com
Wed Oct 18 10:55:26 EDT 2006


On 18/10/06, Travis Oliphant <oliphant.travis at ieee.org> wrote:
>
> I just committed a change to the check-for-copy code to SVN.   A copy
> will occur if an actual reshaping is taking place that involves more
> than just adding or deleting ones from the old shape and if
>
> 1) self is not "single-segment".  Single-segment means either
> Fortran-order or C-order.  If we don't have a single-segment array, then
> we have to get a (Fortran or C-) contiguous buffer to do any reshaping
> (there may be special cases when it's possible, but I haven't figure
> them out...)

Indeed it is often possible to reshape an array that is not
contiguous; for example zeros(100)[::10] is not contiguous but can be
reshaped to any shape without any copying.

More generally, there are all sorts of peculiar arrangements that can
be reshaped witout a copy, for example:

input has lengths (8,2,3) and strides (39,9,3); output should have
lengths (4,4,3,2), so take strides (156,39,6,3)

Whether this sort of thing actually *occurs* very often, I don't know...

> 2) self is single-segment but self.squeeze().ndim > 1 and it is in the
> "wrong" order from what was requested in the order argument (i.e. self
> is in C-contiguous order but a Fortran-order reshape was requested or
> self is in F-contiguous order but a C-order reshape was requested).
>
> If there are any cases satisfying these rules where a copy does not have
> to occur then let me know.

There are. I came up with a general condition for describing exactly
when you need to copy an array, but it's difficult to put into words.
Here goes:

(0) Dimensions of length 1 are an irrelevant annoyance. Conceptually
they should be squeezed out before beginning and newaxised back in at
the end. (What strides should they have?) Assume that I've done so
from now on. (Actual code can just skip them appropriately.)

(1) If the array looks like an evenly-strided 1D array that has been
reshaped (so strides[i+-1]=lengths[i]*strides[i] for all i) it can
always be reshaped without a copy. (For example, zeros(1000)[::10] can
be reshaped arbitrarily and as many times as desired without copying.)

(2) If the array does *not* look like an evenly-strided 1D array that
has been reshaped, you can still reshape it without a copy if it looks
like an array of such arrays, each of which can be reshaped
separately. (For example, you can view an array of length (8,2,3) that
you have to resize to an array of size (4,4,3,2) as two separate
pieces: an array of size (2,3) that you have to resize to (3,2) and an
array of size (8,) that you need to resize to size (4,4). Each of
these pieces separately needs to look like a reshaped 1d array, but
there need be no relation between them.) (The condition for when you
can pull a smaller resize operation off the end is precisely when you
can find a tail segment of each tuple that has the same product; in
the case above, 2*3=3*2, so we could stop and do that reshape
separately.)

(3) If the array cannot be reshaped without a copy using the rules
above, it cannot be reshaped without a copy at all.


The python code in my previous message actually implements this rule,
if you want a less vague description.

I should also point out that views are not always more efficient than
copies (because of cache concerns, and because a small view can
prevent a giant block of memory from being deallocated); nevertheless
it should be up to the user to copy when it helps.

A. M. Archibald

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642




More information about the NumPy-Discussion mailing list