[Numpy-discussion] flatten() without copy - is this possible?

Anne Archibald peridot.faceted at gmail.com
Sun Jun 3 14:43:27 EDT 2007

On 01/06/07, dmitrey <openopt at ukr.net> wrote:

> y = x.flatten(1)
> turn array into vector (note that this forces a copy)
> Is there any way to do the trick without copying?
> What are the problems here? Just other way of array elements indexing...

It is sometimes possible to flatten an array without copying and sometimes not.

For numpy, a vector is a single block of memory in which there are
elements of uniform type spaced at a uniform distance. This last is
the key; it's called the "stride", and it need not be the same size as
an element (so arange(10)[::3] can be created without a copy).
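
To make the stride idea concrete, here is a small sketch against present-day NumPy. The `np.` prefix, the explicit `dtype` and the `np.shares_memory` check are my assumptions and do not appear in the original message:

```python
import numpy as np

# Fix the dtype so the element size (8 bytes) does not depend on the platform.
base = np.arange(10, dtype=np.int64)
every_third = base[::3]          # elements 0, 3, 6, 9

print(base.strides)              # (8,): step of one element
print(every_third.strides)       # (24,): step of three elements
print(np.shares_memory(base, every_third))  # True: a view, not a copy
```

The slice only changes the stride and length; the data buffer is the one `base` already owns.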

A multidimensional array simply has many strides, one for each
dimension. Thus ones((10,10,10)) simply keeps track of the stride for
a row, the stride for a column, and the stride for a layer. If you
want to transpose two axes, the data is not copied, instead the
strides are simply exchanged. Under normal circumstances one need not
care what the strides are or how the cells are laid out in memory as
numpy hides that from normal users.
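
As a rough illustration, assuming the default float64 element type (8 bytes) and using `np.shares_memory` as an after-the-fact check, the strides of a 10x10x10 array and of its transpose can be inspected like this:

```python
import numpy as np

A = np.ones((10, 10, 10))        # float64, so each element is 8 bytes
T = A.transpose()                # reverses the order of the axes

print(A.strides)                 # (800, 80, 8)
print(T.strides)                 # (8, 80, 800)
print(np.shares_memory(A, T))    # True: same data, strides exchanged
```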

What about flattening an array? It should turn an array into a vector,
that is, take an array with n different strides and lengths and create
a single array with a single stride and length. The order of the
resulting elements needs to be specified; numpy normally defaults to
"C order", which means that A[3,4,5] and A[3,4,6] are adjacent in the
resulting array but A[3,4,5] and A[4,4,5] are not. (Note that this is
a logical operation; the organization of the underlying array is
irrelevant for the result.)
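
A sketch of what "C order" means for the positions in the flattened result. The array filled with `arange` and the use of `np.ravel_multi_index` are choices made for this illustration only:

```python
import numpy as np

# Label each cell with its own C-order position so flat indices are easy to read.
A = np.arange(1000).reshape((10, 10, 10))
flat = A.reshape((-1,))

i = np.ravel_multi_index((3, 4, 5), A.shape)   # position of A[3,4,5]
j = np.ravel_multi_index((3, 4, 6), A.shape)   # position of A[3,4,6]
k = np.ravel_multi_index((4, 4, 5), A.shape)   # position of A[4,4,5]

print(i, j, k)                   # 345 346 445
print(flat[i] == A[3, 4, 5])     # True

# The result depends only on the logical indexing: flattening the transpose
# follows the transpose's own C order, whatever the memory layout underneath.
T = A.transpose()
print(T.reshape((-1,))[1] == T[0, 0, 1])       # True
```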

If you want to ensure that no copy is made, you need to ensure that
the stride between elements of the array you're flattening is always
the same. Taking a 10-by-10-by-10 array A, the spacing between
A[3,4,5] and A[3,4,6] needs to be the same as the spacing between
A[3,4,6] and A[3,4,7]. This is automatic. But the spacing also needs
to be the same as the spacing between A[3,4,9] and A[3,5,0]. This is
not automatic, and often does not occur. In such cases numpy must make
a copy to ensure that the resulting array is uniformly strided.
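
The spacing argument can be checked by hand from the strides. In the sketch below, `byte_offset` is a hypothetical helper written for this example (it is not a numpy function), and the element type is assumed to be float64:

```python
import numpy as np

def byte_offset(arr, index):
    # Distance in bytes from the array's first element to arr[index].
    return sum(i * s for i, s in zip(index, arr.strides))

A = np.ones((10, 10, 10))        # float64: strides (800, 80, 8)

# Inside the last axis, and across its end, the step is the same 8 bytes.
step_inside = byte_offset(A, (3, 4, 6)) - byte_offset(A, (3, 4, 5))
step_across = byte_offset(A, (3, 5, 0)) - byte_offset(A, (3, 4, 9))
print(step_inside, step_across)  # 8 8

# Keep only the first five elements along the last axis: the step from
# B[3,4,4] to B[3,5,0] now skips the five dropped elements.
B = A[:, :, :5]
b_inside = byte_offset(B, (3, 4, 1)) - byte_offset(B, (3, 4, 0))
b_across = byte_offset(B, (3, 5, 0)) - byte_offset(B, (3, 4, 4))
print(b_inside, b_across)        # 8 48
```

Since no single stride can describe B as a vector, flattening B has to copy.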

What cases *don't* require a copy? Well, let's look at some examples:

A = ones((10,10,10))
reshape(A,(-1,)) # No copy needed
reshape(A[:,:,:5],(-1,)) # Copy needed
reshape(A[:,:,::2],(-1,)) # No copy needed
reshape(A[:,::2,:],(-1,)) # Copy needed
reshape(A[:5,:,:],(-1,)) # No copy needed
reshape(A.transpose(),(-1,)) # Copy needed

Note that none of the reindexing operations require a copy, but some
of the reshapes do.
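
One way to check the list above on your own installation is to ask whether the reshaped result still shares memory with A. This is only a sketch: `np.shares_memory` is not mentioned in the original message, and which cases avoid a copy may differ between numpy versions.

```python
import numpy as np

A = np.ones((10, 10, 10))

cases = {
    "A": A,
    "A[:,:,:5]": A[:, :, :5],
    "A[:,:,::2]": A[:, :, ::2],
    "A[:,::2,:]": A[:, ::2, :],
    "A[:5,:,:]": A[:5, :, :],
    "A.transpose()": A.transpose(),
}

results = {}
for label, arr in cases.items():
    flat = np.reshape(arr, (-1,))
    # If the flat array still points into A's buffer, no copy was made.
    results[label] = np.shares_memory(flat, A)
    print(label, "no copy" if results[label] else "copy")
```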

It turns out to be nontrivial to detect all the cases where a copy can
be avoided while reshaping, and IIRC numpy misses some (old versions
of numpy almost always copied). But a freshly-created array is
normally guaranteed to be reshapable without a copy.

If you want a reshape that never copies, assign to .shape instead of
calling reshape. The assignment raises an exception when the new shape
cannot be obtained without a copy:

In [3]: A = ones((10,10,10))[:,:5,:]

In [4]: A.shape = (-1,)
<type 'exceptions.AttributeError'>        Traceback (most recent call last)

/home/peridot/physics-projects/pulsed-flux/writings/<ipython console> in <module>()

<type 'exceptions.AttributeError'>: incompatible shape for a non-contiguous array

In [7]: A = ones((10,10,10))[:5,:,:]

In [8]: A.shape = (-1,)
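
The same test can be wrapped in a function. This is a minimal sketch, assuming a numpy version that still allows assigning to .shape and that reports failure with AttributeError as in the session above; `flatten_in_place` is a hypothetical name, not part of numpy:

```python
import numpy as np

def flatten_in_place(arr):
    # Hypothetical helper: flatten by assigning to .shape, which never copies.
    # A fresh view is used so the caller's array keeps its own shape.
    view = arr.view()
    try:
        view.shape = (-1,)
    except AttributeError:
        return None              # numpy refused: flattening would need a copy
    return view

A = np.ones((10, 10, 10))
print(flatten_in_place(A[:, :5, :]) is None)   # True
print(flatten_in_place(A[:5, :, :]).shape)     # (500,)
```

Returning None rather than falling back to a copy keeps the "no copy" guarantee explicit for the caller.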

