flatten() without copy - is this possible?
Hi all. In the "NumPy for Matlab users" guide I read: y = x.flatten(1): turn array into vector (note that this forces a copy). Is there any way to do the trick without copying? What are the problems here? It is just another way of indexing the array elements... Thx, D.
On 01/06/07, dmitrey wrote:
y = x.flatten(1)
turn array into vector (note that this forces a copy)
Is there any way to do the trick without copying? What are the problems here? It is just another way of indexing the array elements...
It is sometimes possible to flatten an array without copying and sometimes not.
For numpy, a vector is a single block of memory in which there are
elements of uniform type spaced at a uniform distance. This last is
the key; it's called the "stride", and it need not be the same size as
an element (so arange(10)[::3] can be created without a copy).
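A minimal sketch of the stride idea for a one-dimensional vector, assuming a current NumPy imported as np (the names v and w are only illustrative, and the dtype is fixed to int64 so the byte counts are predictable):

```python
import numpy as np

v = np.arange(10, dtype=np.int64)  # ten 8-byte elements in one block
w = v[::3]                         # every third element: 0, 3, 6, 9

print(v.strides)                   # (8,)  neighbours are one element apart
print(w.strides)                   # (24,) neighbours are three elements apart
print(np.shares_memory(v, w))      # True: w is a view, not a copy

w[0] = 100                         # writing through the view...
print(v[0])                        # 100   ...changes the original
```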
A multidimensional array simply has many strides, one for each
dimension. Thus ones((10,10,10)) simply keeps track of the stride for
a row, the stride for a column, and the stride for a layer. If you
want to transpose two axes, the data is not copied; instead the
strides are simply exchanged. Under normal circumstances one need not
care what the strides are or how the cells are laid out in memory as
numpy hides that from normal users.
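As a rough illustration of the paragraph above, the strides can be inspected directly. This sketch assumes the default float64 dtype of np.ones (8 bytes per element); the variable names are illustrative:

```python
import numpy as np

A = np.ones((10, 10, 10))          # float64, so each element is 8 bytes
print(A.strides)                   # (800, 80, 8)

T = A.transpose()                  # reverses the axes; no data is moved
print(T.strides)                   # (8, 80, 800)

S = A.swapaxes(0, 1)               # exchange just two axes
print(S.strides)                   # (80, 800, 8)

# Both results are views onto the same memory as A.
print(np.shares_memory(A, T), np.shares_memory(A, S))  # True True
```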
What about flattening an array? It should turn an array into a vector,
that is, take an array with n different strides and lengths and create
a single array with a single stride and length. The order of the
resulting elements needs to be specified; numpy normally defaults to
"C order", which means that A[3,4,5] and A[3,4,6] are adjacent in the
resulting array but A[3,4,5] and A[4,4,5] are not. (Note that this is
a logical operation; the organization of the underlying array is
irrelevant for the result.)
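The C-order rule can be checked with a small sketch. It uses np.ravel_multi_index to compute flat positions and np.asfortranarray to build an array with the same values but a different memory layout; the array contents are made up for illustration:

```python
import numpy as np

A = np.arange(1000).reshape(10, 10, 10)
flat = A.reshape(-1)               # C order is the default

# In C order the last index varies fastest: position = 100*i + 10*j + k.
i = np.ravel_multi_index((3, 4, 5), A.shape)
j = np.ravel_multi_index((3, 4, 6), A.shape)
k = np.ravel_multi_index((4, 4, 5), A.shape)
print(i, j, k)                     # 345 346 445

# The flattened result depends only on the logical indices,
# not on how the source array is laid out in memory.
F = np.asfortranarray(A)           # same values, column-major memory
same = np.array_equal(F.reshape(-1), flat)
print(same)                        # True
```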
If you want to ensure that no copy is made, you need to ensure that
the stride between elements of the array you're flattening is always
the same. Taking a 10-by-10-by-10 array A, the spacing between
A[3,4,5] and A[3,4,6] needs to be the same as the spacing between
A[3,4,6] and A[3,4,7]. This is automatic. But the spacing also needs
to be the same as the spacing between A[3,4,9] and A[3,5,0]. This is
not automatic, and often does not occur. In such cases numpy must make
a copy to ensure that the resulting array is uniformly strided.
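The uniform-spacing condition described above can be written down as a short check. This is only a sketch: can_flatten_without_copy is a hypothetical helper, not a NumPy function, and it mirrors the reasoning in the text rather than NumPy's actual implementation. It is compared against what reshape does using np.shares_memory.

```python
import numpy as np

def can_flatten_without_copy(a):
    """Hypothetical helper: is the C-order flattening of `a` uniformly strided?"""
    # Axes of length 1 never contribute a step, so ignore them.
    dims = [(n, s) for n, s in zip(a.shape, a.strides) if n != 1]
    if a.size == 0 or len(dims) <= 1:
        return True
    # Stepping off the end of an inner axis must land exactly one
    # step further along the axis just outside it.
    return all(s_outer == n_inner * s_inner
               for (_, s_outer), (n_inner, s_inner) in zip(dims, dims[1:]))

A = np.ones((10, 10, 10))
for view in (A, A[:, :, :5], A[:, :, ::2], A.transpose()):
    predicted = can_flatten_without_copy(view)
    actual = np.shares_memory(view, view.reshape(-1))
    print(view.shape, view.strides, predicted, actual)
```

For A[:, :, ::2] the strides are (800, 80, 16) with an inner length of 5, so 5 * 16 = 80 and 10 * 80 = 800: every step in the flattened order is 16 bytes, which is why no copy is needed.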
What cases *don't* require a copy? Well, let's look at some examples:
from numpy import ones, reshape
A = ones((10,10,10))
reshape(A,(-1,)) # No copy needed
reshape(A[:,:,:5],(-1,)) # Copy needed
reshape(A[:,:,::2],(-1,)) # No copy needed
reshape(A[:,::2,:],(-1,)) # Copy needed
reshape(A[:5,:,:],(-1,)) # No copy needed
reshape(A.transpose(),(-1,)) # Copy needed
Note that none of the reindexing operations require a copy, but some
of the reshapes do.
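One way to confirm these cases on a given NumPy installation is to ask whether the reshaped result still shares memory with A. This is a sketch using np.shares_memory; the dictionary names are illustrative, and the outcome reflects current NumPy behaviour rather than the 2007 version discussed here:

```python
import numpy as np

A = np.ones((10, 10, 10))
cases = {
    "A":             A,
    "A[:, :, :5]":   A[:, :, :5],
    "A[:, :, ::2]":  A[:, :, ::2],
    "A[:, ::2, :]":  A[:, ::2, :],
    "A[:5, :, :]":   A[:5, :, :],
    "A.transpose()": A.transpose(),
}

copy_needed = {}
for name, view in cases.items():
    # The slicing itself is always a view of A...
    assert np.shares_memory(A, view)
    # ...but the flattened result may or may not be.
    copy_needed[name] = not np.shares_memory(A, view.reshape(-1))
    print(f"{name:15s} copy needed: {copy_needed[name]}")
```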
It turns out to be nontrivial to detect all the cases where a copy can
be avoided while reshaping, and IIRC numpy misses some (old versions
of numpy almost always copied). But a freshly-created array is
normally guaranteed to be reshapable without a copy.
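For the original question, a small sketch of the freshly created case may help. It relies on two documented behaviours: flatten always returns a copy, while ravel returns a view when the array is already contiguous in the requested order. The names x, y and z are illustrative:

```python
import numpy as np

x = np.zeros((3, 4))               # freshly created, so C-contiguous
print(x.flags['C_CONTIGUOUS'])     # True

y = x.flatten()                    # always a copy
z = x.ravel()                      # a view here, because x is contiguous

print(np.shares_memory(x, y))      # False
print(np.shares_memory(x, z))      # True

z[0] = 7.0                         # writing through the view changes x
print(x[0, 0])                     # 7.0
```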
If you want to try reshaping an array without a copy, you can try
assigning to .shape:
In [3]: A = ones((10,10,10))[:,:5,:]
In [4]: A.shape = (-1,)
---------------------------------------------------------------------------
Traceback (most recent call last)
/home/peridot/physics-projects/pulsed-flux/writings/<ipython console> in <module>()
: incompatible shape for a non-contiguous array
and
In [7]: A = ones((10,10,10))[:5,:,:]
In [8]: A.shape = (-1,)
Anne
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Thank you, but all of your examples deal with 3-dimensional arrays, and I still don't understand: is this possible for 2-dimensional arrays or not? D.
On 6/5/07, dmitrey wrote:
Thank you, but all of your examples deal with 3-dimensional arrays, and I still don't understand: is this possible for 2-dimensional arrays or not? D.
There is nothing special about the number of dimensions; all arrays have the same methods.
<snip>
Chuck
On 05/06/07, Charles R Harris wrote:
There is nothing special about the number of dimensions; all arrays have the same methods.
Of course. But he was asking whether the examples I was giving, of arrays that could and couldn't be flattened without a copy, would work in 2D.
There is nothing special about 3D; there are 2D matrices that can be flattened without a copy and 2D matrices that can't. Think about the matrix in terms of strides and lengths specifying how the elements are laid out in memory and things should become much clearer. I suspect the numpy book (which is not expensive) does a better job of explaining it.
Anne.
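To make the 2D case concrete, here is a sketch with a small 4-by-6 matrix. The matrix M and the helper flattens_as_view are made up for illustration, and the results describe current NumPy:

```python
import numpy as np

M = np.arange(24).reshape(4, 6)

def flattens_as_view(a):
    """Hypothetical helper: does a.reshape(-1) share memory with a?"""
    return bool(np.shares_memory(a, a.reshape(-1)))

print(flattens_as_view(M))          # True:  one contiguous block
print(flattens_as_view(M[:2, :]))   # True:  leading rows are still one block
print(flattens_as_view(M[:, ::2]))  # True:  every other element, uniform step
print(flattens_as_view(M[:, :3]))   # False: a gap at the end of each row
print(flattens_as_view(M[::2, :]))  # False: a skipped row between kept rows
print(flattens_as_view(M.T))        # False: C order jumps around in memory
```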
dmitrey wrote:
hi all. in the numpy for matlab users I read
y = x.flatten(1)
turn array into vector (note that this forces a copy)
Is there any way to do the trick without copying? What are the problems here? It is just another way of indexing the array elements...
One important question is whether you actually need the new vector, or whether you just want a flat index into the array; if the latter, you can always [I think] use x.flat[one_d_index]. (But note that y=x.flat gives an iterator, not a new array.)
Andrew
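A short sketch of the flat-index approach, using a non-contiguous 2D view so that a flattened view would not be possible. The arrays x and col are illustrative:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
col = x[:, 1:3]                    # non-contiguous view: [[1, 2], [5, 6], [9, 10]]

print(col.flat[3])                 # 6: fourth element in C order
col.flat[3] = -1                   # assignment goes straight into x's memory
print(x[1, 2])                     # -1

print(type(col.flat).__name__)     # flatiter: an iterator, not an array
as_array = np.array(col.flat)      # materialising it does make a copy
print(np.shares_memory(x, as_array))  # False
```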
participants (4)
- Andrew Jaffe
- Anne Archibald
- Charles R Harris
- dmitrey