[Numpy-discussion] Unexpected behavior with numpy array

Damian Eads eads at soe.ucsc.edu
Sun Feb 3 20:27:13 EST 2008


Thanks Anne for your very informative response.

Anne Archibald wrote:
> On 03/02/2008, Damian Eads <eads at soe.ucsc.edu> wrote:
>> Good day,
>>
>> Reversing a 1-dimensional array in numpy is simple,
>>
>>     A = A[:,:,-1]     .
>>
>> However A is a new array referring to the old one and is no longer
>> contiguous.
>>
>> While trying to reverse an array in place and keep it contiguous, I
>> encountered some weird behavior. The reason for keeping it contiguous is
>> the array must be passed to an old C function I have, which expects the
>> buffer to be in row major order and contiguous. I am using lots of
>> memory so I want to minimize copying and allocation of new arrays.
> 
> The short answer is that reversing an array in-place requires a
> certain amount of care, in C - you have to explicitly walk through
> swapping element i and element n-i, using a temporary variable.
> Getting numpy to exchange the elements in-place is going to be a real
> pain. I suggest trying a copying method first, and only getting
> fancier if it's too slow.

I was using the copying method on my smaller data set. When I run the 
same code on the larger data set, I get "Out of Memory" errors.

>>  >>> A=numpy.arange(0,10)
>>  >>> A
>> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>  >>> A[::-1]
>> array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
>>  >>> A[:] = A[::-1]
>>  >>> A
>> array([9, 8, 7, 6, 5, 5, 6, 7, 8, 9])
>>  >>>
>>
>> Is there any way to perform assignments of the following form
>>
>>       X[sliceI] = expression involving X
>>                   with dimensions that are
>>                   compatible with the left
>>                   hand side
>>
>> without causing the odd behavior I mentioned. If not, it might be
>> helpful to throw an exception when both the LHS and the RHS of an
>> assignment reference an array slice of the same variable?
> 
> This is, odd as it seems, intended behaviour. Sometimes it's really
> useful to use the same array on the LHS and RHS: for example
> 
> A[:-1]=A[1:]

Agreed, this way of in-place shifting is useful, and we certainly would 
not want to forbid it.

> You just need to know how the operation is done under the hood (the
> arrays are iterated over in index order). (Actually, I'm not totally
> sure this is specified - under some circumstances numpy may iterate
> over dimensions in an order based on stride and/or size; whether this
> can affect the result of an operation like the above I'll have to
> think about.) In any case, much code depends on this.
> 
> Tricks for getting the array backwards without copying... hmm. Well,
> you might be able to fill in the array in an unconventional order:
> 
> A = N.arange(n-1,-1,-1)
> A=A[::-1]
> N.sin(A,A) # or whatever

Indeed, building an index array in descending order is one way to fill 
an array in reverse order. The array I am dealing with is not generated 
from arange but is generated from another numerical routine, which is 
significantly more complicated. arange was used in the example to give a 
simple reproduction of the error.

> Now the reversal of A is a perfectly normal C array (though numpy may
> not realize this; don't trust the flags, check the strides and sizes).

Agreed -- that reversals are easy to implement in C. To disclose more: I 
need much more than reversals so I tried to see how well numpy supports 
in-place algorithms, in general. If it was well-supported, I figured I 
could save myself some time from writing a whole bunch of other C code, 
and write more succinct numpy code instead. The reversal code was just a 
small experiment.

I should have given the idea more thought -- that many in-place 
algorithms can be very non-trivial to vectorize. Thus, it is 
unreasonable to expect numpy array slicing to generally support the safe 
implementation of vectorized, in-place algorithms.

The fact that sometimes weird behavior occurs is a bit concerning but I 
guess there are always dangers that come with the flexibility offered by 
Python.

> Or you could do it explicitly in numpy:
> for i in xrange(n/2):
>     t = A[i]
>     A[i]=A[n-i]
>     A[n-i]=t

I'm trying to avoid using Python 'for' loops; there is too much data.

> In the likely case that this is too slow, you can do the copying in
> blocks, small enough that memory consumption is moderate but large
> enough that the python overhead is not too much.

One of the reasons why I chose numpy/Python was that it offers a 
succinct syntax. A block-based approach would work, but with less 
succinctness and readability than the C function.

> Finally, I realize that digging around in legacy code can be
> miserable, but it is often not really very difficult to make a C
> function handle strided data - the whole principle of numpy is that
> compiled code really just needs to know the start address, data type,
> and the spacing and length of an array along each dimension.

This concept of striding an array buffer passed from some higher level 
language is not new to numpy/Python though. There are potentially 
additional costs when more complicated non-contiguous striding is used 
like page faults and additional arithmetic for computing complex 
indexes. Adding one to an index or incrementing a buffer pointer is 
generally more readable and probably requires less computation. I agree 
that in general it's not difficult to use striding but the larger the 
legacy code base, the more changes that might be needed, and the more 
likely bugs or bottlenecks will be introduced. As the number of bugs 
increases, they become more difficult to pin down. Thus, I'm generally a 
bit cautious when considering such changes to legacy code.

Here's another question: is there any way to construct a numpy array and 
specify the buffer address where it should store its values? I ask 
because I would like to construct numpy arrays that work on buffers that 
come from mmap.

Thanks again for your comments and help!

Damian Eads

---
University of California, Santa Cruz
http://www.soe.ucsc.edu/~eads



More information about the NumPy-Discussion mailing list